Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bedtools coverage is inaccurate

    I am using bedtools to calculate coverage in an RNA-seq bam file. I get screwy readings when it encounters a gap where there are no reads. Rather than filling it in with zeros, it takes a value that is somewhere near the last read and uses that value to fill across the gap. For example, I just looked in IGV at a suspicious area in a bam file. IGV shows readcounts of 11, 11, 10, 10, 10, 9, 9, 9, 9, 1 and then a bunch of zeros. bedtools, on the other hand, lists the same numbers except that when it comes to the final 1, it says 7, and then the very long list of zeros is instead a very long list of 7s. Can anyone suggest a solution? The code I used is pasted below. (The last little thing where I pipe results to grep is to output only lines with non-zero read counts. Perhaps that's where I'm screwing up, but I don't think so.)

    Thanks.

    Eric



    genomeCoverageBed -d -split -ibam input.bam -g /home/efoss/gene_databases/dot_genome_files/g1k_v37.genome | grep -v -P "\S+\t\S+\t0" > output.txt
    Last edited by efoss; 09-19-2013, 11:48 AM.

  • #2
    Hey Eric,

    My guess would be that if a read is split (such as spanning a splice junction), then that read may count towards the coverage for the entire interval, even the skipped bases. This link seems to talk about a similar issue. Looks like the solution in the post is to split those reads and try coverage again.

    (Edit: although looks like you did use the -split argument so maybe it is a different issue).

    Justin
    Last edited by BAMseek; 09-19-2013, 10:16 PM.

    Comment


    • #3
      Hi!

      Make sure the BAM file is sorted by coordinate and the splitted alignment is defined by N cigar code (true for tophat by default).

      Also you might check some alternative ways to calculate coverage, for example using coverageBed and intersection with annotation file:
      This post is actually inspired by the discussion on the Qualimap google groups . Assume we need to calculate mean coverage inside of the ...

      Comment


      • #4
        Hi kokonech,

        Thanks for the suggestion. I finally figured it out - there was a version of bedtools that had this as a known bug. I updated to the latest version and the problem went away.

        I bookmarked your site. It looks very helpful.

        Best wishes,

        Eric

        Comment


        • #5
          Hi efoss,

          can you tell us which version of bedtools you were using before?

          thanks

          Comment


          • #6
            Hi SylvainL,

            I'm afraid I don't know which version it was. I think it was installed on our system in 2011. The one I'm using now is 2.17.0.

            Eric

            Comment


            • #7
              Hey Guys,

              I am getting this weird error for my hg19.fa genome file.

              this is my command line argument:
              genomeCoverageBed -d -i Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.bed -g /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa 1> Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.coverage

              and I get this error:
              Less than the req'd two fields were encountered in the genome file (/pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa) at line 1. Exiting.

              anyone know why, I can't seem to figure it out.

              Thanks!,
              Nino

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Addressing Off-Target Effects in CRISPR Technologies
                by seqadmin






                The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality1. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes2. This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways3. Identifying the full range...
                08-27-2024, 04:44 AM
              • seqadmin
                Selecting and Optimizing mRNA Library Preparations
                by seqadmin



                Sequencing mRNA provides a snapshot of cellular activity, allowing researchers to study the dynamics of cellular processes, compare gene expression across different tissue types, and gain insights into the mechanisms of complex diseases. “mRNA’s central role in the dogma of molecular biology makes it a logical and relevant focus for transcriptomic studies,” stated Sebastian Aguilar Pierlé, Ph.D., Application Development Lead at Inorevia. “One of the major hurdles for...
                08-07-2024, 12:11 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 08-27-2024, 04:40 AM
              0 responses
              16 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 08-22-2024, 05:00 AM
              0 responses
              293 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 08-21-2024, 10:49 AM
              0 responses
              135 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 08-19-2024, 05:12 AM
              0 responses
              124 views
              0 likes
              Last Post seqadmin  
              Working...
              X