Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bedtools coverage is inaccurate

    I am using bedtools to calculate coverage in an RNA-seq bam file. I get screwy readings when it encounters a gap where there are no reads. Rather than filling it in with zeros, it takes a value that is somewhere near the last read and uses that value to fill across the gap. For example, I just looked in IGV at a suspicious area in a bam file. IGV shows readcounts of 11, 11, 10, 10, 10, 9, 9, 9, 9, 1 and then a bunch of zeros. bedtools, on the other hand, lists the same numbers except that when it comes to the final 1, it says 7, and then the very long list of zeros is instead a very long list of 7s. Can anyone suggest a solution? The code I used is pasted below. (The last little thing where I pipe results to grep is to output only lines with non-zero read counts. Perhaps that's where I'm screwing up, but I don't think so.)

    Thanks.

    Eric



    genomeCoverageBed -d -split -ibam input.bam -g /home/efoss/gene_databases/dot_genome_files/g1k_v37.genome | grep -v -P "\S+\t\S+\t0" > output.txt
    Last edited by efoss; 09-19-2013, 11:48 AM.

  • #2
    Hey Eric,

    My guess would be that if a read is split (such as spanning a splice junction), then that read may count towards the coverage for the entire interval, even the skipped bases. This link seems to talk about a similar issue. Looks like the solution in the post is to split those reads and try coverage again.

    (Edit: although looks like you did use the -split argument so maybe it is a different issue).

    Justin
    Last edited by BAMseek; 09-19-2013, 10:16 PM.

    Comment


    • #3
      Hi!

      Make sure the BAM file is sorted by coordinate and the splitted alignment is defined by N cigar code (true for tophat by default).

      Also you might check some alternative ways to calculate coverage, for example using coverageBed and intersection with annotation file:
      This post is actually inspired by the discussion on the Qualimap google groups . Assume we need to calculate mean coverage inside of the ...

      Comment


      • #4
        Hi kokonech,

        Thanks for the suggestion. I finally figured it out - there was a version of bedtools that had this as a known bug. I updated to the latest version and the problem went away.

        I bookmarked your site. It looks very helpful.

        Best wishes,

        Eric

        Comment


        • #5
          Hi efoss,

          can you tell us which version of bedtools you were using before?

          thanks

          Comment


          • #6
            Hi SylvainL,

            I'm afraid I don't know which version it was. I think it was installed on our system in 2011. The one I'm using now is 2.17.0.

            Eric

            Comment


            • #7
              Hey Guys,

              I am getting this weird error for my hg19.fa genome file.

              this is my command line argument:
              genomeCoverageBed -d -i Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.bed -g /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa 1> Sample_23430_B14_T.sorted.dedup.rg.resorted.realigned.coverage

              and I get this error:
              Less than the req'd two fields were encountered in the genome file (/pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/bwa/genome/hg19.fa) at line 1. Exiting.

              anyone know why, I can't seem to figure it out.

              Thanks!,
              Nino

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Recent Advances in Sequencing Analysis Tools
                by seqadmin


                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                Today, 07:48 AM
              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Today, 07:17 AM
              0 responses
              11 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-02-2024, 08:06 AM
              0 responses
              19 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-30-2024, 12:17 PM
              0 responses
              20 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-29-2024, 10:49 AM
              0 responses
              28 views
              0 likes
              Last Post seqadmin  
              Working...
              X