  • a basic question about coverage

    Hi everybody,
    I have a basic question about NGS.
    How can we calculate sequencing coverage (5X, 20X, ...) at selected regions of interest, and what exactly does it mean?
    Is it calculated right after sequencing, from the FASTQ file, or after mapping to the genome?

  • #2
    Hello,

    You need to map the reads first to know which region they (hopefully) come from.

    One easy way to look at coverage in regions is to create a BED file (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with your regions of interest and compare it to the mapped reads with bedtools coverageBed (http://code.google.com/p/bedtools/).

    • #3
      Once you have the read count for a region (from coverageBed), you can convert it to fold coverage such as 5x with:

      ( read count * read length ) / length of area in question

      So if you have 5 reads that are 50 bases long in a region that's 100 bases long, your coverage will be

      (5 * 50) / 100 = 2.5x.

      You could calculate a number for the whole genome by adding all the chromosome lengths, or you could do individual chromosomes or genes or windows throughout the genome or whatever is interesting.
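The formula above can be sketched in a few lines of Python. The function name `mean_coverage` is made up for illustration. Note that this is only an approximation: it assumes all reads have the same length and lie entirely inside the region, so reads that overhang the region edges make it overestimate.

```python
def mean_coverage(read_count: int, read_length: int, region_length: int) -> float:
    """Approximate fold coverage: total sequenced bases divided by region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return read_count * read_length / region_length


# The example from the post: 5 reads of 50 bases in a 100-base region.
print(mean_coverage(5, 50, 100))  # 2.5
```

For a whole-genome figure, pass the total number of mapped reads and the sum of all chromosome lengths as `region_length`.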

      • #4
        You could also use "samtools depth" to find the coverage at each position in your target region; as mamons said, you can describe that region with a BED file.

        Then you can take all the per-position coverage counts and summarize them in R to get the mean, median, etc. depth of coverage in the region of interest.
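The post suggests R for the summary; the same idea is sketched here in Python instead, using only the standard library. It assumes the three-column text output of "samtools depth" (chromosome, position, depth), and the helper names `parse_depth` and `summarize` are made up for illustration. The sample lines are the ones quoted later in this thread.

```python
import statistics


def parse_depth(lines):
    """Return the depth column (third field) of samtools depth output lines."""
    depths = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            depths.append(int(fields[2]))
    return depths


def summarize(depths):
    """Basic summary statistics over the reported positions."""
    return {
        "positions": len(depths),
        "mean": statistics.mean(depths),
        "median": statistics.median(depths),
        "min": min(depths),
        "max": max(depths),
    }


sample = [
    "chr1 14468 39",
    "chr1 14469 39",
    "chr1 14470 37",
    "chr1 14471 39",
    "chr1 14472 35",
    "chr1 14473 34",
]
print(summarize(parse_depth(sample)))
```

To run it on a real file, pass an open file handle instead of the list, for example `parse_depth(open("test.coverage"))`, where the file name is a placeholder.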

        • #5
          Hi,

          Has anyone tried to calculate the coverage (using coverageBed) of a whole-genome sequencing experiment (about 40x average coverage) over a relatively large number of genomic features (such as all RefSeq genes)? I tried this on a multicore machine with 16 GB of RAM. After three days of computation and a constant memory consumption of about 14 GB, I stopped the process. I used the following command:

          ./coverageBed -abam <bam_file> -b <refSeq_exons.bed6> -hist >> histogram.txt

          Is that normal? Any ideas?

          @aggp11: Which version of samtools are you using? The depth command does not seem to be present in my version.

          • #6
            Hi




            * Added the `depth' command to samtools to compute the per-base depth with a
            simpler interface. File `bam2depth.c', which implements this command, is the
            recommended example on how to use the mpileup APIs.

            • #7
              @Mbender: I am using samtools version 0.1.18.

              @maria_maria & Mbender: with this latest version of samtools, you don't even have to worry about bam2depth. The following command is an example of how samtools depth works:

              samtools depth -q 30 -b exons.bed exome.bam > test_q_30.coverage

              Output:
              chr1 14468 39
              chr1 14469 39
              chr1 14470 37
              chr1 14471 39
              chr1 14472 35
              chr1 14473 34

              The third column is the number of reads covering the given position with a base quality of 30 or more there (-q sets the minimum base quality).

              Thanks,
              Praful
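To answer a question like "how much of my target is covered at 20X or more", the per-position counts can be compared against a threshold. A minimal Python sketch follows, with two assumptions stated up front: positions without any coverage do not appear in this output, so the target size is taken from the BED file rather than from the number of output lines, and the BED intervals do not overlap. The function names and the single BED interval are hypothetical; the depths are the six values shown above.

```python
def bed_total_length(bed_lines):
    """Sum interval lengths in a BED file (0-based start, end exclusive)."""
    total = 0
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments and UCSC track/browser header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        total += int(fields[2]) - int(fields[1])
    return total


def fraction_at_least(depths, total_length, threshold):
    """Fraction of target bases whose depth is at least `threshold`."""
    covered = sum(1 for d in depths if d >= threshold)
    return covered / total_length


# Hypothetical 10-base target interval containing the six positions above.
bed = ["chr1\t14467\t14477"]
depths = [39, 39, 37, 39, 35, 34]
target = bed_total_length(bed)
print(target)                                  # 10
print(fraction_at_least(depths, target, 20))   # 0.6
print(fraction_at_least(depths, target, 35))   # 0.5
```

Dividing by the BED length instead of `len(depths)` keeps uncovered bases in the denominator, which is usually what a statement like "60% of the target at 20X" is meant to express.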

              • #8
                Many thanks.

                Using samtools depth, the coverage in the given genomic regions can be calculated in a reasonable amount of time. By the way, the poor performance of coverageBed on a large number of genomic intervals is a known issue.



                Best,

                Matthias
