  • Coverage percentage of hg19 using bowtie2

    Hi all,
    I am using bowtie2 to map WGS reads to hg19.
    I am trying to find out what percentage of hg19 is covered by my reads.
    For example:
    if the genome size is 300 bases and my reads cover 30 bases, then my genome coverage is 10%.
    I am looking for that 10% figure.
    Can I get it from the bowtie2 output?
    Any ideas?

    Thanks,
    Pap
    Last edited by papori; 12-06-2013, 03:53 AM.

  • #2
    While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point to experiment: http://bedtools.readthedocs.org/en/l...genomecov.html


    • #3
      If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

      You may consider using the samtools depth function. This calculates depth for every base of the reference. You can then take the sum of the depth values and divide by the number of non-'N' base pairs in hg19.

      This will give you a quick and dirty estimate.

      samtools depth you.file.bam | awk '{sum=sum+$3} END {print sum}'

      (this will give you the sum of depth values)
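
The division step described above can be sketched as a small shell function. This is only a sketch: `mean_depth` is a hypothetical helper name, and the genome size (for example the non-'N' base count of hg19) has to be supplied by the caller; it is not looked up here.

```shell
# mean_depth GENOME_SIZE
# Reads three-column "chrom pos depth" lines from stdin (the layout used by
# the samtools depth command above) and prints sum(depth) / GENOME_SIZE.
# GENOME_SIZE is an assumption supplied by the caller, not computed here.
mean_depth() {
    awk -v size="$1" '
        { sum += $3 }                        # accumulate the depth column
        END {
            if (size > 0)
                printf "%.2f\n", sum / size  # average depth per base
        }'
}

# Hypothetical usage, with the file name from the post above and a
# caller-defined NON_N_BASES variable:
#   samtools depth you.file.bam | mean_depth "$NON_N_BASES"
```

Keeping the genome size as an argument leaves the choice of denominator (all bases, or only non-'N' bases) to the caller.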


      • #4
        If you just want to know how many bases are covered, you can use the same tool: it does not report positions with '0' depth, so the number of lines it prints is the number of covered bases. Subtracting that number from the total number of bases gives the uncovered bases.
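
One way to turn this into a percentage, as a rough sketch: count the positions with non-zero depth and divide by the total number of bases. `covered_percent` is a hypothetical helper name, and the total is an assumption supplied by the caller.

```shell
# covered_percent TOTAL_BASES
# Reads "chrom pos depth" lines from stdin, counts positions with depth > 0,
# and reports them as a share of TOTAL_BASES (supplied by the caller).
covered_percent() {
    awk -v total="$1" '
        $3 > 0 { covered++ }   # a position counts once, whatever its depth
        END {
            # %.0f is used for the counts because %d can be limited to
            # 32 bits in some older awk builds.
            if (total > 0)
                printf "%.0f of %.0f bases covered (%.2f%%)\n",
                       covered, total, 100 * covered / total
        }'
}

# Hypothetical usage (TOTAL_BASES is a caller-defined variable):
#   samtools depth you.file.bam | covered_percent "$TOTAL_BASES"
```

With the example from the first post (30 covered positions out of 300 bases), this prints `30 of 300 bases covered (10.00%)`.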


        • #5
          Originally posted by jgibbons1
          If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

          You may consider using the samtools depth function. This calculates depth for every base of the reference. You can then take the sum of the depth values and divide by the number of non-'N' base pairs in hg19.

          This will give you a quick and dirty estimate.

          samtools depth you.file.bam | awk '{sum=sum+$3} END {print sum}'

          (this will give you the sum of depth values)
          I want to know how many bases of the reference are hit by my reads.
          I don't care about the depth.
          Is that clearer?
          Thanks


          • #6
            Originally posted by GenoMax
            While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point to experiment: http://bedtools.readthedocs.org/en/l...genomecov.html
            This function is great, but the output is huge: more than 2 TB per BAM file.
            That is too much for my storage.
            Any solutions?


            • #7
              Originally posted by papori
              This function is great, but the output is huge: more than 2 TB per BAM file.
              That is too much for my storage.
              Any solutions?
              Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

              Code:
              bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'
              That will occupy no space and be faster since you don't have to write anything to disk.
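
If the percentage itself is the goal, the awk step can report it directly. A minimal sketch, assuming the input is per-base output with three columns (chromosome, position, depth) that includes zero-depth positions; `breadth_of_coverage` is a hypothetical helper name, and the upstream command is whatever genomecov invocation you already use with "-d".

```shell
# breadth_of_coverage
# Reads per-base "chrom pos depth" lines from stdin, including depth 0 rows,
# and prints how many positions have at least one hit, plus the percentage.
breadth_of_coverage() {
    awk '
        { if ($3 == 0) nocov++; else cov++ }   # depth 5 and depth 5000 both count once
        END {
            total = cov + nocov
            # %.0f is used for the counts because %d can be limited to
            # 32 bits in some older awk builds.
            if (total > 0)
                printf "%.0f of %.0f bases have hits (%.2f%%)\n",
                       cov, total, 100 * cov / total
        }'
}

# Usage: pipe the per-base output into the function instead of a file, e.g.
#   <your per-base depth command> | breadth_of_coverage
```

Nothing is written to disk here either; only the single summary line is printed.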


              • #8
                Originally posted by dpryan
                Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

                Code:
                bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'
                That will occupy no space and be faster since you don't have to write anything to disk.
                Does it count 1 for a base that has more than one hit?
                I guess the answer is yes, but I just want to be sure.
                Many thanks!


                • #9
                  It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

                  You'll find awk and the other standard unix tools extremely useful.


                  • #10
                    Originally posted by dpryan
                    It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

                    You'll find awk and the other standard unix tools extremely useful.
                    Thanks!
                    Very helpful.


                    • #11
                      Hi papori,
                      I am currently doing something similar.
                      Could you share an example of the final output from the code,
                      e.g. the cov and nocov values?
                      The numbers I got are extremely large:
                      879370384 hits and 1984311365 no hits. I am not sure I did it right.
                      Thanks for your reply!
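
For what it is worth, the two counts above can be turned into a percentage with awk alone. A small sketch (`percent_covered` is a hypothetical helper name; the counts are the ones quoted in this post):

```shell
# percent_covered COV NOCOV
# Prints COV as a percentage of COV + NOCOV, with two decimals.
percent_covered() {
    awk -v cov="$1" -v nocov="$2" 'BEGIN {
        total = cov + nocov
        if (total > 0)
            printf "%.2f\n", 100 * cov / total
    }'
}

# Counts taken from the post above
percent_covered 879370384 1984311365
```

With those counts this comes out at roughly 30.71%. Their sum is about 2.86 billion positions, so counts in the hundreds of millions or billions are to be expected for a human genome.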
