
  • Coverage percentage of hg19 using bowtie2

    Hi all,
    I am using bowtie2 to map WGS reads to hg19.
    I am trying to find out what percentage of hg19 is covered by my reads.
    For example:
    if the genome size is 300 bases and my reads cover 30 bases, then my genome coverage is 10%.
    I am looking for that 10% figure.
    Can I get it from the bowtie2 output?
    Any ideas?

    Thanks,
    Pap
    Last edited by papori; 12-06-2013, 03:53 AM.

  • #2
    While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point to experiment with: http://bedtools.readthedocs.org/en/l...genomecov.html



    • #3
      If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

      You may consider using the samtools depth function, which reports the read depth at each reference position. You can then sum the depth values and divide by the number of non-'N' base pairs in hg19.

      This will give you a quick and dirty estimate.

      samtools depth your.file.bam | awk '{sum += $3} END {printf "%.0f\n", sum}'

      (This gives you the sum of the depth values.)
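
As a rough sketch of the division step, awk can do the averaging as well as the summing. The function name `mean_depth` and its genome-size argument are assumptions made for illustration; the non-'N' base count of your hg19 build is not computed here and has to be supplied by you.

```shell
# mean_depth: read "chrom pos depth" lines (the layout samtools depth prints)
# on stdin, sum column 3, and divide by the genome size given as $1.
mean_depth() {
    awk -v size="$1" '
        { sum += $3 }                                  # accumulate per-base depth
        END { if (size > 0) printf "%.2f\n", sum / size }'
}

# Hypothetical usage; NON_N_BASES is a placeholder you must set yourself:
#   samtools depth your.file.bam | mean_depth "$NON_N_BASES"
```

Note that this is mean depth, not the fraction of the genome that is covered.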



      • #4
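        If you just want to know how many bases are covered, you can use the same tool: by default it does not report positions with a depth of 0, so the number of output lines is the number of covered bases. Subtracting that count from the total number of bases gives the number of uncovered bases.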
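
A minimal sketch of that idea, assuming `samtools depth` is run with its default behaviour of skipping zero-depth positions. `covered_pct` and the genome-size argument are made-up names for illustration; which total to use (all of hg19, or only its non-'N' bases) is a choice left to you.

```shell
# covered_pct: count input lines with depth > 0 and express that count
# as a percentage of the genome size passed as $1.
covered_pct() {
    awk -v size="$1" '
        $3 > 0 { covered++ }                           # one line per covered base
        END {
            if (size > 0)
                printf "%.0f of %.0f bases covered (%.2f%%)\n",
                       covered, size, 100 * covered / size
        }'
}

# Hypothetical usage; GENOME_SIZE is a placeholder you must set yourself:
#   samtools depth your.file.bam | covered_pct "$GENOME_SIZE"
```

With the example from the first post (30 covered positions, genome size 300) this reports 10.00%.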



        • #5
          Originally posted by jgibbons1
          If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

          You may consider using the samtools depth function, which reports the read depth at each reference position. You can then sum the depth values and divide by the number of non-'N' base pairs in hg19.

          This will give you a quick and dirty estimate.

          samtools depth your.file.bam | awk '{sum += $3} END {printf "%.0f\n", sum}'

          (This gives you the sum of the depth values.)

          I want to know how many bases of the reference are hit by my reads.
          I don't care about the depth.
          Is that clearer?
          Thanks



          • #6
            Originally posted by GenoMax
            While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point to experiment with: http://bedtools.readthedocs.org/en/l...genomecov.html

            This function is great, but the output is huge: more than 2 TB per BAM file.
            That is too much for my storage.
            Any solutions?



            • #7
              Originally posted by papori
              This function is great, but the output is huge: more than 2 TB per BAM file.
              That is too much for my storage.
              Any solutions?

              Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

              Code:
              bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'
              That will occupy no space and be faster since you don't have to write anything to disk.
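
If you want the percentage directly, the same pipe can compute it. This is only a sketch of one way to extend the one-liner above: the function name `breadth_from_genomecov` is invented for illustration, and it assumes the three-column "chrom pos depth" layout that the "-d" option produces.

```shell
# breadth_from_genomecov: read per-base "chrom pos depth" lines on stdin
# and report how many positions have depth > 0, as a count and a percentage.
breadth_from_genomecov() {
    awk '
        $3 == 0 { nocov++; next }                      # position with no reads
        { cov++ }                                      # position with >= 1 read
        END {
            total = cov + nocov
            if (total > 0)
                printf "%.0f of %.0f bases have hits (%.2f%%)\n",
                       cov, total, 100 * cov / total
        }'
}

# Usage, with the same elided arguments as above:
#   bedtools genomecov ...stuff.. | breadth_from_genomecov
```

The counts are printed with %.0f rather than %i because some awk builds clamp integer formats to 32 bits, which a human genome can exceed.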



              • #8
                Originally posted by dpryan
                Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

                Code:
                bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'
                That will occupy no space and be faster since you don't have to write anything to disk.

                Does it count a base as 1 even when it has more than one hit?
                I guess the answer is yes, I just want to be sure.
                Many thanks!



                • #9
                  It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

                  You'll find awk and the other standard unix tools extremely useful.



                  • #10
                    Originally posted by dpryan
                    It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

                    You'll find awk and the other standard unix tools extremely useful.

                    Thanks!
                    Very helpful.



                    • #11
                      Hi papori,
                      I am currently doing something similar.
                      Could you share an example of the final output from the code?
                      Something like cov= , nocov= ?
                      The numbers I got are extremely large:
                      879370384 hits and 1984311365 no hits. I am not sure whether I did it right.
                      Thanks for your reply!
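
For reference, two counts like these can be turned into the percentage asked about at the top of the thread with a little arithmetic. The helper name `hit_pct` is an assumption for illustration; it only divides the numbers it is given and says nothing about whether the upstream command was run correctly.

```shell
# hit_pct: given the "hits" and "no hits" counts, print hits as a
# percentage of the total number of positions counted.
hit_pct() {
    awk -v hits="$1" -v nohits="$2" 'BEGIN {
        total = hits + nohits
        if (total > 0) printf "%.2f\n", 100 * hits / total
    }'
}

hit_pct 879370384 1984311365   # prints 30.71
```

The two counts add up to 2863681749 positions, so roughly 30.7% of the counted positions have at least one read.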
