
  • Coverage percentage of hg19 using bowtie2

    Hi all,
    I am using bowtie2 to map WGS reads to hg19.
    I am trying to find out what percentage of hg19 is covered by my reads.
    For example:
    if the genome size is 300 bases and my reads cover 30 of them, my genome coverage is 10%.
    That 10% is the number I am looking for.
    Can I get it from the bowtie2 output?
    Any ideas?

    Thanks,
    Pap
    Last edited by papori; 12-06-2013, 03:53 AM.

  • #2
    While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point to experiment with: http://bedtools.readthedocs.org/en/l...genomecov.html



    • #3
      If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

      You may consider using the samtools depth function. This calculates the depth for every base of the reference. You can then take the sum of the depth values and divide it by the number of non-'N' base pairs in hg19.

      This will give you a quick and dirty estimate.

      samtools depth you.file.bam | awk '{sum=sum+$3} END {print sum}'

      (this will give you the sum of depth values)
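
      As a rough sketch of the division step described above, the sum can be divided by the genome size inside the same awk call. The function name mean_depth and its genome-size argument are made up for illustration; the number of non-N bases in hg19 is not given in this thread, so you have to supply it yourself.

```shell
# Hypothetical helper: mean depth from `samtools depth`-style input on stdin.
# Input columns are assumed to be: chromosome, position, depth.
# $1 = number of non-N reference bases (an assumption: you supply this value).
mean_depth() {
    awk -v genome_size="$1" '
        { sum += $3 }                               # column 3 is the per-base depth
        END { printf "%.2f\n", sum / genome_size }  # average depth per reference base
    '
}

# Usage (file name as written in the post above):
#   samtools depth you.file.bam | mean_depth <non_N_bases>
```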



      • #4
        If you just want to know how many bases are covered, you can use the same tool: it does not report positions with a depth of 0, so the number of output lines is the number of covered bases. Subtracting that from the total number of bases gives the number of uncovered bases.
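
        A minimal sketch of that counting idea, assuming three-column depth output (chromosome, position, depth) on standard input. The helper name coverage_percent and the total-bases argument are hypothetical; which total you pass (all bases or only non-N bases) changes the percentage.

```shell
# Hypothetical helper: percentage of reference bases covered by at least one read.
# Reads depth lines (chromosome, position, depth) on stdin.
# $1 = total number of reference bases to use as the denominator (assumption).
coverage_percent() {
    awk -v total="$1" '
        $3 > 0 { covered++ }   # the guard also handles input that contains zero-depth rows
        END {
            # %.0f is used for the counts because some awk implementations
            # overflow %d/%i on values above 2^31, which genome-sized counts exceed
            printf "%.0f of %.0f bases covered (%.2f%%)\n", covered, total, 100 * covered / total
        }
    '
}

# Usage (file name as written earlier in the thread):
#   samtools depth you.file.bam | coverage_percent <total_bases>
```

        With the example from the first post (30 covered bases out of 300), this prints 10.00%.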



        • #5
          Originally posted by jgibbons1 View Post
          If I understand you correctly, you'd like to know the average coverage of your read set. Is that right?

          You may consider using the samtools depth function. This calculates the depth for every base of the reference. You can then take the sum of the depth values and divide it by the number of non-'N' base pairs in hg19.

          This will give you a quick and dirty estimate.

          samtools depth you.file.bam | awk '{sum=sum+$3} END {print sum}'

          (this will give you the sum of depth values)
          I want to know how many bases of the reference are hit by my reads.
          I don't care about the depth.
          Is that clearer?
          Thanks



          • #6
            Originally posted by GenoMax View Post
            While it probably won't give you the gross % value you are looking for, "genomecov" from BEDTools would be a good starting point to experiment with: http://bedtools.readthedocs.org/en/l...genomecov.html
            This function is great, but the output is huge: more than 2 TB per BAM file.
            That is too much for my storage.
            Any solutions?



            • #7
              Originally posted by papori View Post
              This function is great, but the output is huge: more than 2 TB per BAM file.
              That is too much for my storage.
              Any solutions?
              Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

              Code:
              bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'
              That will occupy no space and be faster since you don't have to write anything to disk.
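
              If the percentage itself is wanted, the same pipe can compute it. The sketch below assumes per-base input with zero-depth rows included (which is what the "-d" output is expected to contain); the function name covered_fraction is hypothetical, and the denominator is simply every position that appears in the input, which may include N bases.

```shell
# Hypothetical helper: summarise per-base lines (chromosome, position, depth)
# where uncovered positions appear with depth 0.
covered_fraction() {
    awk '
        $3 == 0 { nocov++; next }   # position with no read aligned
        { cov++ }                   # any depth >= 1 counts once, regardless of depth
        END {
            total = cov + nocov
            if (total == 0) { print "no positions read"; exit 1 }
            # %.0f avoids integer overflow seen with %d/%i in some awk implementations
            printf "%.0f covered, %.0f uncovered, %.2f%% of positions covered\n", cov, nocov, 100 * cov / total
        }
    '
}

# Usage (arguments elided exactly as in the post above):
#   bedtools genomecov ...stuff.. | covered_fraction
```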



              • #8
                Originally posted by dpryan View Post
                Just don't store the output then. If you just want to know how many bases have a read aligning to them, just use the "-d" option and then pipe the output to awk:

                Code:
                bedtools genomecov ...stuff.. | awk '{if($3==0) { nocov+=1;}else{cov+=1}}END{printf("%i have hits and %i do not\n",cov,nocov)}'
                That will occupy no space and be faster since you don't have to write anything to disk.
                Does it count 1 for a base that has more than one hit?
                I guess the answer is yes, I just want to be sure.
                Many thanks!



                • #9
                  It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

                  You'll find awk and the other standard unix tools extremely useful.



                  • #10
                    Originally posted by dpryan View Post
                    It does. A base with a million reads mapping to it counts the same as another base with only a single read covering it.

                    You'll find awk and the other standard unix tools extremely useful.
                    Thanks!
                    Very helpful.



                    • #11
                      Hi papori,
                      I am currently doing something similar.
                      Could you share an example of the final output from the code,
                      something like cov= , nocov= ?
                      The numbers I got are extremely large:
                      879370384 hits and 1984311365 no hits. I am not sure whether I did it right.
                      Thanks for your reply!
