
  • How can I determine the mapping rate of TopHat output such as accepted_hits.bam?

    I ran TopHat on the same RNA-Seq data with different -r/--mate-inner-dist and --mate-std-dev settings.

    Here are the parameters:
    1. -r 160, --mate-std-dev (default) 20
    2. -r (default) 50, --mate-std-dev (default) 20
    3. -r 0, --mate-std-dev 60

    After TopHat finished, I used samtools flagstat to evaluate the results.

    The results are listed below in order:
    1. -r 160, --mate-std-dev (default) 20
    Code:
    27139030 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    27139030 + 0 mapped (100.00%:-nan%)
    27139030 + 0 paired in sequencing
    14171642 + 0 read1
    12967388 + 0 read2
    22063409 + 0 properly paired (81.30%:-nan%)
    24154960 + 0 with itself and mate mapped
    2984070 + 0 singletons (11.00%:-nan%)
    516422 + 0 with mate mapped to a different chr
    217580 + 0 with mate mapped to a different chr (mapQ>=5)
    4141901 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    4141901 + 2091 paired in sequencing
    1533088 + 997 read1
    2608813 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    2. -r (default) 50, --mate-std-dev (default) 20
    Code:
    27639199 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    27639199 + 0 mapped (100.00%:-nan%)
    27639199 + 0 paired in sequencing
    14422450 + 0 read1
    13216749 + 0 read2
    21085751 + 0 properly paired (76.29%:-nan%)
    24654856 + 0 with itself and mate mapped
    2984343 + 0 singletons (10.80%:-nan%)
    706460 + 0 with mate mapped to a different chr
    215918 + 0 with mate mapped to a different chr (mapQ>=5)
    4142842 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    4142842 + 2091 paired in sequencing
    1533869 + 997 read1
    2608973 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    3. -r 0, --mate-std-dev 60
    Code:
    41145664 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    41145664 + 0 mapped (100.00%:-nan%)
    41145664 + 0 paired in sequencing
    21422982 + 0 read1
    19722682 + 0 read2
    22975306 + 0 properly paired (55.84%:-nan%)
    37774543 + 0 with itself and mate mapped
    3371121 + 0 singletons (8.19%:-nan%)
    10967682 + 0 with mate mapped to a different chr
    207758 + 0 with mate mapped to a different chr (mapQ>=5)
    2934463 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    2934463 + 2091 paired in sequencing
    906826 + 997 read1
    2027637 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    Since the sample had 31387112 input reads, result 3 confused me at first: accepted_hits.bam reported 41145664 reads in total, far more than the number of input reads.

    When I checked the BAM file, I found many repeated entries caused by multihits (reads reported at more than one location).

    So the numbers I got from samtools flagstat are not accurate read counts.
    Is there any way to estimate the mapping rate, the unique mapping rate, or other useful statistics?

    Hoping for your help!
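    Because samtools flagstat counts alignment records rather than reads, each multihit read adds one record per reported location, which is how result 3 can exceed the number of input reads. A minimal sketch, assuming the BAM is first converted to SAM text (for example with `samtools view accepted_hits.bam`): it counts mapped records and, separately, distinct mapped reads keyed by read name plus mate (SAM flags 0x40/0x80). The function name `count_mapped_reads` is hypothetical, not part of any tool mentioned in this thread.

```python
from typing import Iterable, Tuple


def count_mapped_reads(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped_records, distinct_mapped_reads) for SAM text lines.

    A read is identified by its name (QNAME) plus which mate it is
    (flag 0x40 = first in pair, 0x80 = second in pair), so a multihit
    read is counted once however many alignments it has.
    """
    records = 0
    reads = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # flag 0x4: the read is unmapped, so it is not an alignment
        records += 1
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        reads.add((qname, mate))
    return records, len(reads)


# Tiny hand-made example: mate 1 of read r2 has two alignments (a multihit).
example = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t50\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t50\t50M\t=\t100\t-250\t*\t*",
    "r2\t65\tchr1\t500\t1\t50M\tchr2\t800\t0\t*\t*",
    "r2\t321\tchr3\t900\t1\t50M\tchr2\t800\t0\t*\t*",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_mapped_reads(example))  # (4, 3)
```

    On a real file, a script could call `count_mapped_reads(sys.stdin)` and be fed by `samtools view accepted_hits.bam`. Dividing the distinct-read count by the number of input reads (counted the same way, per mate) gives a mapping rate that multihits cannot push above 100%.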

  • #2
    Try these to see if they suit your needs.

    BAMStats: http://bamstats.sourceforge.net/

    Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats



    • #3
      Originally posted by lucyyang1991
      After I checked the bam file, I found there were lots of repeats because of the multihits.

      So the results I've got from the samtools flagstat were not that accurate.
      Is there any way to estimates the mapping rates and unique mapping rates or anything else?
      Hi, a quick shortcut to get the mapping rate is to count the reads in the BAM file where TopHat puts the unmapped reads (called unmapped.bam or something like that). Your mapping rate would then be (total reads - reads in unmapped.bam) / total reads. For uniquely mapped reads, you could use the MAPQ score, provided TopHat sets it correctly to reflect the uniqueness of the mapping.

      Dario
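      Dario's shortcut can be written out directly. A minimal sketch with hypothetical function names: `mapping_rate` applies the formula from the post, and `count_unique_by_mapq` counts distinct reads whose alignment carries a chosen MAPQ value. The value 50 used below is an assumption about how TopHat 2 marks unique alignments, so check the documentation for your TopHat version before relying on it.

```python
from typing import Iterable

# ASSUMPTION: MAPQ value that marks a uniquely mapped read in TopHat output.
# 50 is commonly reported for TopHat 2; verify it for your TopHat version.
UNIQUE_MAPQ = 50


def mapping_rate(total_reads: int, unmapped_reads: int) -> float:
    """(total reads - reads in unmapped.bam) / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= unmapped_reads <= total_reads:
        raise ValueError("unmapped_reads must be between 0 and total_reads")
    return (total_reads - unmapped_reads) / total_reads


def count_unique_by_mapq(sam_lines: Iterable[str], unique_mapq: int = UNIQUE_MAPQ) -> int:
    """Count distinct mapped reads (QNAME + mate) whose MAPQ equals unique_mapq."""
    unique = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & 0x4:
            continue  # unmapped record
        if mapq == unique_mapq:
            mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
            unique.add((qname, mate))
    return len(unique)


# Hypothetical numbers, not taken from the flagstat output above.
print(mapping_rate(1000, 150))  # 0.85
```

      A unique mapping rate would then be the result of `count_unique_by_mapq` divided by the total number of input reads, with both counted per mate.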



      • #4
        Originally posted by GenoMax
        Try these to see if they suit your needs.

        BAMStats: http://bamstats.sourceforge.net/
        Thanks a lot for your help!
        I've downloaded BAMStats, but after unzipping 'BAMStats-1.25-src.zip' I couldn't find 'BAMStats-GUI-1.25.jar', and I didn't know how to use the program even though the instructions say to run:
        Code:
        java -Xmx4g -jar BAMStats-1.25.jar -i <bam file>



        • #5
          Originally posted by GenoMax
          Try these to see if they suit your needs.

          BAMStats: http://bamstats.sourceforge.net/

          Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats
          Hi,
          I've tried all the methods and found that BamUtil gives the same results as samtools flagstat. So I still can't tell which parameter setting is better, because neither tool rules out the repeats caused by multihits.



          • #6
            If you are only interested in uniquely mapped reads, see post #14 in this thread: http://seqanswers.com/forums/showthread.php?t=25096

            Here is one more option for summarizing read mappings: http://bioinf.wehi.edu.au/featureCounts/
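            For uniquely mapped reads, another common criterion is the SAM `NH` tag (the number of reported alignments for a read). A minimal sketch, assuming TopHat writes an `NH:i` tag on each alignment in accepted_hits.bam (worth checking in your own file); it is not necessarily the method described in the linked post, and `nh_value` and `unique_alignments` are hypothetical names. It keeps header lines so the output is still valid SAM.

```python
from typing import Iterable, Iterator, List, Optional


def nh_value(fields: List[str]) -> Optional[int]:
    """Return the integer value of the NH:i optional tag, or None if absent."""
    for tag in fields[11:]:  # optional tags start at the 12th SAM column
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def unique_alignments(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and mapped records whose NH tag equals 1."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output remains valid SAM
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & 0x4:
            continue  # skip malformed lines and unmapped records
        if nh_value(fields) == 1:
            yield line
```

            Feeding it the output of `samtools view -h accepted_hits.bam` keeps only alignments of reads reported at a single location; counting distinct read names in what remains gives a unique-read count that multihits cannot inflate.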
