  • How can I determine the mapping rates of tophat output such as accepted_hits.bam?

    I ran TopHat on the same RNA-Seq data with different -r/--mate-inner-dist and --mate-std-dev settings.

    Here are the parameters:
    1. -r 160, --mate-std-dev (default) 20
    2. -r (default) 50, --mate-std-dev (default) 20
    3. -r 0, --mate-std-dev 60

    After the TopHat runs finished, I used samtools flagstat to summarize the results.

    The results are listed below in order:
    1. -r 160, --mate-std-dev (default) 20
    Code:
    27139030 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    27139030 + 0 mapped (100.00%:-nan%)
    27139030 + 0 paired in sequencing
    14171642 + 0 read1
    12967388 + 0 read2
    22063409 + 0 properly paired (81.30%:-nan%)
    24154960 + 0 with itself and mate mapped
    2984070 + 0 singletons (11.00%:-nan%)
    516422 + 0 with mate mapped to a different chr
    217580 + 0 with mate mapped to a different chr (mapQ>=5)
    4141901 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    4141901 + 2091 paired in sequencing
    1533088 + 997 read1
    2608813 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    2. -r (default) 50, --mate-std-dev (default) 20
    Code:
    27639199 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    27639199 + 0 mapped (100.00%:-nan%)
    27639199 + 0 paired in sequencing
    14422450 + 0 read1
    13216749 + 0 read2
    21085751 + 0 properly paired (76.29%:-nan%)
    24654856 + 0 with itself and mate mapped
    2984343 + 0 singletons (10.80%:-nan%)
    706460 + 0 with mate mapped to a different chr
    215918 + 0 with mate mapped to a different chr (mapQ>=5)
    4142842 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    4142842 + 2091 paired in sequencing
    1533869 + 997 read1
    2608973 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    3. -r 0, --mate-std-dev 60
    Code:
    41145664 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    41145664 + 0 mapped (100.00%:-nan%)
    41145664 + 0 paired in sequencing
    21422982 + 0 read1
    19722682 + 0 read2
    22975306 + 0 properly paired (55.84%:-nan%)
    37774543 + 0 with itself and mate mapped
    3371121 + 0 singletons (8.19%:-nan%)
    10967682 + 0 with mate mapped to a different chr
    207758 + 0 with mate mapped to a different chr (mapQ>=5)
    2934463 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    2934463 + 2091 paired in sequencing
    906826 + 997 read1
    2027637 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    The sample had 31387112 input reads in total, so at first result 3 confused me: accepted_hits.bam contained many more reads than the input.

    When I checked the BAM file, I found many repeated entries caused by multihits.

    So the results from samtools flagstat are not accurate.
    Is there any way to estimate the mapping rate and the unique mapping rate, or anything similar?

    Hoping for your help!

  • #2
    Try these to see if they suit your needs.

    BAMStats: http://bamstats.sourceforge.net/

    Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats



    • #3
      Originally posted by lucyyang1991
      When I checked the BAM file, I found many repeated entries caused by multihits.

      So the results from samtools flagstat are not accurate.
      Is there any way to estimate the mapping rate and the unique mapping rate, or anything similar?
      Hi. A quick shortcut to get the mapping rate is to count the reads in the BAM file where TopHat puts the unmapped reads, called unmapped.bam or something like that. Your mapping rate would then be (total reads - reads in unmapped.bam) / total reads. For uniquely mapped reads you could use the MAPQ score, if TopHat sets it correctly to reflect uniqueness of mapping.

      Dario
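The shortcut above can be sketched in Python. This is a minimal illustration, not part of TopHat or samtools: the function name `mapping_rate` is hypothetical, and the example numbers are taken from the first post (31387112 input reads, and 4141901 QC-passed reads in the second flagstat block of run 1, assuming that block describes unmapped.bam and that both counts are in reads rather than pairs).

```python
def mapping_rate(total_reads, unmapped_reads):
    """Return the fraction of input reads that were mapped.

    Implements (total reads - reads in unmapped.bam) / total reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= unmapped_reads <= total_reads:
        raise ValueError("unmapped_reads must be between 0 and total_reads")
    return (total_reads - unmapped_reads) / total_reads


# Assumed inputs: counts quoted in the first post for run 1 (-r 160).
rate_run1 = mapping_rate(31387112, 4141901)
print(f"mapping rate: {rate_run1:.2%}")
```

Because unmapped reads are counted once each, this estimate should not be inflated by multihits in accepted_hits.bam.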



      • #4
        Originally posted by GenoMax
        Try these to see if they suit your needs.

        BAMStats: http://bamstats.sourceforge.net/
        Thanks a lot for your help!
        I've downloaded BAMStats. After unzipping 'BAMStats-1.25-src.zip', I couldn't find 'BAMStats-GUI-1.25.jar', and I don't know how to use the program even though the instructions say to run
        Code:
        java -Xmx4g -jar BAMStats-1.25.jar -i <bam file>



        • #5
          Originally posted by GenoMax
          Try these to see if they suit your needs.

          BAMStats: http://bamstats.sourceforge.net/

          Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats
          Hi,
          I've tried all the methods and found that BamUtil gives the same result as samtools flagstat. So I still can't judge which parameter set is better, because neither tool can rule out the repeated entries due to multihits.



          • #6
            If you are only interested in uniquely mapped reads then see post #14 in this thread: http://seqanswers.com/forums/showthread.php?t=25096

            Here is one more option for summarizing read mappings: http://bioinf.wehi.edu.au/featureCounts/
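To count each read once despite multihits, one possible approach is to read the alignments as SAM text and key them by read name and mate. The sketch below is an illustration under stated assumptions, not the method from the linked thread: the function name `summarize_alignments` is hypothetical, and it assumes the aligner writes an `NH:i:<n>` tag giving the number of reported hits for each read, which should be checked against the actual file.

```python
def summarize_alignments(sam_lines):
    """Count alignment records, distinct mapped reads, and uniquely mapped reads.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    A read is identified by (read name, mate number), so the two mates
    of a pair are counted separately, as samtools flagstat does.
    """
    hits = {}       # (name, mate) -> number of alignment records seen
    unique = set()  # reads whose NH tag reports exactly one hit (assumed tag)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (fields[0], mate)
        hits[key] = hits.get(key, 0) + 1
        if "NH:i:1" in fields[11:]:
            unique.add(key)
    return {
        "records": sum(hits.values()),
        "distinct_reads": len(hits),
        "unique_reads": len(unique),
    }
```

The lines can be supplied by piping `samtools view accepted_hits.bam` into a script that calls `summarize_alignments(sys.stdin)`. Dividing `distinct_reads` or `unique_reads` by the number of input reads then gives rates that multihits do not inflate. Holding every read name in memory is simple but may be heavy for very large files.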
