  • samtools flagstat output

    Hi,
    I mapped paired-end reads to a reference genome (Stampy), then converted the SAM file to BAM, and finally got some stats using samtools flagstat.
    I get a file with a content similar to:
    4198456 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    4022089 + 0 mapped (95.80%:-nan%)
    4198456 + 0 paired in sequencing
    2099228 + 0 read1
    2099228 + 0 read2
    3796446 + 0 properly paired (90.42%:-nan%)
    4013692 + 0 with itself and mate mapped
    8397 + 0 singletons (0.20%:-nan%)
    167574 + 0 with mate mapped to a different chr
    72008 + 0 with mate mapped to a different chr (mapQ>=5)

    I don't fully understand the samtools flagstat output.
    * What does "singleton" mean? I think it means that only one read of a paired-end pair is mapped to the genome.
    * What does "with itself and mate mapped" mean? I think it means that both reads of the pair are mapped to the genome.
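For anyone who wants to check how the counts above relate to each other, here is a minimal sketch in Python. It assumes the flagstat text is pasted in as a string; `parse_flagstat` and `pct` are hypothetical helper names, not part of samtools. The second value in each percentage pair (`-nan%`) is the QC-failed fraction, which is 0/0 here because no reads failed QC.

```python
import re

# The flagstat output from the post above, pasted verbatim.
FLAGSTAT_TEXT = """\
4198456 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
4022089 + 0 mapped (95.80%:-nan%)
4198456 + 0 paired in sequencing
2099228 + 0 read1
2099228 + 0 read2
3796446 + 0 properly paired (90.42%:-nan%)
4013692 + 0 with itself and mate mapped
8397 + 0 singletons (0.20%:-nan%)
167574 + 0 with mate mapped to a different chr
72008 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} for each flagstat line."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.*)", line.strip())
        if not m:
            continue
        passed, failed, label = int(m.group(1)), int(m.group(2)), m.group(3)
        # Drop a trailing "(xx.xx%:...)" percentage, keep other parentheses.
        label = re.sub(r"\s*\([^()]*%[^()]*\)$", "", label)
        counts[label] = (passed, failed)
    return counts


def pct(numerator, denominator):
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * numerator / denominator, 2)


counts = parse_flagstat(FLAGSTAT_TEXT)
total = counts["in total (QC-passed reads + QC-failed reads)"][0]
mapped = counts["mapped"][0]
proper = counts["properly paired"][0]
both = counts["with itself and mate mapped"][0]
singletons = counts["singletons"][0]

print("mapped %:", pct(mapped, total))
print("properly paired %:", pct(proper, total))
print("singletons %:", pct(singletons, total))
# In this file every read is paired, so a mapped read either has a mapped
# mate ("with itself and mate mapped") or an unmapped mate ("singleton").
print("both + singletons == mapped:", both + singletons == mapped)
```

In this particular output "in total" equals "paired in sequencing", so it does not matter which of the two is used as the denominator. The identity in the last line is specific to files where all reads are paired; I would not assume it holds for every BAM.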

  • #2
    Hi did you find something?
    i have the same questions here.
    thanks
    tuka


    • #3
      Both of your answers are correct. This forum might shed some light on other questions about the output that you may have.

      If you have SAM flags and want to look up their meaning quickly, you can check them interactively by following this link. You type in a flag and get the answer at the official Picard site. Yo…


      I hope this helps!
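If you would rather decode flags offline, here is a minimal sketch in Python. The bit values are the FLAG definitions from the SAM specification; the function names `explain_flag` and `pair_category` are hypothetical, and the category logic is my reading of how the flagstat lines map onto the flag bits (it ignores secondary and supplementary alignments), not samtools' own code.

```python
# FLAG bits as defined in the SAM specification.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """List the meaning of every bit that is set in a SAM flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def pair_category(flag):
    """Classify a read the way the flagstat pairing lines describe it."""
    if not flag & 0x1:
        return "unpaired"
    if flag & 0x4:
        return "unmapped"
    if flag & 0x8:
        # The read itself is mapped but its mate is not.
        return "singleton"
    return "with itself and mate mapped"


for flag in (99, 73, 133):
    print(flag, explain_flag(flag), "->", pair_category(flag))
```

Flag 73 (paired, mate unmapped, first in pair) is a typical singleton, while flag 99 is a read whose mate is also mapped.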


      • #4
        samtools flagstat dead?

        Originally posted by twaddlac View Post
        Both of your answers are correct. This forum might shed some light on other questions about the output that you may have.

        If you have SAM flags and want to look up their meaning quickly, you can check them interactively by following this link. You type in a flag and get the answer at the official Picard site. Yo…


        I hope this helps!


        I just looked in the samtools manual and there is no longer a flagstat option listed for estimating these stats. Can anyone tell me why? I used it earlier to see how 'well' paired-end reads aligned to the genome, looking at the 'properly-paired %' statistic, which was part of the samtools flagstat output.

        Does anyone know why? Thanks


        • #5
          The manual (at http://samtools.sourceforge.net/samtools.shtml ) does not document flagstat, but it is really there ...

          -bash-3.00$ ~/samtools-0.1.18/samtools flagstat
          Usage: samtools flagstat <in.bam>

          The "NEWS" file says flagstat was added in ver 0.1.3 on 15 April, 2009


          • #6
            assessing how -r affects Tophat output

            Originally posted by Richard Finney View Post
            The manual (at http://samtools.sourceforge.net/samtools.shtml ) does not document flagstat, but it is really there ...

            -bash-3.00$ ~/samtools-0.1.18/samtools flagstat
            Usage: samtools flagstat <in.bam>

            The "NEWS" file says flagstat was added in ver 0.1.3 on 15 April, 2009

            Thanks for that Richard! Maybe you will be able to help me with my problem!

            To briefly outline what I did: I wanted to see how using different -r options for Tophat2 would affect my alignment (part of an RNA-Seq study).

            I initially took a subset of 5 million 101 bp paired-end reads from each of 4 control and 4 disease samples and mapped them to the ref transcriptome using Bowtie2.

            I then used Picard tools on the output SAM files, first to sort them and then to estimate the insert size statistics. This gave me the mean and standard deviation of the fragment length based on the alignment, so I had to subtract twice the read length to get the value for Tophat's 'inner distance' between pairs option.

            For this analysis I took the average of the 4 control means and the average of the 4 disease means (-15 and -33 respectively), plus a low and a high extreme value (-50 and +50), just to see how they would affect my alignment. I chose a common std deviation of 55 and aligned 1 chosen disease sample to the ref genome using Tophat2 with each of these 4 -r values, one run per value.
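The subtraction described above can be sketched in a couple of lines of Python. `inner_distance` is a hypothetical helper name, and the fragment lengths 187 and 169 are back-calculated from the -15 and -33 quoted in this post with a 101 bp read length; they are assumptions for illustration, not values reported from Picard.

```python
def inner_distance(mean_fragment_length, read_length):
    """Inner distance between mates: fragment length minus both reads."""
    return mean_fragment_length - 2 * read_length


READ_LENGTH = 101  # bp, as stated in the post

# Fragment lengths below are back-calculated for illustration only.
for label, fragment in (("control", 187), ("disease", 169)):
    print(label, "-r =", inner_distance(fragment, READ_LENGTH))
```

A negative result simply means the mean fragment is shorter than the two reads combined, so the mates overlap on average.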

            Coming back to this thread's topic, I then used samtools flagstat to evaluate how well each -r option worked for the alignment, looking at the 'properly-paired %' stat that is part of the output (I read that this is a common procedure, but I'm not 100% sure it's valid).

            Quite to my surprise, the mean disease -r I mentioned earlier (-33) gave only 82% properly paired, while the high extreme value of +50 gave the highest value, 92%. Why is this? +50 is nowhere close to the mean I had estimated using Picard tools.

            Please do help and I really appreciate your prompt response thus far.
            Last edited by vkartha; 06-07-2012, 05:59 AM.


            • #7
              My guess ...
              The larger expected mate inner distance (tophat -r parameter) allows it to look farther out in order to align the weaker of the two pairs. The result is more alignments.


              • #8
                Originally posted by Richard Finney View Post
                My guess ...
                The larger expected mate inner distance (tophat -r parameter) allows it to look farther out in order to align the weaker of the two pairs. The result is more alignments.
                What do you mean by "weaker of the two pairs"? And if that is the case, how would I estimate the right -r value to use, given that it's so far off from the actual estimated mean? Using that with Tophat just doesn't make sense when it expects the 'mean' inner distance between mate pairs.


                • #9
                  I haven't used Tophat for several years, though I was impressed when it first came out.
                  It might be using a strategy where, if one read of a pair has a perfect match, it looks nearby within a range to place the other, "weaker" mate, which does not match perfectly. If the range is bigger (i.e. the expected distance is bigger), it is more likely to place the second mate. This is speculation; I don't know what strategy it uses. This is a bigger deal with RNA-seq and gene exon models, where you'll have exon skipping.


                  • #10
                    samtools flagstat output explanation

                    Hi

                    I am trying to find a detailed explanation of the samtools flagstat output, without success. Even at http://samtools.sourceforge.net/ there is no mention of the flagstat command...

                    Any help?

                    Thanks in advance.

                    Paulo


                    • #11
                      I think that the third Google hit I get is rather good.



                      If you have a specific question about flagstat then ask it. I agree that the samtools doc itself should talk more about flagstat. Perhaps the developers thought that the output was too obvious to mention?


                      • #12
                        If you have gone through the trouble of writing code to produce properly formatted SAM output for paired end alignments then, and only then, is the flagstat output obvious.
                        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                        Salk Institute for Biological Studies, La Jolla, CA, USA */
