  • SAMtools flagstat output interpretation

    Hi,

    I got the following info after running a samtools flagstat on a Novoalign bam file:

    126597089 in total
    0 QC failure
    0 duplicates
    122446987 mapped (96.72%)
    126597089 paired in sequencing
    63372478 read1
    63224611 read2
    104053862 properly paired (82.19%)
    118953502 with itself and mate mapped
    3493485 singletons (2.76%)
    14745930 with mate mapped to a different chr
    8838136 with mate mapped to a different chr (mapQ>=5)

    What does line 5 (126597089 paired in sequencing) signify? And since these are paired reads, shouldn't the read1 and read2 numbers be the same? Does it have anything to do with using the "-r A" option for SAM generation?

    Thanks in advance.

  • #2
    Which aligner did you use? You have paired-end data, so your mates can be aligned as pairs (this is the proper way), which is the number in line 5, or independently, which is the number for "singletons". So for the left and right mates, the "... read1" and "... read2" counts sum the alignments made within a pair and the alignments made independently.
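A quick way to see how the lines of the report relate to each other is to check the arithmetic. The sketch below only uses the numbers from the flagstat output posted above; the variable names and the `pct` helper are made up for illustration, and it assumes the percentages are taken relative to the total read count.

```python
# Counts copied from the flagstat output in the original post.
total = 126597089
mapped = 122446987
read1 = 63372478
read2 = 63224611
properly_paired = 104053862
both_mapped = 118953502  # "with itself and mate mapped"
singletons = 3493485     # read mapped, mate unmapped


def pct(part, whole):
    """Percentage of `whole`, rounded to two decimals (illustrative helper)."""
    return round(100.0 * part / whole, 2)


# Each check is a relationship between lines of the report.
checks = {
    "read1 + read2 == total": read1 + read2 == total,
    "both_mapped + singletons == mapped": both_mapped + singletons == mapped,
}

for name, ok in checks.items():
    print(name, "->", ok)

print("mapped %:", pct(mapped, total))
print("properly paired %:", pct(properly_paired, total))
print("singletons %:", pct(singletons, total))
print("read1 - read2:", read1 - read2)
```

Both checks hold for the posted numbers: every mapped read is either counted with its mate mapped or as a singleton, and read1 plus read2 accounts for every read in the file even though the two counts differ.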

    • #3
      I don't think flagstat is all that smart. It's just reading the flags. All 126597089 reads you gave it are flagged as being paired, so it's relaying that info. BAM entries are also flagged as to whether they came from read1 or read2, and flagstat is just telling you what it sees. Probably your BAM went through some kind of quality filtering where more read2 reads were filtered away than read1 reads. That makes sense experimentally, and the flags wouldn't necessarily change as a result of that. When I use bwa and samtools, after running rmdup, I get different numbers of read1 and read2 reads as well. I'm guessing that rmdup is also doing some kind of quality filtering. The file that I put into rmdup has the same number of read1 and read2 reads.
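To make the "it's just reading the flags" point concrete, here is a minimal sketch of a flagstat-style tally. The bit values are the standard SAM FLAG bits; the function name `tally_flags` and the example flags are made up for illustration. This is a simplification and not samtools' actual code: it ignores QC-fail, duplicate, secondary and supplementary bits.

```python
from collections import Counter

# Standard SAM FLAG bits used below.
PAIRED = 0x1          # read is paired in sequencing
PROPER_PAIR = 0x2     # aligner marked the pair as properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
READ1 = 0x40          # first read of the pair
READ2 = 0x80          # second read of the pair


def tally_flags(flags):
    """Count flagstat-like categories from an iterable of integer FLAG values."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        is_mapped = not flag & UNMAPPED
        if is_mapped:
            counts["mapped"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
            if flag & READ1:
                counts["read1"] += 1
            if flag & READ2:
                counts["read2"] += 1
            if is_mapped:
                if flag & PROPER_PAIR:
                    counts["properly_paired"] += 1
                if flag & MATE_UNMAPPED:
                    counts["singletons"] += 1
                else:
                    counts["both_mapped"] += 1
    return counts


# Hypothetical records: 99/147 are a complete proper pair; 83 is a read1
# whose mate (flag 163) was filtered out of the file; 73 is a mapped read1
# with an unmapped mate; 133 is an unmapped read2.
example_flags = [99, 147, 83, 73, 133]
counts = tally_flags(example_flags)
print(dict(counts))
```

In this toy input read1 is counted three times and read2 twice, because the record with flag 83 still says "paired" and "properly paired" even though its mate is no longer in the file. The tally never looks for the mate, which is how a filtered BAM can report unequal read1 and read2 counts.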