SAMtools flagstat output interpretation

    I got the following info after running samtools flagstat on a Novoalign BAM file:

    126597089 in total
    0 QC failure
    0 duplicates
    122446987 mapped (96.72%)
    126597089 paired in sequencing
    63372478 read1
    63224611 read2
    104053862 properly paired (82.19%)
    118953502 with itself and mate mapped
    3493485 singletons (2.76%)
    14745930 with mate mapped to a different chr
    8838136 with mate mapped to a different chr (mapQ>=5)
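The numbers in this output are internally consistent, which is worth confirming before interpreting them. The short Python sketch below copies the values from the output and checks two identities that should hold for this file under flagstat's usual counting rules: the READ1 and READ2 counts together cover every record, and every mapped paired record either has its mate mapped or is a singleton. It also shows the gap between the two ends and recomputes the printed percentages. Because every record here is flagged as paired, the total also serves as the denominator for the pair percentages. The variable names and the `pct` helper are illustrative only.

```python
# Values copied from the flagstat output above.
total = 126597089
mapped = 122446987
read1 = 63372478
read2 = 63224611
properly_paired = 104053862
both_mapped = 118953502  # "with itself and mate mapped"
singletons = 3493485


def pct(part, whole):
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * part / whole, 2)


# If every record carries exactly one of the READ1/READ2 bits, these sum to the total.
print("read1 + read2 == total:", read1 + read2 == total)
# A mapped paired record either has its mate mapped or is a singleton.
print("both_mapped + singletons == mapped:", both_mapped + singletons == mapped)
# Imbalance between the two ends, counted in records (not fragments).
print("read1 - read2 =", read1 - read2)
# All records are paired here, so the total is also the denominator for the pair percentages.
print("mapped %:", pct(mapped, total))
print("properly paired %:", pct(properly_paired, total))
print("singletons %:", pct(singletons, total))
```

Both identities hold for these numbers, so the output is at least self-consistent. The 147,867-record gap between read1 and read2 is the discrepancy the question below asks about.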

    What does line no. 5 (126597089 paired in sequencing) signify? And since these are paired reads, shouldn't the read1 and read2 numbers be the same? Does it have anything to do with using the "-r A" option for SAM generation?

    Thanks in advance.

  • #2
    What aligner did you use? You have paired-end data, so your mates can be aligned as pairs (the proper way), which is the number in line 5, or independently, which is the number reported as "singletons". So for the left and right mates, the "read1" and "read2" lines sum the alignments made within a pair and the alignments made independently.
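To make the pair/singleton distinction concrete, the sketch below decodes a SAM FLAG value into its named bits (bit values from the SAM format specification) and sorts a paired record into the categories discussed here: properly paired, both mates mapped but not properly paired, or singleton. It is a simplified classifier written for illustration, not samtools code; `decode_flag` and `pairing_status` are hypothetical names. Note that the PROPER_PAIR bit is set by the aligner according to its own criteria (orientation, insert size), not by flagstat.

```python
# Bit values of the SAM FLAG field, from the SAM format specification.
FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def pairing_status(flag):
    """Simplified classification of one record by its pairing bits."""
    if not flag & 0x1:
        return "not paired"
    if flag & 0x4:
        return "unmapped"
    if flag & 0x8:
        return "singleton"  # this read mapped, its mate did not
    if flag & 0x2:
        return "properly paired"
    return "both mapped, not properly paired"


# A few common flag values for paired-end records.
for flag in (99, 147, 137, 65, 77):
    print(flag, decode_flag(flag), pairing_status(flag))
```

For example, 99 decodes to PAIRED, PROPER_PAIR, MREVERSE, READ1 (a properly paired first mate), while 137 decodes to PAIRED, MUNMAP, READ2 (a singleton).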


    • #3
      I don't think flagstat is all that smart; it just reads the flags. All 126597089 reads you gave it are flagged as paired, so it relays that. BAM entries are also flagged as coming from read1 or read2, and flagstat just reports what it sees. Probably your BAM went through some kind of quality filtering that removed more read2 reads than read1 reads. That makes sense experimentally, and the flags wouldn't necessarily change as a result. When I use bwa and samtools, I also get different numbers of read1 and read2 reads after running rmdup. I'm guessing that rmdup is also doing some kind of quality filtering; the file I put into rmdup has the same number of read1 and read2 reads.
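The point that flagstat only reads flags can be illustrated with a hypothetical tally. The function below is a simplified re-implementation modelled on the per-record counting of older samtools flagstat (the 0.1.x versions that print output like the one in this thread); newer versions report secondary and supplementary records separately, so their numbers can differ. Records are reduced to a hypothetical tuple of (FLAG, reference id, mate reference id, MAPQ). The four records are made up: one properly paired fragment, an extra secondary alignment reported for its read1, and a second fragment whose read2 record was removed after alignment. This only shows one way extra or dropped records could skew the counts; the thread does not establish how Novoalign's "-r A" option writes its records.

```python
from collections import Counter


def flagstat_like(records):
    """Tally flagstat-style counters from (flag, tid, mate_tid, mapq) tuples.

    Simplified sketch of the older samtools flagstat loop: every record is
    counted, including secondary ones, and only the FLAG bits are consulted.
    """
    c = Counter()
    for flag, tid, mate_tid, mapq in records:
        c["total"] += 1
        if flag & 0x200:
            c["qc_fail"] += 1
        if flag & 0x400:
            c["duplicates"] += 1
        if not flag & 0x4:
            c["mapped"] += 1
        if flag & 0x1:
            c["paired"] += 1
            if flag & 0x2:
                c["properly_paired"] += 1
            if flag & 0x40:
                c["read1"] += 1
            if flag & 0x80:
                c["read2"] += 1
            if flag & 0x8 and not flag & 0x4:
                c["singletons"] += 1
            if not flag & 0x4 and not flag & 0x8:
                c["both_mapped"] += 1
                if mate_tid != tid:
                    c["diff_chr"] += 1
                    if mapq >= 5:
                        c["diff_chr_mapq5"] += 1
    return c


# Hypothetical records: (FLAG, reference id, mate reference id, MAPQ).
records = [
    (99, 0, 0, 60),   # fragment A, read1, properly paired
    (147, 0, 0, 60),  # fragment A, read2, properly paired
    (321, 1, 0, 0),   # fragment A, read1 again: secondary alignment on another reference
    (97, 2, 2, 40),   # fragment B, read1; its read2 record was removed, flags unchanged
]
counts = flagstat_like(records)
for key in ("total", "mapped", "properly_paired", "read1", "read2",
            "both_mapped", "singletons", "diff_chr", "diff_chr_mapq5"):
    print(key, counts[key])
```

Two fragments end up reported as 3 read1 records and 1 read2 record, and the orphaned read1 of fragment B is still counted under "with itself and mate mapped" because its MUNMAP bit was never updated.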