  • SAMtools flagstat output interpretation

    Hi,

    I got the following info after running a samtools flagstat on a Novoalign bam file:

    126597089 in total
    0 QC failure
    0 duplicates
    122446987 mapped (96.72%)
    126597089 paired in sequencing
    63372478 read1
    63224611 read2
    104053862 properly paired (82.19%)
    118953502 with itself and mate mapped
    3493485 singletons (2.76%)
    14745930 with mate mapped to a different chr
    8838136 with mate mapped to a different chr (mapQ>=5)

    What does line no. 5 signify? And since these are paired reads, shouldn't the read1 and read2 numbers be the same? Does it have anything to do with using the "-r A" option for SAM generation?

    Thanks in advance.

  • #2
    Which aligner did you use? With paired-end data, the mates can be aligned together as a pair (which is the proper way), and that is the number in line 5, or independently, and that is the number reported as "singletons". So for the left and right mates you sum the alignments within a pair and the independent alignments; that gives the "... read1" and "... read2" counts.
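As a quick sanity check on the figures quoted in the original post, the counts can be compared against each other. This is only a sketch: the dictionary keys are made-up labels for the flagstat lines, and the `pct` helper is hypothetical, not part of samtools.

```python
# Counts copied from the flagstat output in the original post.
# The key names are illustrative labels, not samtools identifiers.
stats = {
    "total": 126597089,
    "mapped": 122446987,
    "read1": 63372478,
    "read2": 63224611,
    "properly_paired": 104053862,
    "both_mapped": 118953502,   # "with itself and mate mapped"
    "singletons": 3493485,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# read1 and read2 together account for every record in the file.
read_sum = stats["read1"] + stats["read2"]

# Mapped records split into "mate also mapped" and "mate unmapped".
mapped_sum = stats["both_mapped"] + stats["singletons"]

print(read_sum == stats["total"], mapped_sum == stats["mapped"])
print(pct(stats["mapped"], stats["total"]),
      pct(stats["properly_paired"], stats["total"]),
      pct(stats["singletons"], stats["total"]))
```

The percentages in the flagstat output are all taken relative to the total record count, and the read1/read2 imbalance is 147867 records.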

    • #3
      I don't think flagstat is all that smart. It's just reading the flags. All 126597089 reads you gave it are flagged as being paired, so it's relaying that info. BAM entries are also flagged as to whether they came from read1 or read2, and flagstat is just telling you what it sees. Probably your BAM went through some kind of quality filtering where more read2 reads were filtered away than read1 reads. That makes sense experimentally, and the flags wouldn't necessarily change as a result of that. When I use bwa and samtools, I also get different numbers of read1 and read2 reads after running rmdup. I'm guessing that rmdup is doing some kind of quality filtering too. The file that I put into rmdup has the same number of read1 and read2 reads.
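To illustrate the "it's just reading the flags" point, here is a minimal sketch of a flagstat-style tally over raw SAM FLAG integers. The bit values come from the SAM format specification; the function name `tally_flags`, the counter keys, and the example flags are made up for illustration, and the logic is a simplification of what samtools does (for instance, it ignores secondary and supplementary records).

```python
from collections import Counter

# SAM FLAG bits (values defined in the SAM format specification).
PAIRED = 0x1         # read is paired in sequencing
PROPER_PAIR = 0x2    # aligner marked the pair as properly aligned
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
READ1 = 0x40         # first read of the pair
READ2 = 0x80         # second read of the pair


def tally_flags(flags):
    """Count flagstat-like categories from an iterable of FLAG integers."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        mapped = not (flag & UNMAPPED)
        if mapped:
            counts["mapped"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
            if flag & READ1:
                counts["read1"] += 1
            if flag & READ2:
                counts["read2"] += 1
            if mapped and flag & PROPER_PAIR:
                counts["properly_paired"] += 1
            if mapped and not (flag & MATE_UNMAPPED):
                counts["both_mapped"] += 1
            if mapped and flag & MATE_UNMAPPED:
                counts["singletons"] += 1
    return counts


# Hypothetical records: one proper pair (99, 147), then a pair in which
# read1 mapped (73) and read2 did not (133).
flags = [99, 147, 73, 133]
before = tally_flags(flags)

# Dropping unmapped records leaves the remaining flags untouched,
# so read1 and read2 counts no longer match.
after = tally_flags(f for f in flags if not (f & UNMAPPED))
print(before["read1"], before["read2"], after["read1"], after["read2"])
```

Because each record carries its own flags, any filtering step that removes individual records rather than whole pairs can leave unequal read1 and read2 counts while every surviving record is still flagged as paired.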
