Hi,
I am filtering a bam file using samtools (and picard for duplicates) and am struggling to understand the results from flagstat (apologies in advance for the long post)
This is my sorted bam file before any filtering:
Now, if I filter with -f2 to keep only those reads mapping in proper pair, flagstsat gives this:
Notice how I get the same number of read 1 and read 2, which makes sense to me.
But if I then run -q10 to remove any reads that have mapq<10, I get this:
(btw, I get the exact same result if doing the flag and mapq filters in the same command)
If I'm not doing something incredibly stupid here, my first question is this: I'm getting a different number of read1 and read2 so I'm assuming samtools is removing only the reads that fail the quality filter and not their mate? Is there a way to tell it to remove those? i can't use flags at this stage as they are still labelled as properly paired even though the mate has been removed.
I am filtering a bam file using samtools (and picard for duplicates) and am struggling to understand the results from flagstat (apologies in advance for the long post)
This is my sorted bam file before any filtering:
Code:
samtools flagstat Sample.sorted.bam 86744514 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 16772228 + 0 mapped (19.34%:nan%) 86744514 + 0 paired in sequencing 43372257 + 0 read1 43372257 + 0 read2 14664488 + 0 properly paired (16.91%:nan%) 15934998 + 0 with itself and mate mapped 837230 + 0 singletons (0.97%:nan%) 833002 + 0 with mate mapped to a different chr 317794 + 0 with mate mapped to a different chr (mapQ>=5)
Code:
samtools flagstat Sample.sorted.mapped.bam 14664488 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 14664488 + 0 mapped (100.00%:nan%) 14664488 + 0 paired in sequencing 7332244 + 0 read1 7332244 + 0 read2 14664488 + 0 properly paired (100.00%:nan%) 14664488 + 0 with itself and mate mapped 0 + 0 singletons (0.00%:nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
But if I then run -q10 to remove any reads that have mapq<10, I get this:
Code:
samtools flagstat Sample.sorted.mapped_qualfiltered.bam 10803413 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 10803413 + 0 mapped (100.00%:nan%) 10803413 + 0 paired in sequencing 5398606 + 0 read1 5404807 + 0 read2 10803413 + 0 properly paired (100.00%:nan%) 10803413 + 0 with itself and mate mapped 0 + 0 singletons (0.00%:nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
If I'm not doing something incredibly stupid here, my first question is this: I'm getting a different number of read1 and read2 so I'm assuming samtools is removing only the reads that fail the quality filter and not their mate? Is there a way to tell it to remove those? i can't use flags at this stage as they are still labelled as properly paired even though the mate has been removed.
Comment