I downloaded a publicly available BAM file and ran samtools flagstat on it. Here is the output:
```
482805326 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
235328156 + 0 mapped (48.74%:nan%)
482805326 + 0 paired in sequencing
241402663 + 0 read1
241402663 + 0 read2
235328156 + 0 properly paired (48.74%:nan%)
235328156 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
```
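To compare flagstat runs programmatically, the text can be parsed into a dictionary. This is a minimal sketch: `parse_flagstat` is a hypothetical helper (not part of samtools), written against the output format shown here; other samtools releases print extra categories and format percentages differently, so the regexes may need adjusting. It also derives the number of unmapped records in the downloaded BAM (total minus mapped).

```python
import re

# "<QC-passed> + <QC-failed> <label>", e.g. "235328156 + 0 mapped (48.74%:nan%)"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing percentage such as "(48.74%:nan%)"; labels like "(mapQ>=5)" are left alone
PCT_RE = re.compile(r" \([\d.]+%\s*:\s*[^)]*\)$")


def parse_flagstat(text):
    """Map each flagstat label to a (qc_passed, qc_failed) tuple."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        label = PCT_RE.sub("", m.group(3))
        if label.startswith("in total"):
            label = "in total"  # drop the "(QC-passed reads + QC-failed reads)" note
        stats[label] = (int(m.group(1)), int(m.group(2)))
    return stats


# First flagstat output from above, pasted verbatim
FIRST_RUN = """\
482805326 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
235328156 + 0 mapped (48.74%:nan%)
482805326 + 0 paired in sequencing
241402663 + 0 read1
241402663 + 0 read2
235328156 + 0 properly paired (48.74%:nan%)
235328156 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""

stats = parse_flagstat(FIRST_RUN)
unmapped = stats["in total"][0] - stats["mapped"][0]
print(unmapped)  # 247477170
```

Only the percentage suffix is stripped, so "with mate mapped to a different chr" and its "(mapQ>=5)" variant stay as separate keys.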
Next, I extracted the reads from the BAM into two FASTQ files. Together they contain 482805326 reads, the same number as in the BAM. I realigned the FASTQ files (bwa aln followed by bwa sampe) and ran samtools flagstat on the resulting BAM. Here is the output:
```
387280125 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
387280125 + 0 mapped (100.00%:nan%)
387280125 + 0 paired in sequencing
194031131 + 0 read1
193248994 + 0 read2
375578347 + 0 properly paired (96.98%:nan%)
381759377 + 0 with itself and mate mapped
5520748 + 0 singletons (1.43%:nan%)
5108005 + 0 with mate mapped to a different chr
4124551 + 0 with mate mapped to a different chr (mapQ>=5)
```
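A quick arithmetic check on the two outputs (all numbers copied from the flagstat results above) puts a figure on the gap. It is a sketch that assumes one record per read, which holds for bwa sampe output (no secondary alignments); under that assumption read1 and read2 counts should match in a complete paired-end BAM.

```python
# QC-passed counts copied from the two flagstat outputs above
original = {"total": 482805326, "mapped": 235328156, "read1": 241402663, "read2": 241402663}
realigned = {"total": 387280125, "mapped": 387280125, "read1": 194031131, "read2": 193248994}

# How many records the realigned BAM has fewer than the original
missing = original["total"] - realigned["total"]
print(f"records missing after realignment: {missing}")  # 95525201

# Records without the "mapped" status (flag 0x4 set) in the realigned BAM
print(f"unmapped records in realigned BAM: {realigned['total'] - realigned['mapped']}")  # 0

# With one record per read, each pair contributes one read1 and one read2;
# a non-zero difference means some mates are absent from the file
imbalance = realigned["read1"] - realigned["read2"]
print(f"read1 - read2 in realigned BAM: {imbalance}")  # 782137
```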
What I don't understand is where almost 100M reads (482805326 - 387280125 = 95525201) have gone. I haven't found any information about bwa removing unmapped, low-quality, or duplicate reads, so I am puzzled. Any hints?
I am using bwa 0.5.9-r16.
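To double-check the extraction step (that the two read files together hold 482805326 reads, 241402663 per mate), the records can be counted directly. This minimal sketch assumes plain 4-lines-per-record FASTQ with no wrapped sequence or quality lines; `count_fastq_records` and the file names in the usage comment are hypothetical.

```python
import gzip
import sys


def count_fastq_records(path):
    """Count records in a 4-lines-per-record FASTQ file (gzip if the name ends in .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partial record usually means a truncated or wrapped file
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Hypothetical usage: python count_fastq.py reads_1.fq reads_2.fq
    counts = [count_fastq_records(p) for p in sys.argv[1:]]
    print(counts, sum(counts))
```

Counting per file, rather than only the sum, also shows whether the two mate files contain the same number of reads.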