Dear all,
I am quite new to this community but have already found lots of helpful answers scanning through the threads here. But now I have come to a topic which I didn't found covered yet.
I used samtools rmdup to remove duplicate reads from my bwa alignments. I also analysed the data including duplicates, i.e. with all reads, but want to see the difference upon using duplicate-free .bam files...
Using rmdup, it gives me the following output:
samtools rmdup -s S1_sorted.bam S1_unique.bam
[bam_rmdupse_core] 6479447 / 18248499 = 0.3551 in library ' '
Does this mean that there were initially 18,248,499 reads, and that 6,479,447 were suspected to be duplicate and thus removed?
That would mean, the number of "uniquely" mapped reads should be 18,248,499 - (minus) 6,479,447 = 11,769,052 reads.
Since I am not sure how to interpret the rmdupse shell comment, I thought of another possibility to get the number of uniquely mapped reads:
1) samtools index S1_unique.bam
2) samtools view S1_unique.bam | wc -l 11897943
I.E. just counting the lines of the resulting "unique.bam". However, there is a slight difference between the two numbers (11,897,943 vs 11,769,05).
Anyone able to explain it?
Ideas are very appreciated!
Thanks in advance.
I am quite new to this community but have already found lots of helpful answers scanning through the threads here. But now I have come to a topic which I didn't found covered yet.
I used samtools rmdup to remove duplicate reads from my bwa alignments. I also analysed the data including duplicates, i.e. with all reads, but want to see the difference upon using duplicate-free .bam files...
Using rmdup, it gives me the following output:
samtools rmdup -s S1_sorted.bam S1_unique.bam
[bam_rmdupse_core] 6479447 / 18248499 = 0.3551 in library ' '
Does this mean that there were initially 18,248,499 reads, and that 6,479,447 were suspected to be duplicate and thus removed?
That would mean, the number of "uniquely" mapped reads should be 18,248,499 - (minus) 6,479,447 = 11,769,052 reads.
Since I am not sure how to interpret the rmdupse shell comment, I thought of another possibility to get the number of uniquely mapped reads:
1) samtools index S1_unique.bam
2) samtools view S1_unique.bam | wc -l 11897943
I.E. just counting the lines of the resulting "unique.bam". However, there is a slight difference between the two numbers (11,897,943 vs 11,769,05).
Anyone able to explain it?
Ideas are very appreciated!
Thanks in advance.
Comment