Howdy,
I've recently noticed that I seem to be getting a fair number of false positive SNP calls due to reads with unmapped mates that appear to be duplicates (the reads all start and end at the same position). Such duplicates do not appear to be removed by samtools rmdup, even with the -s or -S options. I'm interested in doing a bit of my own .bam cleaning by filtering out all reads with unmapped mates, as well as reads that have the duplicate flag set, reads with zero mapping quality, and reads that don't pass vendor quality checks. Has anyone else tried this or quantified its effects? It seems like common sense, but I'd be curious to hear other people's thoughts. Is there any reason not to do this?
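To make the filter concrete, here is a rough Python sketch of what I have in mind, working on SAM text (e.g. the output of samtools view -h). The flag bit values come from the SAM spec; the names keep_read and filter_sam_lines are just ones I made up for illustration, and I haven't benchmarked any of this.

```python
# SAM flag bits, as defined in the SAM format specification
MATE_UNMAPPED = 0x8    # next segment in the template is unmapped
QC_FAIL = 0x200        # read fails vendor/platform quality checks
DUPLICATE = 0x400      # PCR or optical duplicate

# Any read with one or more of these bits set is dropped.
EXCLUDE_MASK = MATE_UNMAPPED | QC_FAIL | DUPLICATE  # 0x608


def keep_read(flag: int, mapq: int) -> bool:
    """Return True if a read passes all four filters (hypothetical helper)."""
    if flag & EXCLUDE_MASK:
        return False
    # Drop reads with zero mapping quality.
    return mapq > 0


def filter_sam_lines(lines):
    """Yield header lines unchanged, plus alignment lines that pass keep_read."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
        flag, mapq = int(fields[1]), int(fields[4])
        if keep_read(flag, mapq):
            yield line
```

As far as I can tell, the same thing can be done directly with samtools view using -F 0x608 (exclude reads with any of those bits set) together with -q 1 (skip reads with mapping quality below 1).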