I am using rmdup on a set of paired-end RAD reads. My input .bam file contains a number of sets of duplicates like:
a1_1..a1_3
a2_1..a2_3
..
aN_1..aN_3
where all the a*_1 are the same and a*_3 are the same.
After running rmdup, for half of the duplicate sets the .bam file contains a1_1 and all of the a*_3 reads. For the other half, it contains a1_3 and all of the a*_1 reads.
It seems rmdup should throw away both sets of duplicate reads leaving me one first read and one second read. Shouldn't it?
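To make the expectation concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not how rmdup is implemented. The `Read` record and the `dedup_pairs` function are hypothetical names made up for illustration, the coordinates are invented, and reads are assumed to be named `<fragment>_<mate>` as in the listing above.

```python
from collections import namedtuple

# Hypothetical, simplified read record; not the BAM format or any rmdup API.
Read = namedtuple("Read", ["name", "mate", "chrom", "pos", "strand"])


def dedup_pairs(reads):
    """Keep one complete pair for each set of pairs whose mates share coordinates."""
    # Group mates by fragment name, assuming names look like "a1_1" / "a1_3".
    by_fragment = {}
    for read in reads:
        fragment = read.name.rsplit("_", 1)[0]
        by_fragment.setdefault(fragment, []).append(read)

    kept = {}
    for mates in by_fragment.values():
        if len(mates) != 2:
            continue  # this sketch ignores fragments without exactly two mates
        mates.sort(key=lambda r: r.mate)
        # A pair is a duplicate only if BOTH mates match an already kept pair.
        key = tuple((r.chrom, r.pos, r.strand) for r in mates)
        kept.setdefault(key, mates)  # the first pair seen for this key wins
    return [read for mates in kept.values() for read in mates]


# Three duplicate pairs (made-up coordinates): every *_1 and every *_3 is identical.
reads = []
for i in range(1, 4):
    reads.append(Read(f"a{i}_1", 1, "chr1", 100, "+"))
    reads.append(Read(f"a{i}_3", 3, "chr1", 250, "-"))

survivors = dedup_pairs(reads)
print([r.name for r in survivors])
```

With this input only one pair survives, a1_1 together with a1_3, which is the result I expected from rmdup.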
Thanks
Adam