I have 100 bp paired-end Illumina HiSeq reads, which I mapped with BWA-MEM and then sorted and deduplicated with samtools (sort, then rmdup). I am running into a problem at the duplicate-removal step: samtools rmdup seems to be removing reads but not their mates. Here is an example.
These look like duplicate pairs:
(1) HWI-ST1329:206:H7AHFADXX:1:1116:15857:82212 147 chr22 361328 7 77M2I26M = 361354 -77 GTGCTGGGAGAGCTGCTGCTTCTCTGTGTGCTGGGAGAGCTGCAGCTTCTCTGTGTGTGTGCTGGGAGAGCTGCAGCTTGTGTGTGTGTGCTGGGAGAGCTGCTG BBBBBFBB<<0BB7<<<B<<B7BBFBFBBB<BFBBBBBBB<BFBBBFFBBBFBIIIIIFFF<7FFFFIFFFBBFFIIIIFIIIIIIIIIIIIFFFFFFFFFFBBB RG:Z:MP1 NM:i:8 AS:i:68 XS:i:64
(2) HWI-ST1329:206:H7AHFADXX:1:1116:15857:82212 99 chr22 361354 20 105M = 361328 77 TGTGCTGGGAGAGCTGCTGCTTCTCTGTGTGTGTGCTGGGAGAGCTGCAGCTTTTCTGTGTGCTGGGAGAGCTGCAGCTTCTCTGTGTGCTGGGAGAGCTGCAGC BBBFFFFFFFFFFIIIIIIIIIFIIIIFIFFBF<FBFFIIFIFBFFFFFBBFFFFBFFFF<BBFFII<B7BFBFFBBBBB<B<BBBBBBBBBFF<B<BB<B<BBB RG:Z:MP1 NM:i:4 AS:i:85 XS:i:71
(3) HWI-ST1329:208:H7873ADXX:1:1211:6278:83430 147 chr22 361328 5 77M2I26M = 361354 -77 GTGCTGGGAGAGCTGCGGCTTCTCTGTGTGCTGGGAGAGCGGCAGCTTCTCTGTGTGTGTGCTGGGAGAGCTGCAGCTTGTGTGTGTGTGCTGGGAGAGCTGCTG #################BB<B<70<BB<B7''7<070'70''<FFFBBB7'70F<FBFFF<<7BFIFIFFBB'BB<<FFBIIIFIIIIIIIIFFFFFFFFFFBBB RG:Z:MP1 NM:i:9 AS:i:63 XS:i:59
(4) HWI-ST1329:208:H7873ADXX:1:1211:6278:83430 99 chr22 361354 5 25M4D80M = 361328 77 TGTGCTGGGAGAGCTGCAGCTTCTCTGTGTGCTGGGAGAGCTGCAGCTTGTGTGTGTGCTGGGAGAGCTGCAGCTTCTCTGTGTGCTGGGAGAGCTGCAGCTTCT BBBFFFFFFFFFFIIIIFIIIIIFIFIFFFFFFIIIFFFFFFFIIIFFFFIBFFFIFIBFFIBFF7FFBF<<B<<BBB<<BFBBB<<<B77B0<BBBBBFBBBBF RG:Z:MP1 NM:i:6 AS:i:88 XS:i:83
But only (4) is removed, leaving (3) unpaired. Is this a mistake, or is there some reason that only one read in the pair is being removed by samtools rmdup?
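In case it helps anyone reproduce this, below is a minimal Python sketch of one way to list reads left without a mate after rmdup. It assumes the output has been converted to SAM text (for example with `samtools view`), and it only counts primary records per read name. The function name `find_orphans` and the shortened example records are made up for illustration; they are not part of samtools.

```python
from collections import Counter


def find_orphans(sam_lines):
    """Return read names of paired reads that appear without their mate.

    Hypothetical helper: counts primary alignment records per QNAME and
    reports names seen exactly once.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        # Ignore reads not flagged as paired (0x1 unset) and
        # secondary (0x100) or supplementary (0x800) records.
        if not flag & 0x1 or flag & 0x900:
            continue
        counts[qname] += 1
    return sorted(name for name, n in counts.items() if n == 1)


# Shortened, made-up records: readA has both mates, readB has only one.
example = [
    "readA\t147\tchr22\t361328\t7\t5M\t=\t361354\t-77\tACGTA\tIIIII",
    "readA\t99\tchr22\t361354\t20\t5M\t=\t361328\t77\tACGTA\tIIIII",
    "readB\t147\tchr22\t361328\t5\t5M\t=\t361354\t-77\tACGTA\tIIIII",
]
print(find_orphans(example))  # ['readB']
```

Running this on the rmdup output should report read (3) from the example above if its mate really was dropped.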
Thanks