Hi,
Does anyone know of an alternative to Picard MarkDuplicates that can handle split reads in RNA-seq data?
I have a BAM file aligned with GSNAP in which a split read is represented on two separate lines (as indicated by the 'XT' tag). This causes Picard MarkDuplicates to throw an error:
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 6: test:M00569:47:000000000-A50YA:1:1107:13835:21068
The culprit lines in the bam file:
Code:
M00569:47:000000000-A50YA:1:1107:13835:21068 65 chr2 133036616 39 32H119M chr7 99156283 0 CTGCGGTTCCTCTCGTACTGAGCAGGATTACCATGGCAACAACACATCATCAGTAGGGTAAAACTAACCTGTCTCACGACGGTCTAAACCCAGCTCACGTTCCCTATTAGTGGGTGAAC GHGFEEEEFFFHEHG2EGHGGCAD3FAGFDFGGFFEG3GFHHFFFHHGGHFHHFHFFF?3FGGGHFHHFHFDGFFHHFGGGGGFGGFFHFGGEHGHHGAGHHBHGHHHFGHF0GEDHHF RG:Z:HEKRCOR1_clone1 MD:Z:T3CA8A1G17T11CATCAC1G23T3A6T9A1G18 NH:i:1 HI:i:1 NM:i:18 XW:i:6 XV:i:12 SM:i:39 XQ:i:40 X2:i:0 XS:A:- XT:Z:GT-AG,0.87,0.98
M00569:47:000000000-A50YA:1:1107:13835:21068 81 chr4 76807259 39 119H32M chr7 99156283 0 GTTCAGACATTTGGTGTATGTGCTTGGCTGAG GFFFD6GFDAGGGGGGFFGG4AAF?FFBBBBB RG:Z:HEKRCOR1_clone1 MD:Z:11C17A2 NH:i:1 HI:i:1 NM:i:2 XW:i:0 XV:i:2 SM:i:39 XQ:i:40 X2:i:0 XS:A:+ XT:Z:GT-AG,0.87,0.98
M00569:47:000000000-A50YA:1:1107:13835:21068 129 chr7 99156283 38 119M32S chr2 133036616 0 CGTCGCCGGCAGTGCCACCGAGAAGCGCCGGCCTCGGGGCTGCCTACAGCGGCCCGGGAGAGGCTGTGGTGGGCCCGCGCGCGCGTGCGTAGGTGACAGGACACCGGCCGGGCCCGCCCTGGATAACTGGCTTGGGGCGGCCAAGCGTTCC A?AAADAD10>>GGBBBF0EC0A/F1/AA/AEE/0E//>>/>/>1B0BBF@EG/E@/</</?/B/00BBFFC//?<////<---<-<.;E.-.<0CC00...9..;9-A-;@->--;@---AFF/B////;B---;@-@----99----9- RG:Z:HEKRCOR1_clone1 MD:Z:9A32T29C4A5A3T15G5A9 NH:i:1 HI:i:1 NM:i:8 XW:i:8 XV:i:0 SM:i:38 XQ:i:40 X2:i:0 PG:Z:T
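In the records above, FLAG values 65 and 81 both have the 0x40 bit set (first in pair), so the same mate of the same read name appears twice, which is presumably what triggers the PairInfoMap error. A minimal sketch for listing every read affected this way, assuming SAM text as input (for example the output of `samtools view`); the function name `find_multi_record_mates` is made up for illustration and is not part of Picard or samtools:

```python
from collections import Counter


def find_multi_record_mates(sam_lines):
    """Return {(read_name, mate_number): count} for paired mates that
    occur on more than one SAM line (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split(None, 2)  # only QNAME and FLAG are needed
        if len(fields) < 2:
            continue
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x1:  # 0x1: read is paired
            continue
        # 0x40: first in pair, 0x80: second in pair
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        counts[(qname, mate)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Feeding it the three lines above should report mate 1 of the read with a count of 2, since the chr2 and chr4 records are both flagged as first in pair.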
How do you typically handle duplicates in RNA-seq? Do you just use samtools rmdup?
Thanks,