Hi,
I merged several BAM alignment files from different samples, each carrying read group information, using Picard's MergeSamFiles.
However, when I inspect the merged file, the reads appear duplicated and the mate-pair information looks corrupted.
To explain better:
Before merging, the read pair looks like this in the SAM file:
[Forward Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 99 chr2 152506978 60 75M = 152507022 119 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT CBCFFFFFHHHHHJJJJJJJJJJJJJJJIJJJJJIJJJJIJIJIJJIIEIIIIIIIJJIJJJJGIFIIIIJIIII X0:i:1 X1:i:0 MD:Z:62A12 RG:Z:S72 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
[Reverse Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 147 chr2 152507022 60 75M = 152506978 -119 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG HGGJJJJJJGIHGIGGJHIIIJIIGIGJJJIIJJJJIIGIIHGJIIJJJJJJJJIJJJJJIIGHHHHFEDFFCCB X0:i:1 X1:i:0 MD:Z:18A40A15 RG:Z:S72 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U
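For reference, what makes the pre-merge pair consistent can be written down as a check: each record's RNEXT/PNEXT must point at the other record's RNAME/POS, the TLEN values must be equal and opposite, and exactly one record must carry the first-in-pair bit (0x40). A minimal Python sketch; `parse_sam` and `is_consistent_pair` are hypothetical helpers, and SEQ/QUAL are abbreviated to `*` in the example records.

```python
def parse_sam(line):
    """Split a SAM record (tab- or space-separated) into the fields used below."""
    f = line.split()
    return {
        "qname": f[0],
        "flag": int(f[1]),
        "rname": f[2],
        "pos": int(f[3]),
        "rnext": f[6],
        "pnext": int(f[7]),
        "tlen": int(f[8]),
    }


def is_consistent_pair(line_a, line_b):
    """True if two records of the same template point at each other as mates."""
    a, b = parse_sam(line_a), parse_sam(line_b)

    def mate_ref(r):
        # "=" in RNEXT means "same reference as RNAME"
        return r["rname"] if r["rnext"] == "=" else r["rnext"]

    return (
        a["qname"] == b["qname"]
        and mate_ref(a) == b["rname"] and a["pnext"] == b["pos"]
        and mate_ref(b) == a["rname"] and b["pnext"] == a["pos"]
        and a["tlen"] == -b["tlen"]
        # exactly one of the two records is first in pair (0x40)
        and bool(a["flag"] & 0x40) != bool(b["flag"] & 0x40)
    )


QNAME = "HWI-ARTY:52:IDC100:5:1101:10002:79055"
before_fwd = f"{QNAME} 99 chr2 152506978 60 75M = 152507022 119 * *"
before_rev = f"{QNAME} 147 chr2 152507022 60 75M = 152506978 -119 * *"
after_fwd = f"{QNAME} 67 chr2 152506978 60 75M = 152506978 -1 * *"
after_rev = f"{QNAME} 179 chr2 152507022 60 75M = 152507022 -1 * *"

print(is_consistent_pair(before_fwd, before_rev))  # True
print(is_consistent_pair(after_fwd, after_rev))    # False
```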
After merging, it looks like this:
[Forward Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 -1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U
[Forward Read Repeated]
HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U
[Reverse Read]
HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 -1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U
[Reverse Read Repeated]
HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U
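As you can see, the forward read now appears twice, and each copy lists itself as its own mate at the same position: RNEXT/PNEXT point to its own coordinate, TLEN is -1 or 1, and the FLAG has changed from 99 to 67. The reverse read behaves the same way (FLAG 147 has become 179).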
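Decoding the FLAG values bit by bit makes the observation above more precise. In this minimal Python sketch the bit meanings follow the SAM format specification, while the helper names (`decode_flag`, `find_self_mates_and_duplicates`) are hypothetical, and the scan assumes plain SAM text (for example the output of `samtools view`) rather than BAM. Decoding shows that 99 to 67 drops only the mate-reverse bit (0x20) and 147 to 179 adds only that bit, so in each record the mate strand now matches the read's own strand, which fits the mate fields having been filled in from the read itself.

```python
from collections import Counter

# SAM FLAG bits, as listed in the SAM/BAM format specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def find_self_mates_and_duplicates(lines):
    """Heuristic scan of SAM text lines (header lines are skipped).

    Returns (self_mates, dups):
      self_mates -- records whose mate coordinates equal their own position
                    (a genuine pair whose mates start at the same base also matches)
      dups       -- {(qname, segment): count} for segments seen more than once
    """
    seen = Counter()
    self_mates = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.split()
        qname, flag, rname, pos = f[0], int(f[1]), f[2], int(f[3])
        rnext, pnext = f[6], int(f[7])
        if flag & 0x900:
            continue  # secondary/supplementary records legitimately repeat a QNAME
        segment = "R1" if flag & 0x40 else "R2" if flag & 0x80 else "single"
        seen[(qname, segment)] += 1
        mate_ref = rname if rnext == "=" else rnext
        if flag & 0x1 and mate_ref == rname and pnext == pos:
            self_mates.append((qname, segment, pos))
    dups = {key: n for key, n in seen.items() if n > 1}
    return self_mates, dups


# The four post-merge records above, with SEQ, QUAL and optional tags replaced by "*".
QNAME = "HWI-ARTY:52:IDC100:5:1101:10002:79055"
after_merge = [
    f"{QNAME} 67 chr2 152506978 60 75M = 152506978 -1 * *",
    f"{QNAME} 67 chr2 152506978 60 75M = 152506978 1 * *",
    f"{QNAME} 179 chr2 152507022 60 75M = 152507022 -1 * *",
    f"{QNAME} 179 chr2 152507022 60 75M = 152507022 1 * *",
]

print(decode_flag(99), decode_flag(67))
print(decode_flag(147), decode_flag(179))
print(find_self_mates_and_duplicates(after_merge))
```

Running the same scan over the whole merged SAM file, rather than a single read name, would show whether every pair is affected or only some.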
Has anyone come across this issue with MergeSamFiles? If so, I would appreciate help resolving it.
Thanks