  • Picard MergeSamFiles resulting in read duplication and loss of pair information

    Hi,

    I have merged several BAM alignment files from different samples, each carrying read group information, using Picard's MergeSamFiles program.

    However, when I check the merged file, I see that reads are duplicated and the pair information is corrupted.

    To explain better:

    Before merging, the read records in the SAM file look like this:
    [Forward Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 99 chr2 152506978 60 75M = 152507022 119 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT CBCFFFFFHHHHHJJJJJJJJJJJJJJJIJJJJJIJJJJIJIJIJJIIEIIIIIIIJJIJJJJGIFIIIIJIIII X0:i:1 X1:i:0 MD:Z:62A12 RG:Z:S72 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
    [Reverse Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 147 chr2 152507022 60 75M = 152506978 -119 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG HGGJJJJJJGIHGIGGJHIIIJIIGIGJJJIIJJJJIIGIIHGJIIJJJJJJJJIJJJJJIIGHHHHFEDFFCCB X0:i:1 X1:i:0 MD:Z:18A40A15 RG:Z:S72 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U

    After merging, they look like this:
    [Forward Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 -1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U
    [Forward Read Repeated]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U

    [Reverse Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 -1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U
    [Reverse Read Repeated]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U

    As you can see, the forward read is repeated and lists itself as its own mate, mapped at the same position (the mate position equals the read's own position, and the insert size is -1 or 1). The reverse read behaves the same way.
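To make the flag change concrete, here is a minimal Python sketch that decodes the FLAG values seen above (99 and 147 before merging, 67 and 179 after). The bit meanings come from the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` are made up for this illustration and are not part of Picard or samtools.

```python
# SAM FLAG bits as defined in the SAM specification.
# The dictionary and function names are illustrative only.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# Flags from the records above: before merging (99, 147), after merging (67, 179).
for flag in (99, 147, 67, 179):
    print(flag, decode_flag(flag))
```

Decoded this way, the forward read loses its `mate_reverse` bit (99 becomes 67) and the reverse read gains it (147 becomes 179). In other words, each record now describes its mate with its own strand, which matches the mate position being set to the read's own position.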

    Has anyone come across this issue while running MergeSamFiles? If so, kindly help me resolve it.
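To gauge how widespread the problem is, a rough check like the following could be run over the merged file's SAM text (tab-separated, as a real SAM file is; the records pasted above show spaces only because of how they were posted). This is a sketch under assumptions: the function names are hypothetical, and the duplicate key (read name, FLAG, reference, position) is chosen because the repeated records above differ only in their insert size field.

```python
from collections import Counter


def find_repeated_records(sam_lines):
    """Count records sharing (QNAME, FLAG, RNAME, POS); return those seen more than once."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        key = (fields[0], int(fields[1]), fields[2], int(fields[3]))
        counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


def is_self_mated(line):
    """True if a record names its own reference and position as its mate's."""
    fields = line.rstrip("\n").split("\t")
    rname, pos, rnext, pnext = fields[2], int(fields[3]), fields[6], int(fields[7])
    return rnext in ("=", rname) and pnext == pos


# Shortened, made-up records that mimic the pattern reported above.
demo = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t67\tchr2\t100\t60\t5M\t=\t100\t-1\tACGTA\tIIIII",
    "read1\t67\tchr2\t100\t60\t5M\t=\t100\t1\tACGTA\tIIIII",
    "read2\t99\tchr2\t200\t60\t5M\t=\t244\t49\tACGTA\tIIIII",
]

print(find_repeated_records(demo))
print([is_self_mated(rec) for rec in demo[1:]])
```

Comparing the counts between each input file and the merged output would show whether the repeats already exist before the merge or only appear afterwards.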

    Thanks
