  • Picard MergeSamFiles resulting in read duplication and loss of pair information

    Hi,

    I merged several BAM alignment files from different samples, each carrying read group information, using Picard's MergeSamFiles.

    But when I check the merged file, the reads are duplicated and the pair information is corrupted.

    To explain better:

    Before merging, the read records in the SAM file look like this:
    [Forward Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 99 chr2 152506978 60 75M = 152507022 119 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT CBCFFFFFHHHHHJJJJJJJJJJJJJJJIJJJJJIJJJJIJIJIJJIIEIIIIIIIJJIJJJJGIFIIIIJIIII X0:i:1 X1:i:0 MD:Z:62A12 RG:Z:S72 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 XT:A:U
    [Reverse Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 147 chr2 152507022 60 75M = 152506978 -119 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG HGGJJJJJJGIHGIGGJHIIIJIIGIGJJJIIJJJJIIGIIHGJIIJJJJJJJJIJJJJJIIGHHHHFEDFFCCB X0:i:1 X1:i:0 MD:Z:18A40A15 RG:Z:S72 XG:i:0 AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 XT:A:U

    After merging, they look like this:
    [Forward Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 -1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U
    [Forward Read Repeated]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 67 chr2 152506978 60 75M = 152506978 1 CATTTTAGATTTTCAATAATAATTCCTAATTTCTGCTTCTCTTTGACTAAATAGCTACTCATGTAAAATATGAAT 8>=BAA=:;>BBB:@A@=A@>B@B::?>B@BB:??;?B:?:?BB??;?;BB?>:;?>;?:@@?==?BA>;=:8>9 X0:i:1 X1:i:0 BD:Z:KKQNJIUTETPCDRSOTRJSRJTPQNNMISOGQQNLOLRRNRNEESGQMICKPRMRNNTOVTSRQI==MQTPQOR MD:Z:62A12 RG:Z:S72 XG:i:0 BI:Z:FFGFVUZYRRTRGVROUORVOQTOOKOHLRLIPQOGPNQRMGNKLKODIPIRNQONKQONCJOLIDEBHGVGDHK AM:i:37 NM:i:1 SM:i:37 XM:i:1 XO:i:0 MQ:i:60 XT:A:U

    [Reverse Read]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 -1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U
    [Reverse Read Repeated]
    HWI-ARTY:52:IDC100:5:1101:10002:79055 179 chr2 152507022 60 75M = 152507022 1 GACTAAATAGCTACTCATGTAAAATATGAATTAATATATAAACTATTGGATTTAATAGAGACTGACCTCACTTTG 5:6;@A@>>;:=;:=?@?:>ABB@=??:B@B=B?=@>?=BA=9=?B@::?AA=A?=>9?:>:?;>:::A?:@AB2 X0:i:1 X1:i:0 BD:Z:FRRRJHCLROROSOUSTKTRHFQOQRSNOTKNOLNLPMRDOQLPMHRMRSDERMLLQNQGSRNGQOQSNTSCNKK MD:Z:18A40A15 RG:Z:S72 XG:i:0 BI:Z:HIIH;?I9>6HNUGNRQSRJLLOSHLKTPTQQNOJRJJOGKQMBQKJHKQIMOHJEMIMMOGHKQNLMTVTCEFF AM:i:37 NM:i:2 SM:i:37 XM:i:2 XO:i:0 MQ:i:60 XT:A:U

    As you can see, the forward read is repeated and lists itself as its own mate, mapped at the same position. The reverse read behaves the same way.
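To make the change concrete, here is a minimal Python sketch that decodes the FLAG values from the records above (99 and 147 before merging, 67 and 179 after) and flags records whose mate position equals their own position. The bit meanings come from the SAM format specification. The names `decode_flag` and `is_self_mated` are hypothetical helpers written for this illustration and are not part of Picard. The self-mate test is only a heuristic, since a genuine pair can occasionally start at the same coordinate.

```python
# FLAG bit values as defined in the SAM format specification.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


def is_self_mated(sam_line):
    """Hypothetical heuristic: True when RNEXT/PNEXT point at the record's own RNAME/POS.

    Splits on any whitespace so it works on tab-separated SAM lines as well as
    on records pasted with spaces, as in this post.
    """
    fields = sam_line.split(None, 8)
    rname, pos = fields[2], int(fields[3])
    rnext, pnext = fields[6], int(fields[7])
    same_reference = rnext == "=" or rnext == rname
    return same_reference and pnext == pos


if __name__ == "__main__":
    for flag in (99, 147, 67, 179):
        print(flag, decode_flag(flag))
```

Decoded this way, 99 and 147 describe a normal pair: the first read is forward with a reverse mate, and the second read is reverse. After merging, 67 has lost the mate-reverse bit and 179 has gained it, so each record now reports its mate's strand as its own strand, which matches the mate position being overwritten with the read's own position.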

    Has anyone come across this issue while running MergeSamFiles? If so, please help me resolve it.

    Thanks
