  • xiang
    Member
    • Mar 2009
    • 13

    Bug in Picard's MarkDuplicates

    I am using Picard's MarkDuplicates, version 1.3. The BAM file was produced with maq2sam-long and then sorted with SortSam.


    When I run
    java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt VALIDATION_STRINGENCY=SILENT

    I get this error:

    INFO 2010-02-16 14:55:55 MarkDuplicates Start of doWork freeMemory: 8668240; totalMemory: 9109504; maxMemory: 1398145024
    INFO 2010-02-16 14:55:55 MarkDuplicates Reading input file and constructing read end information.
    INFO 2010-02-16 14:55:55 MarkDuplicates Will retain up to 6241718 data points before spilling to disk.
    [Tue Feb 16 14:55:55 GMT 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=108986368
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:324)
    .....


    If I run it without VALIDATION_STRINGENCY=SILENT:
    java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt

    I get this error instead:
    Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 1, Read name GAII01:5:34:1106:456#0, Mapped mate should have mate reference name

    I checked the file: it is correctly sorted by coordinate and I can merge it without problems, but I can't get MarkDuplicates to work.
  • drio
    Senior Member
    • Oct 2008
    • 323

    #2
    Can you post a small excerpt of the BAM you are trying to use? I suggest you also send this to the Picard mailing list.
    -drd

    Comment

    • drio
      Senior Member
      • Oct 2008
      • 323

      #3
      Originally posted by xiang
      If I use
      java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt

      I got an error as
      Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 1, Read name GAII01:5:34:1106:456#0, Mapped mate should have mate reference name

      I checked the file. It is well sorted by coordinate. I can merge the file correctly. But I just can't make markduplicates work.
      Do RNAME and MRNM (see the SAM spec) match the reference names specified in the BAM header?
      -drd

      Comment

      • xiang
        Member
        • Mar 2009
        • 13

        #4
        I created a very short BAM file that gives the same error with MarkDuplicates. Its content is as follows:


        GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
        GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
        GAII02:3:1:0:1856#0 163 Chr3 13021517 97 36M * 0 189 AAGCAAATGTACCATATGGGCAAGTGAATGTACTTA @@@CCABBCABA>?B?BBBB@:@B?B==AB><BBB? RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
        GAII02:3:1:0:1856#0 83 Chr3 13021670 97 36M * 0 -189 GTAGCAATCAGCTCATCCTCTTCGTTCTTGACCATT ::::::::::8778:878688888878778688:/! RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
        GAII02:3:1:0:1184#0 163 chloroplast 87135 0 36M * 0 176 ATTATATGGATGATCCGATCCCCCAGGGCCCTGATT ?BB>B@<BC>CBBCCC@@@>################ RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:3 UQ:i:6 H0:i:0 H1:i:0
        GAII02:3:1:0:1184#0 83 chloroplast 87275 0 36M * 0 -176 ATGTTTGCTTTTCGTGAAAAAATACCAATTGAAGTT 9799997747576:::<<<<<9948699699:<;/! RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:1 UQ:i:0 H0:i:0 H1:i:2
        GAII02:3:1:0:1151#0 163 chloroplast 89820 0 36M * 0 176 ATTTTCCACAAAGTGGTGACGAAAGGTATAACTTGT BBBBCCCB@CBBB6@@=?B@@8=ABB8BB@B64??< RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:0 UQ:i:0 H0:i:2 H1:i:0
        GAII02:3:1:0:1151#0 83 chloroplast 89960 0 36M * 0 -176 AATTTTGAAAGAACGTATTGTCAAACTCTTTCAGAT 99993::<<<<<85::777;656;<7;8::9<:;/! RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:1 UQ:i:0 H0:i:0 H1:i:2
        GAII02:3:1:0:333#0 163 chloroplast 112427 59 36M * 0 146 TTTTGATGAATGCAACTTAGAAAAATTTGTTGAATA BCCCCB@=AB?BA?=@BBCBCBCCBBBBC@B:>B@? RG:Z:WTCHG MF:i:18 AM:i:29 SM:i:30 NM:i:0 UQ:i:0 H0:i:1 H1:i:1
        GAII02:3:1:0:333#0 83 chloroplast 112537 59 36M * 0 -146 TTTTGTTGCTGTCGGAAAAAGGAGAAGTCCAACTCT 78871850136315:5:;:89;;;:::9:9;996,! RG:Z:WTCHG MF:i:18 AM:i:29 SM:i:29 NM:i:1 UQ:i:0 H0:i:0 H1:i:1

        You can download the bam file directly from

        Comment

        • xiang
          Member
          • Mar 2009
          • 13

          #5
          I created a very short bam file at

          Comment

          • xiang
            Member
            • Mar 2009
            • 13

            #6
            The header is:

            @HD VN:1.0 GO:none SO:coordinate
            @SQ SN:Chr1 LN:30427671
            @SQ SN:Chr2 LN:19698289
            @SQ SN:Chr3 LN:23459830
            @SQ SN:Chr4 LN:18585056
            @SQ SN:Chr5 LN:26975502
            @SQ SN:chloroplast LN:154478
            @SQ SN:mitochondria LN:366924
            @RG ID:WTCHG PL:SLX LB:WTCHG PI:200 DS:test_Genome SM:test
            @PG ID:maq VN:0.7.1-6

            Then the reads:
            GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
            GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0

            Comment

            • drio
              Senior Member
              • Oct 2008
              • 323

              #7
              You don't have MRNM and MPOS set up properly for either mate (in both of your records they are * and 0):

              This works:

              Code:
              @HD     VN:1.0  GO:none SO:coordinate
              @SQ     SN:Chr1 LN:1000
              @RG     ID:WTCHG        PL:SLX  LB:WTCHG        PI:200  DS:test_Genome  SM:test
              @PG     ID:maq  VN:0.7.1-6
              GAII02:3:1:0:1074#0     99      Chr1    155     97      36M     Chr1    255     170     NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA  !19987888899:88859:;999:88777999999:    RG:Z:WTCHG      MF:i:18 AM:i:33 SM:i:33  NM:i:1       UQ:i:0  H0:i:0  H1:i:1
              GAII02:3:1:0:1074#0     147     Chr1    255     97      36M     Chr1    155     -170    TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA  BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB    RG:Z:WTCHG      MF:i:18 AM:i:33 SM:i:64  NM:i:0       UQ:i:0  H0:i:1  H1:i:0
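
              To spot affected records before running MarkDuplicates, something like the following could be used. It is only a minimal sketch: it assumes tab-separated SAM text lines (for example from samtools view), and the function name is made up for illustration. It approximates the condition behind the "Mapped mate should have mate reference name" message rather than reproducing Picard's validator.

```python
PAIRED = 0x1         # SAM flag bit: read is paired
MATE_UNMAPPED = 0x8  # SAM flag bit: mate is unmapped


def mate_fields_missing(sam_line):
    """Return True if a paired read whose mate is mapped has MRNM '*' or MPOS 0.

    Hypothetical helper; expects one tab-separated SAM alignment line.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mrnm = fields[6]        # column 7: mate reference name
    mpos = int(fields[7])   # column 8: mate position
    if not (flag & PAIRED) or (flag & MATE_UNMAPPED):
        # Unpaired reads, or reads whose mate is unmapped, are not checked here.
        return False
    return mrnm == "*" or mpos == 0
```

              With the records from the earlier post (flags 99 and 147, MRNM *, MPOS 0) this returns True; with the two corrected records above it returns False.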
              -drd

              Comment

              • xiang
                Member
                • Mar 2009
                • 13

                #8
                It works. Drio, thank you very much.

                Comment

                • mjdinsmore
                  Junior Member
                  • Jun 2010
                  • 2

                  #9
                  What does it mean to have MRNM and MPOS properly set up for both mates, and how does one go about correcting the BAM file so that they are?

                  Comment

                  • av_d
                    Member
                    • Sep 2009
                    • 12

                    #10
                    I have the same problem. How do I fix the MRNM and MPOS fields in a SAM file?

                    Comment

                    • danielr
                      Member
                      • Sep 2009
                      • 11

                      #11
                      Copy the chromosome (column 3) of mate 1 into column 7 (MRNM) of mate 2, and the position (column 4) of mate 1 into column 8 (MPOS) of mate 2. Then do the reverse: copy the chromosome (column 3) of mate 2 into column 7 of mate 1, and the position (column 4) of mate 2 into column 8 of mate 1.
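
                      The column swap could be sketched in Python roughly like this. It assumes both mates are mapped and that each record is a tab-separated SAM text line; the function name is made up for illustration. It writes the mate's reference name out in full rather than using the "=" shorthand the SAM spec allows when both mates are on the same reference.

```python
def fix_mate_fields(mate1_line, mate2_line):
    """Fill MRNM (column 7) and MPOS (column 8) of each mate from the other.

    Hypothetical helper; both arguments are tab-separated SAM alignment
    lines for the two mapped reads of one pair.
    """
    f1 = mate1_line.rstrip("\n").split("\t")
    f2 = mate2_line.rstrip("\n").split("\t")
    # Columns are 1-based in the description, list indices are 0-based:
    # column 3 -> index 2, column 4 -> index 3, column 7 -> index 6, column 8 -> index 7.
    f1[6], f1[7] = f2[2], f2[3]  # mate 1 gets mate 2's chromosome and position
    f2[6], f2[7] = f1[2], f1[3]  # mate 2 gets mate 1's chromosome and position
    return "\t".join(f1), "\t".join(f2)
```

                      This only touches columns 7 and 8; flags and insert size are left as they are.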

                      Comment

                      • xiang
                        Member
                        • Mar 2009
                        • 13

                        #12
                        samtools fixmate can do this for you.

                        Comment

                        • mjdinsmore
                          Junior Member
                          • Jun 2010
                          • 2

                          #13
                          Or you can use GATK's AddOrReplaceReadGroups:

                          Comment

                          • earonesty
                            Member
                            • Mar 2011
                            • 52

                            #14
                            Note: I get the same error on a pair of reads that have no alignments at all (the unmapped flag bits are set). Setting VALIDATION_STRINGENCY=SILENT worked for me.

                            Comment
