  • Picard tools out of memory: PermGen

    Hi all, first post. Great site!

    Thought I'd share a new problem... I'm just starting with Picard tools (version 1.56) to estimate redundancy, and have predictably been wrestling with memory issues...

    But the problem isn't the heap, as I'd expected and at first assumed. Instead, I'm running out of PermGen space. One of my BAM files is really large, but it happens even on much smaller BAM files containing single ends of mate pairs.

    I increased it to 1g (-XX:PermSize=1g -XX:MaxPermSize=1g), and it still died, though after 2 hrs CPU time rather than 10 minutes as before. I've increased it now to 4g and we'll see how it goes.

    Does the permanent generation filling up this much point to a memory leak in Picard tools? It seems way beyond what the JVM expects, and I've rarely seen PermGen space problems mentioned, and never for Picard tools.

    Cheers,

    Doug


    [Mon Nov 21 19:11:40 CET 2011] net.sf.picard.sam.MarkDuplicates INPUT=map.CLCh001.lib300.bam_sorted.bam OUTPUT=map.CLCh001.lib300.bam_sorted.bam.PicardDups.bam METRICS_FILE=map.CLCh001.lib300.bam_sorted.bam.MarkDuplicates REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=80000 TMP_DIR=[tmp] MAX_RECORDS_IN_RAM=10000000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Mon Nov 21 19:11:40 CET 2011] Executing as douglas.scofield@xxxxxxx on Linux 2.6.32-131.17.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_20-b20
    INFO 2011-11-21 19:11:40 MarkDuplicates Start of doWork freeMemory: 132124215176; totalMemory: 132857659392; maxMemory: 132857659392
    INFO 2011-11-21 19:11:40 MarkDuplicates Reading input file and constructing read end information.
    INFO 2011-11-21 19:11:40 MarkDuplicates Will retain up to 527212934 data points before spilling to disk.
    [Mon Nov 21 21:44:56 CET 2011] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 153.26 minutes.
    Runtime.totalMemory()=132857659392
    Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
    at java.lang.String.intern(Native Method)
    at net.sf.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:83)
    at net.sf.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:205)
    at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:96)
    at net.sf.samtools.BAMFileReader.readHeader(BAMFileReader.java:391)
    at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:144)
    at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:114)
    at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:514)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:167)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:122)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:267)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)
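A possible diagnostic, offered as a sketch rather than a diagnosis: the trace above fails inside String.intern, called from the SAMSequenceRecord constructor while SAMTextHeaderCodec.parseSQLine decodes the header's @SQ lines. On HotSpot Java 6 (the log shows OpenJDK 1.6.0_20), interned strings are stored in the permanent generation, so a header with a very large number of @SQ records is one plausible contributor to PermGen use. The hypothetical SqHeaderStats class below only counts @SQ lines and the characters in their SN: names from SAM header text on standard input; it does not measure PermGen itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SqHeaderStats {
    // Returns {number of @SQ lines, total characters in their SN: sequence names}.
    static long[] count(Reader in) {
        long records = 0;
        long nameChars = 0;
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.startsWith("@SQ\t")) {
                    continue; // only sequence dictionary lines are of interest
                }
                records++;
                // SAM header fields are tab-separated TAG:VALUE pairs
                for (String field : line.split("\t")) {
                    if (field.startsWith("SN:")) {
                        nameChars += field.length() - 3;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {records, nameChars};
    }

    public static void main(String[] args) {
        // Expects SAM header text on stdin, e.g. from `samtools view -H file.bam`
        long[] r = count(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        System.out.println("@SQ records: " + r[0] + ", SN name characters: " + r[1]);
    }
}
```

For example: samtools view -H map.CLCh001.lib300.bam_sorted.bam | java SqHeaderStats. A count in the hundreds of thousands or more would make the header a reasonable suspect; a small count would point elsewhere.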

  • #2
    Hi Doug,

    I just had the same problem and solved it with -XX:MaxPermSize=512m

    As you already tried with 1g, it looks like you just need to increase it further... was the 4g enough?
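To check which PermGen limit a given JVM actually picked up, a small sketch using the standard java.lang.management API can print each memory pool's type, usage and maximum. The class name MemoryPools is hypothetical. Pool names vary by JVM and garbage collector: on HotSpot Java 6/7 the permanent generation is a non-heap pool whose name typically contains "Perm Gen", which is why -Xmx does not raise it. Java 8 and later replaced it with "Metaspace", sized with -XX:MaxMetaspaceSize rather than -XX:MaxPermSize.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemoryPools {
    public static void main(String[] args) {
        // One MemoryPoolMXBean per pool (heap generations, code cache, PermGen or Metaspace, ...)
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage(); // null if the pool is no longer valid
            if (u == null) {
                continue;
            }
            // getType() distinguishes heap from non-heap pools;
            // getMax() is -1 when the pool has no defined maximum
            System.out.printf("%-25s %-16s used=%,d committed=%,d max=%,d%n",
                    pool.getName(), pool.getType(), u.getUsed(), u.getCommitted(), u.getMax());
        }
    }
}
```

Run it with the same memory flags you give Picard, e.g. java -XX:MaxPermSize=512m MemoryPools on a Java 6/7 JVM, and check the max value reported for the Perm Gen pool.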



    • #3
      Hi, yep, 4GB was enough. If I recall correctly, it died with 2GB. The main challenge was getting enough heap space: I had to request 256GB, and if I believe the htop stats, it was using 221GB at one point :-)

      /Doug



      • #4
        The following is the output of Picard MarkDuplicates. I changed some of the options to bigger numbers, but it still gives me an error.
        My BAM file is about 10GB, and the program was running with 24GB RAM using versions 1.49 and 1.50. Please help me fix the problem. Thank you so much.

        net.sf.picard.sam.MarkDuplicates INPUT=accepted_hits_sorted.bam OUTPUT=accepted_hits_sorted.pk.mk.out METRICS_FILE=accepted_hits_sorted.pk.mk.metrics ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=500000000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 MAX_RECORDS_IN_RAM=500000000 REMOVE_DUPLICATES=false SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 TMP_DIR=/tmp/tangwei VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Feb 03 16:06:12 EST 2012] Executing as tangwei@p809 on Linux 2.6.18-128.el5 i386; Java HotSpot(TM) Server VM 1.7.0_02-b13
        INFO 2012-02-03 16:06:12 MarkDuplicates Start of doWork freeMemory: 63278136; totalMemory: 64356352; maxMemory: 1908932608
        INFO 2012-02-03 16:06:12 MarkDuplicates Reading input file and constructing read end information.
        INFO 2012-02-03 16:06:12 MarkDuplicates Will retain up to 7575129 data points before spilling to disk.
        INFO 2012-02-03 16:06:18 MarkDuplicates Read 1000000 records. Tracking 8778 as yet unmatched pairs. 8778 records in RAM. Last sequence index: 0
        ......
        ......
        INFO 2012-02-03 16:41:35 MarkDuplicates Read 151000000 records. Tracking 5300425 as yet unmatched pairs. 5300425 records in RAM. Last sequence index: 51
        [Fri Feb 03 16:52:03 EST 2012] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 45.84 minutes.
        Runtime.totalMemory()=1980170240
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.regex.Matcher.<init>(Matcher.java:224)
        at java.util.regex.Pattern.matcher(Pattern.java:1088)
        at net.sf.picard.sam.AbstractDuplicateFindingAlgorithm.addLocationInformation(AbstractDuplicateFindingAlgorithm.java:61)
        at net.sf.picard.sam.MarkDuplicates.buildReadEnds(MarkDuplicates.java:364)
        at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:298)
        at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
        at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)
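The "Will retain up to N data points" figures in both logs in this thread are consistent with maxMemory × SORTING_COLLECTION_SIZE_RATIO / 63: 132857659392 × 0.25 / 63 ≈ 527212934 (post #1) and 1908932608 × 0.25 / 63 ≈ 7575129 (this post). The divisor 63 is inferred from these two logs, not read from Picard's source, so treat it as an assumption. What the arithmetic illustrates is that here maxMemory is only about 1.8 GiB despite 24GB of RAM (the log also shows an i386, i.e. 32-bit, platform), so the in-memory capacity is bounded by the JVM heap; raising options such as MAX_RECORDS_IN_RAM cannot give the JVM more memory than its heap limit allows. The class and constant names below are hypothetical.

```java
public class RetainEstimate {
    // ASSUMPTION: 63 bytes per read end is inferred from the two logs in this
    // thread, not taken from Picard's source code.
    static final int ASSUMED_BYTES_PER_READ_END = 63;

    // maxMemory * SORTING_COLLECTION_SIZE_RATIO / bytes per record, truncated
    static long retained(long maxMemory, double ratio) {
        return (long) (maxMemory * ratio / ASSUMED_BYTES_PER_READ_END);
    }

    public static void main(String[] args) {
        System.out.println(retained(132857659392L, 0.25)); // post #1 log: 527212934
        System.out.println(retained(1908932608L, 0.25));   // post #4 log: 7575129
    }
}
```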




        • #5
          Perhaps you need to tell Java to use more of your memory (Java heap space). If I remember correctly, Java allocates only 1GB of memory if you don't instruct it otherwise.
          You should use the -Xmx option.

          Have a look, for example, at http://www.ehow.com/how_5347474_set-...eap-space.html
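A quick way to confirm that an -Xmx setting took effect, sketched here with a hypothetical MaxHeap class: Runtime.maxMemory() reports the maximum heap the JVM will attempt to use, which appears to be the maxMemory value Picard prints at the start of doWork. The sun.arch.data.model system property is HotSpot-specific and reports whether the JVM is 32- or 64-bit (it may be null on other JVMs). A 32-bit JVM, like the i386 one in the log above, cannot be given a heap anywhere near 24GB, so a 64-bit Java would be needed to use that memory.

```java
public class MaxHeap {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory(); // upper bound for the heap, normally set by -Xmx
        System.out.printf("maxMemory: %,d bytes (%.2f GiB)%n", max, max / (1024.0 * 1024 * 1024));
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        // HotSpot-specific: "32" or "64"; prints null if the JVM does not define it
        System.out.println("sun.arch.data.model: " + System.getProperty("sun.arch.data.model"));
    }
}
```

Run it with the value you intend to give Picard, e.g. java -Xmx20g MaxHeap (20g is only an example value); the same -Xmx option goes on the java command line that launches Picard, before -jar.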

