  • issue changing predetermined K values on SOAPdenovo2

    I am currently using SOAPdenovo2 on a supercomputer with the SLURM queuing system to perform a de novo assembly from paired-end FASTQ files containing genomic DNA reads.

    With SOAPdenovo-63mer or SOAPdenovo-127mer I have no problems: the assemblies with k=63 and k=127 each finish in a little over 16 hours on a node with 64 threads and 240 GB of memory, when I submit the following script.sh to the queue:

    #!/bin/sh
    #SBATCH --nodes=1
    #SBATCH --ntasks=64
    #SBATCH --mem=240000
    #SBATCH --time=3-00:00:00
    #SBATCH -e error_log.txt
    #SBATCH -o output_log.txt

    module load soapdenovo2

    SOAPdenovo-63mer all -s config_file.txt -o assemblies/k63_ -R -p 64
    SOAPdenovo-127mer all -s config_file.txt -o assemblies/k127_ -R -p 64

    The trouble starts when I try to choose a k value other than the predetermined k=63 and k=127 using the -K parameter; for example, if I try to perform an assembly with k=89 through this command:

    SOAPdenovo-127mer all -s config_file.txt -K89 -o assemblies/k89_ -R -p 64
    the execution fails, and when I check the error_log I get this line:

    slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2323585.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
    So, I guess this is a memory issue... but this is not happening with the predefined values of k=63 and k=127... Why does SOAPdenovo2 increase the memory requirements when I use other k values, and how can I overcome this issue?
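For anyone scripting several k values, the binary choice described above can be sketched as a small helper. This is only a minimal sketch: the function name soap_binary_for_k is hypothetical, the 63 and 127 ceilings are taken from the binary names in the post, and the loop is a dry run that prints the commands instead of launching them.

```shell
#!/bin/sh
# soap_binary_for_k K
# Hypothetical helper: print the SOAPdenovo2 executable whose k-mer
# ceiling (63 or 127, as in the binary names from the post) covers K.
soap_binary_for_k() {
    k=$1
    if [ "$k" -le 63 ]; then
        echo "SOAPdenovo-63mer"
    elif [ "$k" -le 127 ]; then
        echo "SOAPdenovo-127mer"
    else
        echo "k=$k exceeds 127" >&2
        return 1
    fi
}

# Dry run: print (do not execute) one assembly command per k value.
# config_file.txt, the assemblies/ prefix and -p 64 are copied from the post.
for k in 63 89 127; do
    bin=$(soap_binary_for_k "$k") || exit 1
    echo "$bin all -s config_file.txt -K $k -o assemblies/k${k}_ -R -p 64"
done
```

Passing -K explicitly in every command, as the dry run does, avoids relying on whatever default k the binary would otherwise use.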

  • #2
    ampsevilla it is likely that the out-of-memory (OOM) error is due to SOAPdenovo2 requiring more memory to assemble the genome with a larger k-mer size.

    When you choose a k-mer size of 89, the memory requirements for the assembly process are increased, which is causing the OOM error. This is because increasing the k-mer size also increases the complexity of the assembly, which requires more memory to store the assembly graph and related data structures.

    To overcome this issue, you can try increasing the amount of memory allocated to the job in the SLURM script. You can also try reducing the number of threads used in the assembly process. This may help reduce the memory requirements for the assembly and avoid the OOM error.

    Additionally, you can try reducing the size of the input data by filtering out low-quality reads or using a subset of the data for the assembly. This may also help reduce the memory requirements for the assembly process.

    Finally, you can consider using a different de novo assembly tool that is better suited for larger k-mer sizes and has lower memory requirements. Some popular alternatives to SOAPdenovo2 include SPAdes, ABySS, and IDBA-UD.



    • #3
      GenomicSeq first of all, I sincerely appreciate your quick response.

      Originally posted by GenomicSeq
      ampsevilla it is likely that the out-of-memory (OOM) error is due to SOAPdenovo2 requiring more memory to assemble the genome with a larger k-mer size.

      When you choose a k-mer size of 89, the memory requirements for the assembly process are increased, which is causing the OOM error. This is because increasing the k-mer size also increases the complexity of the assembly, which requires more memory to store the assembly graph and related data structures.
      I don't understand why this is happening, because with the predetermined k=127 SOAPdenovo2 works perfectly, and k=89 is considerably smaller.

      I'll definitely try reducing the number of threads as you suggest; maybe it will help. Unfortunately, I can't reduce the size of the input data because the reads are already filtered. The problem is that the genome we want to assemble is very large and complex.

      We are also trying other tools such as SPAdes and ABySS, but we have had some trouble with them too. We'll try IDBA-UD. Thank you so much for the advice!



      • #4
        ampsevilla that is odd...

        Now I'm wondering if it's something else. Let me know what you find and if you're able to fix it!



        • #5
          GenomicSeq I tried reducing the number of threads and running only the pregraph step instead of all, with 247 GB of memory and a 3-day time limit, but I still got the same error message:
          Some of your processes may have been killed by the cgroup out-of-memory handler.
          I'm stuck with this issue.



          • #6
            ampsevilla sorry, I wish I had some more advice to give. I'm a little lost. I'll try and ask some friends that are more savvy with this kind of work and get back to you once I hear their opinions.



            • #7
              GenomicSeq Finally it worked: recent cluster configuration changes were limiting the available memory to less than the amount I had requested. Thank you so much for your assistance!😄



              • #8
                ampsevilla that's great! So what exactly did you have to change? I wish I could have been more help on this.



                • #9
                  Originally posted by GenomicSeq
                  ampsevilla that's great! So what exactly did you have to change? I wish I could have been more help on this.
                  Same code; the problem was in how the header was being honored. #SBATCH --mem 240000 should give me 240 GB of RAM, but for reasons related to a cluster reconfiguration, the memory limit for all jobs had been temporarily capped at 10 GB, and it was driving me crazy.

                  Anyway, you have been helpful and I really appreciate it. Thank you so much!
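A mismatch like this one (240000 MB requested, roughly 10 GB enforced) can be caught at the top of a job script before the assembler starts. The following is a minimal sketch under stated assumptions: check_mem_limit is a hypothetical helper, and the cgroup file that holds the enforced limit differs between clusters and cgroup versions, so the path in the usage comment is only an example.

```shell
#!/bin/sh
# check_mem_limit REQUESTED_MB LIMIT
# Hypothetical helper. LIMIT is the enforced limit as a byte count, or the
# string "max" (what cgroup v2 reports when no limit is set).
# Returns 0 when the enforced limit covers the request, 1 otherwise.
check_mem_limit() {
    requested_mb=$1
    limit=$2
    if [ "$limit" = "max" ]; then
        echo "OK: no cgroup memory limit"
        return 0
    fi
    # Convert bytes to MB, the default unit of #SBATCH --mem.
    limit_mb=$((${limit} / 1048576))
    if [ "$limit_mb" -lt "$requested_mb" ]; then
        echo "WARNING: requested ${requested_mb} MB but limit is ${limit_mb} MB"
        return 1
    fi
    echo "OK: limit ${limit_mb} MB covers requested ${requested_mb} MB"
}

# Inside a job script, something like the following could be used
# (the file path is an assumption and depends on the cluster setup):
#   check_mem_limit 240000 "$(cat /sys/fs/cgroup/memory.max)" || exit 1

# Illustration with the numbers from this thread: 240000 MB requested,
# 10 GB (10737418240 bytes) enforced.
check_mem_limit 240000 10737418240 || echo "job would stop here"
```

Failing early with a clear message is cheaper than waiting for the out-of-memory kill hours into the pregraph step.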
