
  • memory requirements of velvet tool (de novo assembly)

    Hi,

    Does anyone have an idea of how much memory Velvet requires for a given input of short reads? And how would it scale with more or longer reads?

    Also, are there any other 'large dataset' de novo assembly tools for Illumina reads? SOAP says 100 GB of RAM for human-sized genomes; are there other options, and what would their memory requirements be?

    Thanks for sharing.
    --
    bioinfosm

  • #2
    To answer part of it myself, there is a useful source on the velvet mailing list.


    The gist is:
    RAM required for velvetg = -109635 + 18977*ReadSize + 86326*GenomeSize + 233353*NumReads - 51092*K

    This gives the answer in kilobytes (kB).

    ReadSize is in bases.
    GenomeSize is in millions of bases (Mb).
    NumReads is in millions.
    K is the k-mer hash length used in velveth.
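
As a rough sketch, the formula above can be wrapped in a small Python function. The function name and the example inputs (36-base reads, a 5 Mb genome, 10 million reads, k=31) are assumptions for illustration only. The coefficients are the ones quoted above, and the fit is empirical, so treat the result as a ballpark figure rather than a guarantee.

```python
def velvetg_ram_kb(read_size, genome_size_mb, num_reads_millions, k):
    """Estimate velvetg RAM in kB from the empirical formula quoted above.

    read_size          -- read length in bases
    genome_size_mb     -- genome size in millions of bases (Mb)
    num_reads_millions -- number of reads, in millions
    k                  -- k-mer length passed to velveth
    """
    return (-109635
            + 18977 * read_size
            + 86326 * genome_size_mb
            + 233353 * num_reads_millions
            - 51092 * k)


# Hypothetical example inputs, not taken from the thread.
estimate_kb = velvetg_ram_kb(read_size=36, genome_size_mb=5,
                             num_reads_millions=10, k=31)
print(f"Estimated velvetg RAM: {estimate_kb} kB (~{estimate_kb / 1e6:.2f} GB)")
```

Note that the number of reads carries by far the largest coefficient, so in this fit the read count dominates the estimate for typical inputs.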
    --
    bioinfosm

    • #3
      The above formula, derived by Simon Gladman, comes with a caveat: it only applies to Velvet compiled with the default MAXKMERSIZE=31. If you compiled with MAXKMERSIZE=63, for example, memory usage will be higher.

      • #4
        So what happens when the machine doesn't have enough RAM?
        Does it give an error, or just proceed very, very slowly?

        Would having a large enough swap partition help?
        http://kevin-gattaca.blogspot.com/

        • #5
          It will segfault, but sometimes it will lock up a machine so badly that you have to physically pull the plug.

          I suggest using ulimit. For example, I have a 256 GB machine and run
          ulimit -v 240000000
          before every run.
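
The same cap can be applied from a Python wrapper instead of the shell. This is a minimal sketch, assuming Linux and Python 3; `run_with_memory_cap` is a hypothetical helper name, not part of Velvet. In bash, `ulimit -v` takes its value in units of 1024 bytes, and on Linux it corresponds to the RLIMIT_AS resource limit used here.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_kb):
    """Run cmd as a child process with its virtual memory capped at max_kb KiB.

    Mirrors `ulimit -v max_kb` followed by the command, but only the child
    is limited, not the calling process.
    """
    limit_bytes = max_kb * 1024

    def set_limit():
        # Runs in the child just before exec; RLIMIT_AS caps address space.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=set_limit)


if __name__ == "__main__":
    # Demo: a child that tries to allocate 4 GiB under a 2 GiB cap fails
    # with MemoryError instead of dragging the machine into swap.
    greedy = [sys.executable, "-c", "x = bytearray(4 * 1024**3)"]
    result = run_with_memory_cap(greedy, max_kb=2 * 1024 * 1024)
    print("child exit code:", result.returncode)
```

For a real run this would look something like `run_with_memory_cap(["velvetg", "assembly_dir"], 240000000)`, where `assembly_dir` is a placeholder for the directory written by velveth.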
          --
          Jeremy Leipzig
          Bioinformatics Programmer

          • #6
            We've needed approximately 30 GB of RAM for Velvet assembly, with a minimum of 24 GB, depending on the k-mer length specified. This is with single-end 36 bp Illumina reads.
            Last edited by jgibbons1; 01-06-2010, 08:52 AM.

            • #7
              To assemble a lane of paired reads of length 75, we used 120 GB with a k-mer size of 47.
              Obviously the amount of data decreases with a smaller k-mer, but a shorter k-mer implies a higher possibility of mistakes.

              I think (this is just my opinion) that as read lengths increase, tools like Velvet will become too memory-hungry to be practical.
              With a read length of 150, an approach like PCAP, ARACHNE, or Edena, which builds an overlap graph rather than a de Bruijn graph, is the only feasible option.

              • #8
                Is it the human genome you are working on?

                One approach is to map reads to the reference and assemble only the unmapped reads, though this can yield a pretty fragmented assembly that is hard to use in the end.

                Do you usually do things like removing contaminants or low-quality reads, or taking only the unique set of reads? These can certainly reduce the run time, but last I looked, using a redundant set of reads gave a slightly different assembly than a non-redundant one.
                --
                bioinfosm

                • #9
                  Sorry...I replied to the wrong thread.
                  Last edited by jgibbons1; 01-06-2010, 03:05 PM. Reason: replied to the wrong thread

                  • #10
                    How to regulate Velvet memory consumption

                    I'm using Velvet for assembly, and velvetg is consuming around 90% of my memory. Is there any way to control this, say by threading or some other step?

                    • #11
                      I've found that one of the best ways to reduce the memory requirements is to quality filter your read set before assembly. Low quality reads directly impact memory. Trimmomatic and Quake are both very good for quality filtering.
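
To make the idea concrete, here is a minimal sketch of mean-quality filtering in Python. It is not how Trimmomatic or Quake work internally (Trimmomatic trims and filters by quality, Quake corrects errors using k-mer counts); it only illustrates dropping low-quality reads before assembly. It assumes 4-line FASTQ records with Phred+33 qualities, and the function names and the threshold of 20 are arbitrary choices for illustration.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def filter_fastq(in_handle, out_handle, min_mean_q=20):
    """Copy records whose mean quality is >= min_mean_q; return (kept, total)."""
    kept = total = 0
    while True:
        header = in_handle.readline()
        if not header:
            break
        seq = in_handle.readline()
        plus = in_handle.readline()
        qual = in_handle.readline()
        total += 1
        q = qual.rstrip("\n")
        if q and mean_phred(q) >= min_mean_q:
            out_handle.write(header + seq + plus + qual)
            kept += 1
    return kept, total


# Tiny in-memory example: 'I' is Q40 and '#' is Q2 in Phred+33.
reads = io.StringIO("@good\nACGT\n+\nIIII\n@bad\nACGT\n+\n####\n")
out = io.StringIO()
print(filter_fastq(reads, out))
```

With real data the two handles would be opened files, and for paired-end reads both mates would need to be dropped together to keep the files in sync.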

                      • #12
                        velvet memory problem

                        Hi all,

                        I have tried to use Velvet to assemble my RNA-seq data.

                        My machine has about 40 GB of RAM.

                        The read length is about 101 for my dataset. Each file of the paired-end set has about 60 million reads, so about 120 million reads in total.

                        However, when I try to assemble it, once the run gets past the ghost threads step and begins threading through reads, it has already occupied about 65% of RAM, so I have to stop it.

                        Can anyone give me some suggestions on how to reduce the memory usage? I have set the k-mer length to 75 for my dataset.

                        Jingjing

                        • #13
                          1. Get a machine with more RAM
                          2. Use shorter k-mers
                          3. Try to reduce complexity in your reads by using Quake or something similar
                          4. Subsample your reads
                          5. Use a different assembler

                          Velvet is known to be memory-hungry, so 1 is the best choice. If that isn't an option, you should at least try 2 (75 sounds very, very high) or 3, with 4 as the last resort, unless you want to try a completely different assembler. CLC is very memory efficient, but commercial.
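
Option 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming two paired-end FASTQ files with 4-line records in the same order; the function names, the fraction, and the seed are hypothetical. The point is that both mates of a pair are kept or dropped together, so the pairing is preserved.

```python
import io
import random


def read_record(handle):
    """Return the next 4-line FASTQ record as a string, or '' at end of file."""
    return "".join(handle.readline() for _ in range(4))


def subsample_pairs(in1, in2, out1, out2, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return pairs kept."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    while True:
        rec1, rec2 = read_record(in1), read_record(in2)
        if not rec1 or not rec2:
            break
        if rng.random() < fraction:
            out1.write(rec1)
            out2.write(rec2)
            kept += 1
    return kept


# Tiny in-memory example with 100 made-up pairs.
fq1 = io.StringIO("".join(f"@r{i}/1\nACGT\n+\nIIII\n" for i in range(100)))
fq2 = io.StringIO("".join(f"@r{i}/2\nTGCA\n+\nIIII\n" for i in range(100)))
sub1, sub2 = io.StringIO(), io.StringIO()
print(subsample_pairs(fq1, fq2, sub1, sub2, fraction=0.25))
```

Because each pair is sampled independently, the number kept is only approximately `fraction` times the input; the files are streamed, so memory use stays flat regardless of input size.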
