  • velvet and its hunger for memory

    Hi,

    I am currently trying to assemble an amoeba genome from about 300 million Solexa reads (paired-end, 200 bp insert size).

    When using velvet:

    velveth . 31 -fasta -shortPaired input.fa

    velvetg . -ins_length 200 -exp_cov 50 -cov_cutoff 5 -max_coverage 300 -min_pair_count 20

    I get an error message like this:

    ---
    velvetg: Can't calloc 18446744072010747658 InsertionMarkers totalling
    18446744046528688288 bytes: Cannot allocate memory
    Reading roadmap file ./Roadmaps
    301083362 roadmaps reads
    Creating insertion markers
    ---

    So what should I do with that message? Do I really need that much memory? 18,000 petabytes? It has to be a bug ...

    Thanks for any comments,

    andpet

  • #2
    I am not sure whether that is a bug, but one approach I would take is a "saturation test": start with 30M reads, run velvet, and see what your assembly looks like. Then add 30M more and check whether the assembly size and N50 are still improving.

    You might find at some point that adding additional reads does not help the assembly.
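A minimal sketch of how such a saturation test could be scored, assuming the reads are in an interleaved paired FASTA file (as `-shortPaired` expects) and that each velvet run writes its contigs to `contigs.fa` in its output directory. The helper names `write_first_records`, `fasta_lengths`, and `n50` are hypothetical, not part of velvet. Taking the first N records is only a fair subsample if the file order is unbiased.

```python
def write_first_records(src, dst, n_records):
    """Copy the first n_records FASTA records from src to dst.

    For interleaved paired reads, n_records should be even so pairs stay intact.
    Returns the number of records actually written.
    """
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                if written == n_records:
                    break
                written += 1
            if written:
                fout.write(line)
    return written


def fasta_lengths(path):
    """Yield the sequence length of every record in a FASTA file."""
    length = None
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0
```

After each velvet run on a larger subset, `n50(fasta_lengths("contigs.fa"))` and the summed contig length can be compared with the previous run; if neither improves, more reads are probably not helping.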

    Oh, and let me know if you see a 20 petabyte machine on the Dell website.
    --
    Jeremy Leipzig
    Bioinformatics Programmer
    --



    • #3
      I've seen this kind of error with a smaller data set, too ... though 300M reads is a lot. I've run out of memory on a 512 GB machine with a 200M read data set, roughly half of it paired-end.

      So, I'm guessing that while you won't really need petabytes of RAM (that's probably a bug that you should report on the velvet-help list), you may not be able to assemble that data set unless you have access to a machine with a terabyte or more of RAM ...
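The two numbers in the error message are consistent with an integer overflow rather than a real memory requirement. The sketch below is only an interpretation of those figures, not something confirmed against the velvet source: it assumes the marker count was held in a signed 32-bit integer that wrapped negative and was then widened to an unsigned 64-bit size.

```python
U64 = 1 << 64
U32 = 1 << 32

# Values copied from the velvetg error message.
reported_count = 18446744072010747658
reported_bytes = 18446744046528688288


def as_signed(value, bits):
    """Reinterpret an unsigned integer of the given width as two's-complement signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


# Read as a signed 64-bit number, the count is a modest negative value.
signed_count = as_signed(reported_count, 64)

# The byte total is the same negative count times a fixed element size.
bytes_per_marker = (U64 - reported_bytes) // (-signed_count)

# ASSUMPTION: the true count wrapped around a signed 32-bit integer.
plausible_count = signed_count % U32
plausible_bytes = plausible_count * bytes_per_marker

print(signed_count)        # -1698803958
print(bytes_per_marker)    # 16
print(plausible_count)     # 2596163338
print(plausible_bytes)     # 41538613408
```

Under that assumption, the insertion markers alone would need roughly 41.5 GB, which is large but nowhere near petabytes.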



      • #4
        Thanks for the idea with the saturation test, I will give it a try.

        Maybe it's just because of the low k-mer size (31). But yesterday I read somewhere that the latest velvet version supports larger k-mer sizes, so I will try larger k-mers.

        Anyway, I also tried abyss and it worked ...



        • #5
          Did you ever figure out how much memory velvet took for the 300 million reads?



          • #6
            No. As far as I remember, velvet used 128 GB of RAM plus an additional 100 GB of swap (virtual memory) and then crashed ...

            I think the problem is simply that with increasing read numbers (and thereby an increasing absolute number of errors) you are more likely to get all possible 31-mers, and therefore the graph gets really big ...
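A rough back-of-the-envelope sketch of that argument. The read length and error rate below are hypothetical placeholders (neither is given in this thread), and the function name is made up for illustration. It only bounds how many erroneous k-mers the reads could contribute: each miscalled base can corrupt at most every k-mer that overlaps it.

```python
def error_kmer_upper_bound(n_reads, read_len, k, error_rate):
    """Upper bound on the expected number of erroneous k-mers in a read set.

    Expected errors = n_reads * read_len * error_rate, and one error can
    affect at most min(k, read_len - k + 1) k-mers of its read.
    """
    kmers_hit_per_error = min(k, read_len - k + 1)
    return n_reads * read_len * error_rate * kmers_hit_per_error


# Size of the full 31-mer space, for comparison.
all_31mers = 4 ** 31

# HYPOTHETICAL values: 36 bp reads and a 1% per-base error rate.
bound = error_kmer_upper_bound(300_000_000, read_len=36, k=31, error_rate=0.01)
```

The space of 4^31 possible 31-mers is far too large to be exhausted, so the practical point is the linear term: the bound grows in proportion to the number of reads, while the number of genuine genomic k-mers stays fixed.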



            • #7
              You could try NextGENe for your assemblies
