
  • System requirements linux comp. for off-machine assembly/analysis

    Dear all,

    I (we) would like to assemble 454 reads (50-100 Mb, possibly 200-400 Mb) on a computer separate from the FLX sequencing machine itself. For this, Roche recommends a 64-bit dual-processor (dual x86 CPU) computer with 8 GB of RAM running Linux (brochure, October 2008).

    1. Is this requirement still valid?
    2. Should I get a quad-core machine with 8 or 16 GB of memory?
    3. Are other people running the Roche 454 software off-machine?

    thanks,

    Richard

  • #2
    Requirements seem to have changed

    BioTeam has recently been setting up a decent-sized off-rig analysis cluster for a client. The 454 software comes with a script, valTool.sh, which reports whether your rig is large enough. I was quite surprised to see that it wanted 16 GB on the master node, since we were following the same guidelines you mentioned. We had plenty of extra nodes, but all were configured with 8 GB of RAM. During initial testing, one of these 8 GB machines was thrashing hard. Long before base calling ever finished, we found some MPI environment variables that let the work run across the cluster quite quickly, but I'd be very wary of a single-node analysis rig with only 8 GB.

    Other notable requirements include:
    • Master: Linux kernel >= 2.6.9-34, SMP, 64-bit
    • Disk space accessible from the master: >= 1 TB available
    • Compute nodes: >= 4 GB RAM and the same CPU/arch/OS specs as the head/master
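
    For anyone who wants to sanity-check a machine against the numbers quoted in this post before installing anything, here is a minimal sketch. It is not valTool.sh and does not reproduce its logic; the thresholds are simply the ones mentioned above, and the function names (`kernel_tuple`, `shortfalls`) are made up for illustration. The kernel check compares only the 2.6.9 part and ignores the "-34" build suffix.

```python
import re

# Thresholds as quoted in this thread; NOT read from valTool.sh itself.
MIN_MASTER_RAM_GB = 16
MIN_NODE_RAM_GB = 4
MIN_DISK_TB = 1
MIN_KERNEL = (2, 6, 9)  # the "-34" build suffix is ignored in this sketch


def kernel_tuple(release):
    """Turn a release string such as '2.6.9-34.ELsmp' into (2, 6, 9)."""
    numbers = re.findall(r"\d+", release.split("-")[0])
    return tuple(int(n) for n in numbers[:3])


def shortfalls(ram_gb, disk_tb, kernel_release, is_master=True):
    """Return a list of human-readable problems; empty means nothing flagged."""
    problems = []
    min_ram = MIN_MASTER_RAM_GB if is_master else MIN_NODE_RAM_GB
    if ram_gb < min_ram:
        problems.append(f"RAM {ram_gb} GB is below {min_ram} GB")
    # The disk requirement in the post applies to the master only.
    if is_master and disk_tb < MIN_DISK_TB:
        problems.append(f"disk {disk_tb} TB is below {MIN_DISK_TB} TB")
    if kernel_tuple(kernel_release) < MIN_KERNEL:
        problems.append(f"kernel {kernel_release} is older than 2.6.9")
    return problems


# An 8 GB master with 2 TB of disk on the minimum kernel:
print(shortfalls(8, 2, "2.6.9-34.ELsmp"))  # → ['RAM 8 GB is below 16 GB']
```

    The same 8 GB box passes as a compute node: `shortfalls(8, 0, "2.6.9-34.ELsmp", is_master=False)` returns an empty list.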



    Last edited by cariaso; 02-12-2009, 11:44 AM. Reason: base calling, not assembly



    • #3
      The last time I ran top while using gsMapper or gsAssembler, it was only using 1 core. The image analysis/base caller for Titanium is MPI/multi-core aware, but I don't think the other tools are, so the only thing that will help you is the additional memory. On the other hand, we are using an 8-core, 32 GB machine for image analysis/base calling, at ~14 hours per full plate. So you may want to take that into consideration if that is in your plans.

      Slow to post so adding:

      I think cariaso is talking about base calling, not assembly. FLX is easy either way; it is just Titanium that taxes everything.
      Last edited by Tom Bair; 02-12-2009, 09:20 AM. Reason: Slow to post



      • #4
        True, I did intend base calling. Corrected. It seems I've been doing too many assemblies this week.



        • #5
          runAssembly run times

          I didn't see any examples of run times for various sizes of assembly, so I thought I would post some here. Apologies if this isn't the right place.

          We're running Roche's "runAssembly" wrapper, version 2.0.00.20

          The interesting discovery that prompts this post is the "-large" flag. If you provide this flag to runAssembly, it "shortcuts some of the computationally expensive tasks" in the algorithm.
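
          If you want to collect your own timings with and without the flag, a small wrapper like the following is one way to do it. This is only a sketch: the only option it uses is "-large" as described above, placing the flag before the data directories is an assumption, and `build_command` and `timed_run` are hypothetical helper names.

```python
import subprocess
import time


def build_command(data_dirs, large=False):
    """Assemble an argument list; flag placement before the inputs is assumed."""
    cmd = ["runAssembly"]
    if large:
        cmd.append("-large")
    return cmd + list(data_dirs)


def timed_run(cmd):
    """Run cmd to completion and return (exit code, elapsed wall-clock minutes)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, (time.monotonic() - start) / 60.0
```

          A call would look like `timed_run(build_command(["D_1", "D_2"], large=True))`, where D_1 and D_2 stand in for your own data directories.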

          Here are some runtimes for single threads running on dedicated x86_64 Linux machines with 8 GB of RAM.

          1 data directory: 9.5M "seeds". 15 min. 9 min with LARGE flag.
          2 data directories: 14M "seeds". 31 min. 21 min with LARGE flag.
          3 data directories: 23M "seeds". 85 min. 21 min with LARGE flag.
          4 data directories: 31M "seeds". still running. 30 min with LARGE flag.
          ...
          10 data directories: 78M "seeds". killed. 42 min with LARGE flag.

          These are sequences from a prokaryote. Your mileage may vary.
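
          To make the trend in those numbers easier to see, here is a short sketch that computes the speedup from the flag and the minutes per million seeds with the flag on. It uses only the rows listed above (the elided rows between 4 and 10 directories are left out), and runs that never finished are recorded as None rather than guessed. The names `RUNS`, `speedup` and `summarize` are just for illustration.

```python
# (data directories, millions of "seeds", default minutes, -large minutes)
# None marks default runs that never finished ("still running" / "killed").
RUNS = [
    (1, 9.5, 15, 9),
    (2, 14, 31, 21),
    (3, 23, 85, 21),
    (4, 31, None, 30),
    (10, 78, None, 42),
]


def speedup(default_min, large_min):
    """Ratio of default to -large runtime, or None if the default never finished."""
    if default_min is None:
        return None
    return default_min / large_min


def summarize(runs):
    """Return (directories, speedup, minutes per million seeds with -large) per run."""
    rows = []
    for dirs, seeds_m, default_min, large_min in runs:
        rows.append((dirs, speedup(default_min, large_min), large_min / seeds_m))
    return rows


for dirs, ratio, per_million in summarize(RUNS):
    ratio_text = "n/a" if ratio is None else f"{ratio:.2f}x"
    print(f"{dirs:>2} dirs: speedup {ratio_text}, "
          f"{per_million:.2f} min per M seeds with -large")
```

          On these figures the speedup goes from about 1.7x on one directory to about 4x on three, while the cost with the flag stays at or below roughly 1.5 minutes per million seeds.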



          • #6
            Originally posted by cdwan:
            10 data directories: 78M "seeds". killed. 42 min with LARGE flag.
            What do you mean by "killed"? Did the software fail? I've had newbler assembler fail with large amounts of data as well.



            • #7
              De novo assembly of a large genome, 50 Mb to 100 Mb, is in the insect range, beyond fungal genomes.

              It requires lots of memory. A machine with 8 GB of memory is not enough.

              We are using a 4-core, 64-bit machine with 32 GB of memory. It works for GS assembly of fungal genomes, but insect assembly is tough. A fungal runAssembly on this machine for 1 run takes only 1 or 2 hours. An insect assembly I did earlier, on 35 FLX runs, took about 10 days to finish.

              The -large flag for GS assembly helps with speed. Still, I would prefer a beefy machine with huge memory; I would say as much memory as possible.

              Assembly is a memory-hog computation.



              • #8
                Originally posted by erimar77:
                What do you mean by "killed"? Did the software fail? I've had newbler assembler fail with large amounts of data as well.
                We have no idea whether it would have succeeded eventually or not. It seemed to be progressing - slowly - through the all vs. all comparison stage. We ran out of time to mess with it.
