  • zxyeo
    Junior Member
    • Jun 2011
    • 6

    Planning computing budget for Exome-seq data analysis

    Our lab is planning a project that aims to analyze up to 100 human biopsy exome-sequencing datasets over the next couple of years. I hope I can get some feedback here.

    Considering that the data are likely to come with higher coverage, we are thinking of upgrading our current setup. We are also interested in parallelizing the primary analytical pipeline in order to save more time for downstream analysis (variant filtering, statistical testing using R).

    1. Is it sensible to look for a desktop workstation with 2x6 cores, 96GB RAM and 2x1.5TB 7200RPM disks that would serve us well for at least the next few years?

    2. Is ~USD8000 enough if we decide to purchase it in mid-2012?

    Thanks!
  • dsenalik
    Carrot Scientist
    • Nov 2009
    • 42

    #2
    Not so long ago I put together a system with 240GBytes RAM and 32 cores (4x8) for just over $9000. It has a 2x1TB RAID for the OS drive and 3x3TBytes for data plus two backups.
    It is based on the TYAN S8812 http://www.tyan.com/product_SKU_spec...&SKU=600000186 and I love it.


    • Jon_Keats
      Senior Member
      • Mar 2010
      • 279

      #3
      That will be fine. I'd vote to drop to 48GB RAM, add more storage in a RAID configuration, and increase the number of CPUs so you can multi-task. Most NGS tools do not leverage huge amounts of RAM but are very I/O and compute intensive.


      • Bukowski
        Senior Member
        • Jan 2010
        • 388

        #4
        I agree with Jon, for that number of cores 48GB of RAM should be sufficient (it is for us). We have 3 4x4core machines w/48GB RAM and can comfortably push 48 exomes a week through our pipeline (we don't tend to push more than one sample per core).
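As a rough back-of-the-envelope check on those figures (3 machines with 4x4 cores each, 48 exomes a week), the sketch below works out the implied core-hours per exome. It assumes every core is busy around the clock, which is an idealisation and not something stated in the thread, and the function name is made up for illustration.

```python
def core_hours_per_sample(machines, cores_per_machine, samples_per_week):
    """Upper bound on core-hours spent per sample.

    Assumes (hypothetically) that every core is busy 24 hours a day,
    7 days a week; real utilisation will be lower.
    """
    total_cores = machines * cores_per_machine
    core_hours_per_week = total_cores * 24 * 7
    return core_hours_per_week / samples_per_week


# Figures from the post: 3 machines with 4x4 = 16 cores each, 48 exomes a week.
print(core_hours_per_sample(3, 16, 48))  # 168.0, i.e. one core-week per exome
```

By the same reasoning, a 2x6-core workstation running one sample per core would manage roughly 12 exomes a week, but only if its cores and disks are comparable, which is an assumption.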
        Last edited by Bukowski; 11-22-2011, 03:46 AM.


        • westerman
          Rick Westerman
          • Jun 2008
          • 1104

          #5
          I'll disagree, a bit, with Jon, who said:
          "Most NGS tools do not leverage huge amounts of RAM but are very I/O and compute intensive."
          I'll agree with the former (I/O intensive) but disagree with the latter (CPU intensive). While many tools do offer parallel multi-CPU capability, some do not, and even those which are parallel will often have portions of their code/pipeline which become single-CPU. Note that Bukowski uses "one sample per core" or, if I am reading his comment correctly, single-CPU programs (albeit on multiple samples at a time).

          Personally I much prefer high-memory machines over high-core machines. One way of looking at this: a program that runs into CPU limits will take extra time to complete, but a program that runs into a memory limit will never complete. I don't want to be in the latter situation. On the other hand, I do a lot of de novo work, and those programs tend to be memory intensive. So go with what the human exome people say.

          I do think that your disk space is rather wimpy. 2x1.5TB 7200RPM. Let's assume no RAIDing and thus you get, at the best, 3000GB or about 30GB per sample. Seems small especially since that 3TB is not really 3TB after disk overhead and even smaller if you go with a fast-RAID system. On the other hand you can always easily buy more disks.
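The arithmetic above can be sketched as follows. The mirrored (RAID 1) case is only an illustrative choice of a redundant layout, not something specified in the thread; the function name is hypothetical, and filesystem overhead is ignored, so real figures will be lower.

```python
def usable_gb(n_disks, disk_gb, layout="none"):
    """Nominal usable capacity in GB for a few simple disk layouts.

    Ignores filesystem and formatting overhead.
    """
    if layout in ("none", "raid0"):
        # Independent disks or striping: all raw capacity is usable.
        return n_disks * disk_gb
    if layout == "raid1":
        # Mirroring: every disk holds the same data, so one disk's worth.
        return disk_gb
    raise ValueError(f"unknown layout: {layout}")


samples = 100  # upper end of the planned project
for layout in ("none", "raid1"):
    total = usable_gb(2, 1500, layout)
    print(layout, total, total / samples)
# none 3000 30.0
# raid1 1500 15.0
```

So the proposed two disks give at most about 30GB per sample without RAID, and about half that if they are mirrored.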

          Getting back to the second part of your original question, as per 'dsenalik' your USD$8,000 budget should do just fine. Maybe just plan on spending that and seeing what you can purchase in the middle of 2012.


          • biznatch
            Senior Member
            • Nov 2010
            • 124

            #6
            Hard drive prices have gone up recently because of shortages due to flooding in Thailand halting production. If you don't need all that storage space right away, maybe add more later as you need it, although some reports are saying prices will stay elevated for 6 months to a year. Maybe you can still find some that haven't gone up yet; if you can, get them now.

            Eg. http://news.cnet.com/8301-13924_3-57...?tag=mncol;txt or just google it.


            • Bukowski
              Senior Member
              • Jan 2010
              • 388

              #7
              Originally posted by westerman
              I'll disagree, a bit, with Jon, who said "Most NGS tools do not leverage huge amounts of RAM but are very I/O and compute intensive." I'll agree with the former (I/O intensive) but disagree with the latter (CPU intensive). While many tools do offer parallel multi-CPU capability, some do not, and even those which are parallel will often have portions of their code/pipeline which become single-CPU. Note that Bukowski uses "one sample per core" or, if I am reading his comment correctly, single-CPU programs (albeit on multiple samples at a time).
              I will clarify a little! The work is I/O intensive. We've had the best success optimising our pipeline by increasing I/O performance. We parallelise where possible, so if systems are not saturated (one sample per core), processes are threaded to fill core capacity where possible, either by taking advantage of built-in threading or by splitting jobs more naively across more cores. I gave the sample/core example to give an idea of the turnaround we can achieve with the setup.
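To illustrate the idea of filling spare cores when there are fewer samples than cores, here is a minimal sketch of one way to divide cores among running samples. It is not the actual pipeline logic described above; the function name and the even-split policy are assumptions made for illustration.

```python
def threads_per_sample(n_cores, n_samples):
    """Divide n_cores as evenly as possible among the samples that can run.

    Returns one thread count per running sample. If there are more samples
    than cores, only n_cores samples run at once (one thread each) and the
    remainder are assumed to wait in a queue.
    """
    if n_cores < 1 or n_samples < 1:
        raise ValueError("need at least one core and one sample")
    running = min(n_cores, n_samples)
    base, extra = divmod(n_cores, running)
    # The first `extra` samples get one additional thread each.
    return [base + 1 if i < extra else base for i in range(running)]


print(threads_per_sample(16, 16))  # saturated: one thread per sample
print(threads_per_sample(16, 5))   # [4, 3, 3, 3, 3]
```

The per-sample counts could then be passed to whatever thread option each tool provides, or used to decide how many chunks to split a job into for tools without built-in threading.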

              I would not recommend our setup for assembly either. Having done some, all I have ever wanted in that situation is 'moar RAM'.
