  • peromhc
    Senior Member
    • Sep 2009
    • 108

    Bioinformatics Computer: List your specs

    Hi All,

    In reading the forums, it seems that many people have questions about computing power: how much RAM, how many processors, how long analyses take. I suspect that as NGS becomes more mainstream, a lot of labs will be trying to build workstations to handle the work. I myself am building a computer to do de novo assembly of a eukaryotic transcriptome from Solexa reads, and have toiled over its configuration.

    So rather than start another of those "how much RAM" threads, it might be interesting and useful for people to describe the computer they are running analyses on. For instance, my current build includes:

    PROJECT: de novo assembly of a rodent transcriptome
    PLATFORM: Solexa 100bp paired end
    PROGRAMS USED: Velvet, ABySS

    MOTHERBOARD: TYAN S7016: dual-socket Xeon 5500 series, 18 DIMM slots
    CPU: two Xeon E5520, 8 cores total
    RAM: 72 GB total (18 x 4 GB sticks)

    It seems like this covers the basics and allows for useful comparison. This type of thread might be really useful if enough people replied.

    In addition to workstation-type configurations, it would be really interesting to see how many people are using supercomputers or large clusters to do assemblies.
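For anyone who wants to post their specs in the same layout, here is a minimal Python sketch that reads the core count and installed RAM from the machine it runs on. The function names `detect_specs` and `format_specs` are made up for illustration. The RAM lookup relies on `os.sysconf`, which works on Linux and is assumed to be unavailable elsewhere, in which case RAM is reported as unknown. Note that `os.cpu_count()` reports logical cores, which can be higher than the physical core count when hyper-threading is enabled.

```python
import os


def detect_specs():
    """Return (logical_cores, ram_bytes); ram_bytes is None if it cannot be read."""
    cores = os.cpu_count()
    try:
        # Physical memory = number of pages * page size (Linux).
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        # Unsupported name or platform without os.sysconf (e.g. Windows).
        ram_bytes = None
    return cores, ram_bytes


def format_specs(project, platform, programs, cores, ram_bytes):
    """Format specs using the PROJECT / PLATFORM / PROGRAMS / CPU / RAM layout."""
    if ram_bytes is None:
        ram = "unknown"
    else:
        ram = f"{ram_bytes / 2**30:.0f} GB total"
    return "\n".join([
        f"PROJECT: {project}",
        f"PLATFORM: {platform}",
        f"PROGRAMS USED: {', '.join(programs)}",
        f"CPU: {cores} logical cores",
        f"RAM: {ram}",
    ])


if __name__ == "__main__":
    cores, ram_bytes = detect_specs()
    print(format_specs(
        "de novo assembly of a rodent transcriptome",
        "Solexa 100bp paired end",
        ["Velvet", "ABySS"],
        cores,
        ram_bytes,
    ))
```

The motherboard and DIMM layout are not detected here; those would still need to be filled in by hand.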
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    ... it would be really interesting to see how many people are using supercomputers or large clusters to do assemblies..
    That would be me. Of course I do not have the cluster all to myself all of the time, but it is handy to have it when I need it.

    • peromhc
      Senior Member
      • Sep 2009
      • 108

      #3
      Cluster structure?

      Originally posted by westerman
      That would be me. Of course I do not have the cluster all to myself all of the time, but it is handy to have it when I need it.
      Westerman, care to tell me about your cluster? How many nodes, how much RAM per node? Are you running analyses in parallel, etc.?

      Matt

      • What_Da_Seq
        Member
        • Jul 2008
        • 28

        #4
        Also Dual 4 core Xeon (8 cores total). 32GB RAM, Redhat 5, Novoalign

        • dawe
          Senior Member
          • Apr 2009
          • 258

          #5
          It depends very much on what kind of analysis we are doing.
          We run the standard Illumina pipeline on a machine with 4 quad-core Xeons + 32 GB RAM (HP DL580 G5). We also use it for standard ChIP-seq analysis and bwa alignments. We are going to cluster that server with the former IPAR module (2 quad-core Xeons + 16 GB, now running FreeBSD 8 + ZFS for tests).
          We run other tasks (motif discovery, statistical analysis…) on a small cluster (3 Sun X4150, 64 GB RAM, 4 6-core Xeon) which is shared with other groups in the institute...

          d

          • mads b
            Junior Member
            • Aug 2009
            • 4

            #6
            We run standard Illumina pipeline on
            2x quad-core Xeon 5550, 32 GB RAM

            Software: CLC bio Genomics Workbench.

            This configuration allows 7 parallel de novo or reference assemblies to finish in 10-15 min each.

            • dawe
              Senior Member
              • Apr 2009
              • 258

              #7
              Originally posted by mads b
              Software: CLC bio Genomics Workbench.
              I've tried a full demo but I found it very slow in importing data and analyzing them... Can you share your impressions on CLC GW? Which genomes/applications do you use it for?

              d

              • mads b
                Junior Member
                • Aug 2009
                • 4

                #8
                I am in general very satisfied with the program

                I am at present analyzing bacterial genomes of 2-4 megabases sequenced as 38 bp single Illumina reads (an Aspergillus genome of 35 megabases is in the GA at the moment).

                I just tested time consumption on a file of 8.3 million reads: import time 3 min 50 sec (remember to use the import function under "high throughput seq" in the toolbox). Many files can be imported simultaneously (if you are systematic with the process... ;-) ).

                De novo assembly took 9 minutes, creating 107 contigs. This does satisfy me (but you always want it faster, of course).

                Because of the graphical interface, GW might be slower than other programs? But as a non-bioinformatician I really get a lot of help from the graphical interface.

                • mads b
                  Junior Member
                  • Aug 2009
                  • 4

                  #9
                  And by the way... I am running Windows 7. Don't know whether it makes any difference...

                  • dawe
                    Senior Member
                    • Apr 2009
                    • 258

                    #10
                    Originally posted by mads b
                    I just tested time consumption on a file of 8.3 million reads: import time 3 min 50 sec (remember to use the import function under "high throughput seq" in the toolbox). Many files can be imported simultaneously (if you are systematic with the process... ;-) ).
                    De novo assembly took 9 minutes, creating 107 contigs. This does satisfy me (but you always want it faster, of course).
                    Because of the graphical interface, GW might be slower than other programs? But as a non-bioinformatician I really get a lot of help from the graphical interface.
                    Mmm, I've tried it for ChIP-seq analysis of mouse samples... importing 1 lane (15 million reads, 36 bp) + aligning to reference + ChIP analysis = 6 hours + RAM draining + a crash...
                    I don't think the GUI or Windows makes the difference (it's Java after all).

                    d

                    • westerman
                      Rick Westerman
                      • Jun 2008
                      • 1104

                      #11
                      Originally posted by peromhc
                      Westerman, care to tell me about your cluster? How many nodes, how much RAM per node? Are you running analyses in parallel, etc.?

                      Matt
                      Being at a university I have access to a couple of different clusters. One has 4 boxes with 16 cores and either 32GB or 64GB -- in other words 64 cores total. We recently purchased more cores although less memory per core. My other cluster also has 64 cores with 128 GB per box. If required (and if I can go through the hoops to set it up) the university has a Condor pool with way more cores than even I could use (thousands).

                      And yes, analyses are run in parallel as much as possible. I find the major problem is handling the files individually. At that point the analysis often goes down to one CPU reading and writing to one disk.
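As a rough illustration of the per-file pattern described above, here is a minimal Python sketch that runs one small task per file concurrently. It is not westerman's actual setup. The task (counting records in uncompressed FASTQ files), the function names `count_fastq_reads` and `count_all`, and the choice of a thread pool are all assumptions made for the example. Threads suit I/O-bound steps like this one; a CPU-bound step would more likely need separate processes or cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file (4 lines per record)."""
    with open(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    return lines // 4


def count_all(paths, workers=4):
    """Run count_fastq_reads on every path concurrently; return {path: count}."""
    paths = list(paths)  # materialize so paths can be iterated twice
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        counts = pool.map(count_fastq_reads, paths)
        return dict(zip(paths, counts))
```

If all the files sit on the same disk, adding workers may not help much, which matches the point above: throughput ends up limited by one disk rather than by the number of cores.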

                      • pssclabs
                        Junior Member
                        • Sep 2009
                        • 6

                        #12
                        Bioinformatics Computer: List your specs

                        We have been discussing different hardware specifications with various next generation sequencing companies. It does appear that there is no standard configuration, but one thing is for certain: the computing demands will increase. The bottlenecks are the usual suspects, disk I/O and network backplane. Unfortunately the cost to resolve these bottlenecks is tremendous. We are trying to develop a "building block" approach that will grow with the computing demands over time.

                        I think you are correct in your basic configuration although the memory does seem to be overkill. But that may be because of your own application needs.

                        • peromhc
                          Senior Member
                          • Sep 2009
                          • 108

                          #13
                          So am I correct in assuming that none of you using clusters for your analyses rely on VELVET heavily??

                          • Torst
                            Senior Member
                            • Apr 2008
                            • 275

                            #14
                            Originally posted by peromhc
                            So rather than start another of those "how much RAM" threads, it might be interesting and useful for people to describe the computer they are running analyses on. For instance, my current build includes:
                            Main server:

                            PROJECT: de novo assembly and alignment of bacterial genomes, N-way comparative SNP analysis, transcriptomes
                            PLATFORM: Illumina 36bp PE, Illumina 80bp MP, 454 FLX, 454 Titanium
                            PROGRAMS USED: Velvet, SHRiMP, Nesoni
                            CPU: 2 x quad core Xeon 5482 (8 cores, 1600 FSB)
                            RAM: 64 GB total (16 x 4 GB sticks)

                            Workstations:

                            PROJECT: everything bacterial
                            PLATFORM: Illumina 36bp PE, Illumina 80bp MP, 454 FLX, 454 Titanium
                            PROGRAMS USED: CLC Genomics Workbench
                            CPU: 1 x quad core Intel Core2 (4 cores, 1333 FSB)
                            RAM: 16 GB total (4 x 4 GB sticks)

                            • The_Roads
                              Member
                              • May 2009
                              • 38

                              #15
                              We've been using CLCGWB for a while now; v3.6.5 is fantastic. I'd agree with the comment above: I don't think you can overestimate the benefits of putting a biologist in the driving seat with software that has a good GUI, like GWB. Initially we worked with in-house computing groups and command-line NGS assemblers, and it was very inefficient. Each time there were questions or ideas for alternative assemblies/analyses, there was wait time until the appropriate users were available and computing time could be found. It just wasn't competitive. Clearly we gave up before pushing through the learning curve, but I'd say unless you are a large institute and have full-time access to a large number of dedicated, well-trained specialists, a GUI is the way to go.

                              As for specs, we use single quad-core Dell T5400s with extra HDDs and 32 GB RAM. Lack of fast direct storage is our problem, but overall it works fine for us.

                              Has anyone got plans to use Illumina IPARs as assemblers once they come offline? I'm hoping the storage array will solve our storage problem (well, for a while at least...).
