Bioinformatics Computer: List your specs

  • Bioinformatics Computer: List your specs

    Hi All,

    In reading the forums, it seems like many people have questions that involve computing power: how much RAM, how many processors, how long analyses take. I suspect that as NGS becomes more mainstream, there will be a lot of labs trying to build workstations to handle the work. I myself am building a computer to do de novo assembly of a eukaryotic transcriptome using Solexa data, and have toiled over its configuration.

    So rather than starting another of those "how much RAM" threads, it might be interesting and useful for people to describe the computer they run analyses on. For instance, my current build includes:

    PROJECT: de novo assembly of a rodent transcriptome
    PLATFORM: Solexa 100 bp paired-end
    PROGRAMS USED: Velvet, ABySS

    MOTHERBOARD: Tyan S7016, dual-socket Xeon 5500 series, 18 DIMMs
    CPU: two Xeon E5520 (8 cores total)
    RAM: 72 GB total (18 × 4 GB sticks)

    It seems like this covers the basics and allows for useful comparison. This type of thread could be really useful if enough people replied.
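    As a side note, for anyone filling in a template like this, it can be handy to first check what a machine actually reports before sizing an analysis. A minimal Python sketch (this assumes a Linux box; the sysconf names below are not available on every platform):

```python
import os

# Query what this machine actually offers (Linux-specific sysconf names).
cores = os.cpu_count()
page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
phys_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
total_ram_gb = page_size * phys_pages / 1024 ** 3

print(f"cores: {cores}, RAM: {total_ram_gb:.1f} GB")
```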

    In addition to workstation-type configurations, it would be really interesting to see how many people are using supercomputers or large clusters to do assemblies.

  • #2
    ... it would be really interesting to see how many people are using supercomputers or large clusters to do assemblies.
    That would be me. Of course I don't have the cluster all to myself all of the time, but it is handy to have it when I need it.



    • #3
      Cluster structure?

      Originally posted by westerman View Post
      That would be me. Of course I don't have the cluster all to myself all of the time, but it is handy to have it when I need it.
      Westerman, care to tell me about your cluster? How many nodes, how much RAM per node? Are you running analyses in parallel, etc.?

      Matt



      • #4
        Also dual quad-core Xeon (8 cores total), 32 GB RAM, Red Hat 5, Novoalign.



        • #5
          It depends a lot on what kind of analysis we are doing.
          We run the standard Illumina pipeline on a 4 × quad-core Xeon with 32 GB RAM (HP DL580 G5). We also use that for standard ChIP-seq analysis and bwa alignments. We are going to cluster that server with the former IPAR module (2 × quad-core Xeon + 16 GB, currently running FreeBSD 8 + ZFS for tests).
          We run other tasks (motif discovery, statistical analysis…) on a small cluster (3 Sun X4150s, 64 GB RAM, 4 × 6-core Xeon) which is shared with other groups in the institute.

          d



          • #6
            We run the standard Illumina pipeline on
            2 × quad-core Xeon 5550, 32 GB RAM

            Software: CLCbio Genomic Workbench.

            This configuration allows up to 7 parallel de novo or reference assemblies to be finished in 10-15 min each.



            • #7
              Originally posted by mads b View Post
              Software: CLCbio Genomic Workbench.
              I've tried a full demo, but I found it very slow at importing and analyzing data... Can you share your impressions of CLC GW? Which genomes/applications do you use it for?

              d



              • #8
                I am in general very satisfied with the program.

                I am at present analyzing bacterial genomes of 2-4 megabases sequenced as 38 bp single-end Illumina reads (an Aspergillus genome of 35 megabases is in the GA at the moment).

                I just tested time consumption on a file of 8.3 million reads: import time 3 min 50 sec (remember to use the import function under "high throughput seq" in the toolbox). Many files can be imported simultaneously (if you are systematic with the process... ;-) ).

                De novo assembly took 9 minutes, creating 107 contigs... This satisfies me (but you always want it faster, of course).

                Because of the graphic interface, GW might be slower than other programs. But as a non-bioinformatician I really get a lot of help from the graphic interface.



                • #9
                  And by the way... I am running Windows 7. Don't know whether it makes any difference...



                  • #10
                    Originally posted by mads b View Post
                    I just tested time consumption on a file of 8.3 million reads: import time 3 min 50 sec (remember to use the import function under "high throughput seq" in the toolbox). Many files can be imported simultaneously (if you are systematic with the process... ;-) ).
                    De novo assembly took 9 minutes, creating 107 contigs... This satisfies me (but you always want it faster, of course).
                    Because of the graphic interface, GW might be slower than other programs. But as a non-bioinformatician I really get a lot of help from the graphic interface.
                    Mmm, I've tried it for ChIP-seq analysis of mouse samples... importing 1 lane (15 million reads, 36 bp) + aligning to reference + ChIP analysis = 6 hours + RAM draining + a crash...
                    I don't think the GUI or Windows makes the difference (it's Java, after all).

                    d



                    • #11
                      Originally posted by peromhc View Post
                      Westerman, care to tell me about your cluster? How many nodes, how much RAM per node? Are you running analyses in parallel, etc.?

                      Matt
                      Being at a university, I have access to a couple of different clusters. One has 4 boxes with 16 cores each and either 32 GB or 64 GB -- in other words, 64 cores total. We recently purchased more cores, although with less memory per core. My other cluster also has 64 cores, with 128 GB per box. If required (and if I can go through the hoops to set it up), the university has a Condor pool with far more cores than even I could use (thousands).

                      And yes, analyses are run in parallel as much as possible. I find the major problem is handling the files individually. At that point the analysis often goes down to one CPU reading from and writing to one disk.
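                      The per-file bottleneck above can at least be spread across cores by farming files out to worker processes. A minimal sketch with Python's multiprocessing module (the *.fastq glob and the line-count placeholder standing in for the real per-file work are just assumptions for illustration):

```python
from multiprocessing import Pool
from pathlib import Path

def process_read_file(path):
    # Placeholder for the real per-file work (alignment, filtering, ...):
    # here we just count lines in the file.
    with open(path) as handle:
        return str(path), sum(1 for _ in handle)

if __name__ == "__main__":
    files = sorted(Path(".").glob("*.fastq"))
    # Cap at 4 concurrent worker processes; each worker handles one file
    # at a time, so independent files proceed in parallel.
    with Pool(processes=4) as pool:
        for name, n_lines in pool.map(process_read_file, files):
            print(name, n_lines)
```

                      This helps only while the work is CPU-bound; once every worker is waiting on the same disk, adding processes gains nothing.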



                      • #12
                        We have been discussing different hardware specifications with various next-generation sequencing companies. There appears to be no standard configuration, but one thing is certain: the computing demands will increase. The bottlenecks are the usual suspects, disk I/O and network backplane, and unfortunately the cost of resolving them is tremendous. We are trying to develop a "building block" approach that will grow with the computing demands over time.

                        I think your basic configuration is correct, although the memory does seem to be overkill. But that may be because of your own application's needs.



                        • #13
                          So am I correct in assuming that none of you using clusters for your analyses rely heavily on Velvet?



                          • #14
                            Originally posted by peromhc View Post
                            So rather than starting another of those "how much RAM" threads, it might be interesting and useful for people to describe the computer they run analyses on. For instance, my current build includes:
                            Main server:

                            PROJECT: de novo assembly and alignment of bacterial genomes, N-way comparative SNP analysis, transcriptomes
                            PLATFORM: Illumina 36 bp PE, Illumina 80 bp MP, 454 FLX, 454 Titanium
                            PROGRAMS USED: Velvet, SHRiMP, Nesoni
                            CPU: 2 × quad-core Xeon 5482 (8 cores, 1600 FSB)
                            RAM: 64 GB total (16 × 4 GB sticks)

                            Workstations:

                            PROJECT: everything bacterial
                            PLATFORM: Illumina 36 bp PE, Illumina 80 bp MP, 454 FLX, 454 Titanium
                            PROGRAMS USED: CLC Genomics Workbench
                            CPU: 1 × quad-core Intel Core 2 (4 cores, 1333 FSB)
                            RAM: 16 GB total (4 × 4 GB sticks)



                            • #15
                              We've been using CLC GWB for a while now; v3.6.5 is fantastic. I'd agree with the comment above: I don't think you can overestimate the benefits of putting a biologist in the driving seat with software that has a good GUI like GWB. Initially we worked with in-house computing groups and command-line NGS assemblers, and it was very inefficient. Each time there were questions or ideas for alternative assemblies/analyses, there was wait time until the appropriate users were available and computing time could be found. It just wasn't competitive. Clearly we gave up before pushing through the learning curve, but I'd say that unless you are a large institute with full-time access to a large number of dedicated, well-trained specialists, a GUI is the way to go.

                              As for specs, we use single quad-core Dell T5400s with extra HDDs and 32 GB RAM. Lack of fast direct storage is our problem, but overall it works fine for us.

                              Has anyone got plans to use Illumina IPARs as assemblers once they come offline? I'm hoping the storage array will solve our storage problem (well, for a while at least...)

