  • sebl
    Member
    • Mar 2014
    • 26

    Feedback on workstation for bioinformatics

    Dear all,

    Following a discussion on a good workstation for bioinformatics work...

    We work on bacterial, plasmid and viral genomes. The analyses we currently do at small scale, and expect to do much more of (i.e. scaling up to hundreds of bacterial genomes), are mapping, de novo assembly, genome alignments, pan/core genome analyses, SNP-based comparisons, pairwise BLAST, etc.

    I approached our IT department with basic specs for a workstation, based on messages I found in this forum and in the forums of the programs we work with.

    Below is what a vendor suggested to the IT team, and now I have been asked to check whether it looks good enough. Since my knowledge of hardware is limited, I would be glad to get some feedback here.

    I have already asked for the OS to be changed to Linux. I first thought of a dual-boot system, but the more I read and think about it, the more I think this should be mainly a Linux (Bio-Linux?) system, with a Windows VM if needed. I cannot think of any bioinformatics program that works on Windows but not on Linux. We are still mostly working within Windows (with a Linux VM), but I guess we should make the final switch to Linux now.

    Thanks in advance!

    Base Unit: HP Z840 Workstation
    Packaging: HP Single Unit Packaging
    Chassis: HP Z840 1125W (1450W/200V) 90% Eff Chass
    Operating System: 10*Pro 64 downgrade to Win7 Pro 64 *
    Add-On Selection: Operating System Load to PCIe
    Recovery Media: Windows 7 Pro 64-bit OS DVD+DRDVD
    Processor: Intel Xeon E5-2630v3 2.4 1866 8C 1stCPU
    Processor 2: Intel Xeon E5-2630v3 2.4 1866 8C 2ndCPU
    System Memory: 128GB DDR4-2133 (16x8GB) 2CPU RegRAM
    Graphics Card: NVIDIA NVS 310 1GB 1st GFX
    Internal Storage 01: HP Z Turbo Drive 256GB PCIe 1st SSD
    Internal Storage 01: 4TB 7200 RPM SATA 1st HDD
    Internal Storage 02: 4TB 7200 RPM SATA 2nd HDD
    Internal Storage 03: 4TB 7200 RPM SATA 3rd HDD
    Internal Storage 04: 4TB 7200 RPM SATA 4th HDD
    Internal Storage 05: 4TB 7200 RPM SATA 5th HDD
    Optical Device 1: 9.5mm Slim SuperMulti DVDRW 1st ODD
    Media Card Reader: HP 15-In-1 Media Card Reader
    Warranty: HP 3/3/3 Warranty
    Country Kit: HP Z840 Country Kit
    Add-On Selection: HP Dual Processor Air Cooling Kit
  • cmccabe
    Senior Member
    • Jul 2012
    • 355

    #2
    I use an HP Z640 for analysis of human NGS data. Though it was not designed optimally, it is well suited for our current needs.
    That being said, a Linux OS is a good choice; the flavor (Ubuntu, CentOS, Red Hat) depends on your comfort level and preference. We do run a Windows-only application, NextGENe, but set up a VM rather than a dual-boot, as dual-booting with Windows was rather difficult.
    The type of workstation you need really depends on your data and the applications used (are they memory-dependent or processor-intensive?). I too am getting ideas from others more experienced, but you're off to a good start.


    • Jessica_L
      Senior Member
      • Feb 2010
      • 117

      #3
      My only word of caution regarding your Linux OS is to choose it carefully.

      I currently use Ubuntu in a VM and have had issues getting certain programs to compile correctly (e.g. CASAVA and bcl2fastq from Illumina). Unfortunately, my IT department then built our Linux workstation with Ubuntu, so I'm having to revisit many of the same problems when I install software. On the plus side, they're usually problems for which I've already identified solutions, but it can get frustrating.

      I have a second VM running Red Hat, and I haven't had any problems with it. Others may have more experience with that OS, though.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Is 128 GB the max RAM for this model? I almost wonder if you should drop the second CPU and get more RAM, if you are going to be doing a lot of de novo assemblies (I am assuming the configuration has been maxed out for your budget).


        • blancha
          Senior Member
          • May 2013
          • 367

          #5
          I like CentOS for the operating system.
          Definitely not Windows, under any condition.
          You should fire your IT team for proposing Windows.
          Red Hat Enterprise Linux is basically the same thing as CentOS.
          CentOS is the community version of Red Hat Enterprise Linux.

          The DVD drive and the media reader are not necessary, but I suppose the cost of having them is minimal relative to the cost of the system.

          I don't see the utility of the professional graphics card for next-generation sequencing, but then again if you have the budget it won't do any harm. The money spent on the graphics card could be spent on doubling the RAM.


          • sebl
            Member
            • Mar 2014
            • 26

            #6
            Originally posted by blancha
            The money spent on the graphics card could be spent on doubling the RAM.
            I should really check on that.

            Originally posted by GenoMax
            Is 128 GB the max RAM for this model? I almost wonder if you should drop the second CPU and get more RAM
            Eh, a colleague suggested that I ask for more processors... The max RAM seems to be 512 GB. But this is already considered a very unusual purchase at our institute, so I did not want to push too hard on the specs. If I understand correctly, there is room to upgrade it later if necessary.
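A quick way to sanity-check the "room to upgrade" point is to work through the DIMM arithmetic. This is a minimal sketch under an assumption not stated in the thread: that the board has 16 DIMM slots, so the quoted 16x8GB configuration fills every slot (check the vendor's memory configuration guide before relying on it). The 16x8GB and ~512 GB figures come from the posts; the helper name `upgrade_options` is hypothetical.

```python
# DIMM upgrade arithmetic for the quoted memory configuration.
# ASSUMPTION: 16 DIMM slots in total (not stated in the thread); the 16x8GB
# configuration and the ~512 GB maximum are taken from the posts.


def upgrade_options(slots: int, installed_modules: int, module_gb: int, max_gb: int) -> dict:
    """Summarize what reaching max_gb would involve, assuming identical DIMMs."""
    installed_gb = installed_modules * module_gb
    free_slots = slots - installed_modules
    # Size each DIMM would need if every slot holds the same module at the maximum.
    module_gb_at_max = max_gb // slots
    return {
        "installed_gb": installed_gb,
        "free_slots": free_slots,
        "module_gb_at_max": module_gb_at_max,
        # With no free slots, any upgrade means replacing modules, not adding them.
        "can_add_without_replacing": free_slots > 0,
    }


print(upgrade_options(slots=16, installed_modules=16, module_gb=8, max_gb=512))
```

Under these assumptions it reports 128 GB installed, no free slots, and 32 GB modules needed to reach 512 GB, i.e. a later upgrade would mean swapping DIMMs rather than adding them.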

            As for the budget, I actually just gave the IT people a basic configuration, about 128 GB RAM and 16 cores, based on suggestions I've seen in the forum, without going into many other details. This machine is what the vendor suggested.

            I thought about Bio-Linux as the OS...


            • blancha
              Senior Member
              • May 2013
              • 367

              #7
              I have 48 cores on my institute's server, and it is constantly overloaded.
              Luckily, I have access to thousands of cores on an external computing cluster.
              I do work nearly exclusively with eukaryotic NGS data, though.

              You can certainly use all 16 cores, once you discover the joys of parallel processing.

              It's just a question of how patient you are and what turnaround you want. The more cores, the more samples you can process in parallel, and the faster you can process individual samples when parallelization is possible.


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                @sebl appears to belong to a research lab (not a core facility?). Even though the projection of hundreds of samples sounds interesting, it may be a while before the lab actually runs that many (reagent costs add up quickly if you really are going to run hundreds of samples, even bacterial ones). If there really are hundreds of samples, then using a central compute facility becomes economical/effective.

                @blancha: You can't be the only user on your local server if it has 48 cores and still stays busy. If you are the only user, then you must be analyzing hundreds of samples a week to keep all those cores busy.


                • sebl
                  Member
                  • Mar 2014
                  • 26

                  #9
                  @GenoMax: Indeed.

                  Also, once we set up an analysis pipeline, it usually does not matter if it takes one more day to finish, as long as the computer is able to process it in the end.

                  I agree that for really, really large sets we may need a bioinformatics core or similar. But we are not there yet.


                  • blancha
                    Senior Member
                    • May 2013
                    • 367

                    #10
                    @GenoMax, I currently have 16 human exosome RNA-Seq samples to reprocess. I'm taking 4 cores per sample for the TopHat runs.
                    4*16 = 64 cores
                    I've already exhausted my 48 cores. A TopHat run with one core would just take far too long.
                    And, yes, there is a proteomics web application running on the same server, so I have to be careful not to overload the server completely. I actually just keep 38 cores for my NGS pipelines, and leave the 10 others free for other uses.
                    I also have another project with 6 samples to reprocess that has been sitting in the queue for the past 2 days on the computing cluster I also use, either because the cluster is overloaded or because the scheduler is malfunctioning again.
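The core arithmetic in this post can be written out as a small capacity check. This is a sketch that assumes every sample takes about the same time with the same thread count, so the number of "waves" is only a rough proxy for wall-clock time; the function name `plan_batches` is hypothetical, and the 16 samples, 4 cores per sample, 48 cores and 38 reserved cores come from the post.

```python
import math


def plan_batches(n_samples: int, cores_per_sample: int, cores_available: int) -> dict:
    """Estimate how many samples fit at once and how many rounds a batch needs."""
    concurrent = cores_available // cores_per_sample  # samples that fit side by side
    if concurrent == 0:
        raise ValueError("not enough cores to run even one sample")
    return {
        "cores_needed_at_once": n_samples * cores_per_sample,
        "concurrent": concurrent,
        "waves": math.ceil(n_samples / concurrent),  # rounds of simultaneous runs
        "idle_cores": cores_available - concurrent * cores_per_sample,
    }


# 16 samples at 4 cores each, on all 48 cores vs. the 38 reserved for NGS pipelines.
print(plan_batches(16, 4, 48))
print(plan_batches(16, 4, 38))
```

With these numbers, 64 cores would be needed to run everything at once; 48 cores fit 12 samples at a time and 38 cores fit 9, so under the equal-runtime assumption both cases need two rounds.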

                    It doesn't take hundreds of samples to use 48 cores.
                    Granted, I should probably switch from TopHat to a faster aligner, but it's the only program in my pipeline that I have always been able to count on for giving reliable results. The researchers also still insist on using Cuffdiff, despite my best efforts to convince them to switch to featureCounts and DESeq2.

                    None of this is really relevant to @sebl, since he has already said that turnaround is not an issue. But one can never really have too many cores. For many bioinformatics programs, the speed-up scales roughly linearly with the number of cores available.
                    Last edited by blancha; 12-08-2015, 01:00 PM.


                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      @blancha: Sounds to me like your processes are I/O bound (not surprising) or memory limited. How much RAM is available per core? As you said, our discussion is not relevant to @sebl's question, though.


                      • blancha
                        Senior Member
                        • May 2013
                        • 367

                        #12
                        Our local server, at our institute, has 580 GB of shared memory.
                        So, RAM is generally not an issue.

                        On the Compute Canada cluster, each core requested comes with 2.7 GB RAM, which is generally sufficient.
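GenoMax's RAM-per-core question can be answered roughly from these figures. This is a sketch that assumes memory is divided evenly across cores, which is only a rule of thumb on a shared-memory server (a single process can use far more); the helper names are hypothetical, while the 580 GB, 48 cores and 2.7 GB per core come from the posts.

```python
def gb_per_core(total_gb: float, cores: int) -> float:
    """Even split of a server's shared memory across its cores (rule of thumb)."""
    return total_gb / cores


def cluster_request_gb(cores_requested: int, gb_per_core_allocated: float) -> float:
    """Memory that comes with a cluster job when RAM is allocated per core."""
    return cores_requested * gb_per_core_allocated


# Figures from the posts: 580 GB shared across 48 cores locally; 2.7 GB per core on the cluster.
local = gb_per_core(580, 48)
cluster = cluster_request_gb(4, 2.7)  # e.g. a 4-core TopHat job
print(f"local server: {local:.1f} GB per core")
print(f"cluster, 4 cores: {cluster:.1f} GB")
```

By this measure the local server offers roughly 12 GB per core, more than four times the cluster's per-core allocation.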

                        Yes, there is a lot of I/O.

                        I should probably switch to a more efficient pipeline.
                        I should use STAR or Brian's BBMap, but TopHat has just been my workhorse for years.
                        I can't wean the researchers off Cuffdiff, mainly because they always want the isoform data, which they end up discarding anyway.

                        Even without TopHat or Cuffdiff, some steps monopolize a processor. For example, I had to run bedtools genomecov on dozens of samples last week. I used 42 processors at once, which paralyzed the proteomics web interface running on the same server. I had to reset the queue settings to use only 38 cores.
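One way to keep a batch like this from starving the web application is to cap how many genomecov processes run at once from the script itself, as an alternative to changing the queue settings. The sketch below assumes each `bedtools genomecov` process uses roughly one core, reads BAM input (`-ibam`), writes bedGraph output (`-bg`) to a file next to each BAM, and takes the 38-job cap from the post; the function names and the injectable `runner` parameter are hypothetical, added so the logic can be tested without bedtools installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_CONCURRENT_JOBS = 38  # cap from the post; leaves the remaining cores for other services


def genomecov_command(bam_path: str) -> list:
    # -ibam: BAM input; -bg: report coverage as bedGraph on stdout
    return ["bedtools", "genomecov", "-ibam", bam_path, "-bg"]


def run_one(bam_path: str, runner=subprocess.run) -> int:
    """Run genomecov for one BAM, writing <sample>.bedgraph beside it; return the exit code."""
    out_path = Path(bam_path).with_suffix(".bedgraph")
    with open(out_path, "w") as out:
        result = runner(genomecov_command(bam_path), stdout=out, check=False)
    return result.returncode


def run_all(bam_paths, max_jobs: int = MAX_CONCURRENT_JOBS, runner=subprocess.run) -> list:
    """Run genomecov on many BAMs with at most max_jobs processes at a time."""
    # Threads are enough here: each thread just waits on its external process.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda path: run_one(path, runner), bam_paths))
```

A call such as `run_all(sorted(glob.glob("*.bam")))` would then process every BAM in the working directory while never exceeding 38 concurrent processes.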

                        Anyway, I'm sorry to have hijacked @sebl's thread, but there can just never be too many cores, whether to process multiple samples together or to process one sample in parallel threads.


                        • sebl
                          Member
                          • Mar 2014
                          • 26

                          #13
                          No problem. You keep the thread active, so I may get more replies from people.

                          What about Bio-Linux as the OS? Any cons that I should be aware of?

                          Thanks again.


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            Originally posted by sebl
                            No problem. You keep the thread active so I may get more replies from people

                            What about Biolinux as OS? Any cons that I should be aware of?

                            Thanks again.
                            Stick with a standard OS (CentOS, Ubuntu, etc.) and install apps as necessary to keep things flexible. Leave the systems administration to someone whose job description reflects that.


                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              #15
                              Originally posted by blancha
                              but there can just never be too many cores, either to process multiple samples together, or process one sample in parallel threads.
                              I am not 100% convinced about that but I am more patient and do have access to significant resources.

                              It sounds like you have a quad-socket server, which would probably be out of @sebl's price range. I have generally found BBMap best for my needs, and since I work mostly on a cluster, there is no point in assigning more cores to a job than a physical server has, because the scheduler (and in turn the admins) don't like it.
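GenoMax's rule of thumb, never asking for more threads than one physical node provides, can be expressed as a simple clamp. This is a hypothetical helper, not part of any scheduler; note that `os.cpu_count()` reports logical CPUs, which can be twice the physical core count when hyper-threading is enabled.

```python
import os
from typing import Optional


def threads_for_job(requested: int, cores_per_node: Optional[int] = None) -> int:
    """Clamp a multithreaded job's thread count to what one node can supply."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if cores_per_node is None:
        # Fall back to this machine's logical CPU count (may include hyper-threads).
        cores_per_node = os.cpu_count() or 1
    return min(requested, cores_per_node)


print(threads_for_job(64, cores_per_node=48))  # 48-core server
print(threads_for_job(32, cores_per_node=16))  # 16-core workstation from the original post
```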
