  • cmccabe
    Senior Member
    • Jul 2012
    • 355

    VM specs for NGS bioinformatics

    I am in the process of building a VM for NGS data analysis of human DNA-seq data. The specs are below:

    Linux RedHat OS
    128GB RAM
    2 processors with 6 cores each
    500GB C drive RAID 10
    3 TB D drive

    Any suggestions?

    I am concerned about the cores and RAM, but it's a start. Thank you.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Hmm a VM with 128G of RAM?

    I assume you are referring to a real server config and not a VM.


    • cmccabe
      Senior Member
      • Jul 2012
      • 355

      #3
      It is stated as a stand-alone server or VM, but I believe the VM is the better option if it is feasible. What would you recommend? Thank you.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Are we referring to the same thing? To me VM = Virtual Machine.


        • cmccabe
          Senior Member
          • Jul 2012
          • 355

          #5
          Yes, you are correct, VM is Virtual Machine. Thank you.


          • blancha
            Senior Member
            • May 2013
            • 367

            #6
            It doesn't make much sense to me to run a virtual machine on your own server, among other reasons because of the overhead.

            Commercial service providers often offer virtual machines on their own servers at a lower cost than a dedicated server.
            I haven't seen a commercial company offering a virtual machine with those specifications, though.
            Even a dedicated server from a third party with those specifications might be difficult to find at a reasonable cost.

            As far as I understand from your post, you would want to buy your own server, and run programs directly on the host operating system, without using a virtual machine.


            • cmccabe
              Senior Member
              • Jul 2012
              • 355

              #7
              The main reasons for a virtual machine are that it would be more scalable, cost less to upgrade, and, I think, be easier for our hospital to implement and maintain. I have read different posts about hardware requirements and am just trying to find what is best suited. Currently, I use an Ubuntu 14.04 machine with 64GB of RAM, a Xeon E5-2630 8-core CPU, and a 1TB HDD, and that doesn't seem like enough, but maybe I'm wrong. Thank you.

              What do you mean by overhead?
              Last edited by cmccabe; 11-17-2015, 09:36 AM.


              • blancha
                Senior Member
                • May 2013
                • 367

                #8
                Basically, your programs will run slower on the virtual machine than if you run them directly on your host operating system.

                As far as the specifications are concerned, my only recommendation would be to go for a rack server instead of a tower server, if at all possible. Of course, that implies you have the room for a rack server, as well as the staff to manage the rack server, which is not possible in many settings. If you don't have the staff or the room for a rack server, another possibility is using a third-party computing cluster. In Canada, we're lucky to have free computing time made available to researchers through Compute Canada. Again, this may not be possible in all countries.

                I had a long discussion on this subject with a colleague who also wanted to build his own set-up. Here were my recommendations.

                1. Establish your needs first, before trying to determine the appropriate specifications for the server. For example, how many flowcells per month? For how long should the data be stored?

                2. Can you host the server with a third party? Eliminates the cost of keeping staff, and reduces the operating complexity.

                3. If you will build your own server, go for a rack server, if at all possible. This is the cheapest and most scalable option. However, it requires a room for the server, and staff. The cost of keeping competent staff may exceed the cost of operating the server.

                4. If you must go for a tower server, be aware that this option is appropriate mainly for an individual laboratory, and may not suffice to serve the needs of an entire institute.


                • blancha
                  Senior Member
                  • May 2013
                  • 367

                  #9
                  It's a bit off-subject, but Dell produced this interesting document on building a Linux cluster for next-generation sequencing analysis.

                  It's very technical, and it's about a cluster, not a tower server, but it's still a good read for anyone wanting to build their own platform.


                  • blancha
                    Senior Member
                    • May 2013
                    • 367

                    #10
                    I would add that 3.5 TB doesn't seem like very much.
                    Again, it depends on your needs.
                    How many flowcells per month? For how long will the data be stored on the machine?

                    It also depends on the configuration of your server.
                    Can storage be added at a later date, if needed?
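The two questions above (how many flowcells per month, and how long the data stays on the machine) can be turned into a rough capacity figure. The sketch below is only an illustration: the function names are made up for this example, and the per-flowcell size and retention period in the sample call are placeholders, not numbers from this thread.

```python
# Rough storage estimate from monthly throughput and retention.
# All values passed in the example call are placeholders, not measurements.

def required_storage_tb(flowcells_per_month, tb_per_flowcell, retention_months):
    """Capacity needed to keep `retention_months` of data on the machine."""
    return flowcells_per_month * tb_per_flowcell * retention_months


def fits(capacity_tb, flowcells_per_month, tb_per_flowcell, retention_months):
    """True if the configured capacity covers the whole retention window."""
    needed = required_storage_tb(flowcells_per_month, tb_per_flowcell, retention_months)
    return needed <= capacity_tb


if __name__ == "__main__":
    capacity_tb = 0.5 + 3.0  # the 500GB and 3 TB drives from the first post
    needed = required_storage_tb(flowcells_per_month=2,
                                 tb_per_flowcell=0.1,
                                 retention_months=24)
    print(f"needed: {needed:.1f} TB, available: {capacity_tb:.1f} TB")
    print("fits" if needed <= capacity_tb else "does not fit")
```

This counts only the data kept on disk; intermediate files from alignment and variant calling would add to the total.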


                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #11
                      @cmccabe: You can probably run a VM with the specs you originally listed but that would mean you would want server hardware underneath that would be several times more powerful (unless you plan to run only one VM, which would not make sense). Beyond a certain ceiling (in terms of RAM/sockets) the cost of such hardware escalates rapidly.

                      If your IT is serious about building this right, send them this way


                      • cmccabe
                        Senior Member
                        • Jul 2012
                        • 355

                        #12
                        We currently run Ion Torrent sequencing on a Proton. The estimate of 3.6 TB was based on that data (180 samples per year, each sample ~20GB). Our IT department, myself included, is new to this type of data, with only a couple of years of experience. There are plans to move to a NextSeq, so we have already inquired about increasing the storage to roughly 9-12 TB. I work in Chicago at a small 300-bed children's hospital, but NGS is ordered a lot, so I am trying to get a better idea. A VM seems like a good option, but it seems like it's all in the configuration. Thank you.
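For reference, the arithmetic behind the 3.6 TB figure can be written out as a small sketch. It assumes decimal units (1 TB = 1000 GB) and counts only the ~20GB per sample quoted above; the function names are invented for this example.

```python
# Annual data volume from sample count and per-sample size.
# Assumes 1 TB = 1000 GB and ignores intermediate or derived files.

def annual_volume_tb(samples_per_year, gb_per_sample):
    """Data produced per year, in TB."""
    return samples_per_year * gb_per_sample / 1000


def years_until_full(capacity_tb, volume_tb_per_year):
    """How long a given capacity lasts at a constant yearly volume."""
    return capacity_tb / volume_tb_per_year


if __name__ == "__main__":
    per_year = annual_volume_tb(samples_per_year=180, gb_per_sample=20)
    print(f"annual volume: {per_year:.1f} TB")
    # 3.5 TB is the original spec; 9 and 12 TB are the range being discussed.
    for capacity_tb in (3.5, 9, 12):
        years = years_until_full(capacity_tb, per_year)
        print(f"{capacity_tb} TB lasts about {years:.1f} years at this rate")
```

The rate here is the current Proton workload; a move to a NextSeq would change the per-sample size, so the inputs would need to be revisited.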


                        • colindaven
                          Senior Member
                          • Oct 2008
                          • 417

                          #13
                          I would recommend 10-20TB of storage as a minimum; you'll fill it up in no time at all. You'll need to consider backup too (even external hard disks if you're on a tight budget!).


                          • cmccabe
                            Senior Member
                            • Jul 2012
                            • 355

                            #14
                            I am looking more into Linux clusters, as they seem to be a better overall fit. If designed correctly, they will suit the lab today and allow for growth. Thank you.


                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              #15
                              Cluster administration is a non-trivial task, so make sure you have someone willing to take that on.
