Server hardware and OS


  • Server hardware and OS

    Hello

    I would like to pose a question to computer scientists and other researchers using NGS about server hardware, operating systems, and software for NGS analysis.

    In a few months I will be working with RNA-seq data from an Illumina GAII. I am going to align the reads against the available reference BAC libraries and ESTs, annotate candidate genes, and find SNPs for further plant breeding experiments. I am also going to build a database so that all the data generated from the RNA-seq and my analysis can be reused in the future.

    My supervisor has asked me to handle purchasing, setting up, and maintaining the server for this project. The IT director likes Dell servers and has mentioned getting another Dell for this project, but I just cringe at the thought of running Linux on a Dell.
    I personally want a Sun Fire x64 server with Solaris, mainly because of ZFS.

    Considering the RNA-seq analysis software and the storage/backup of hundreds of gigabytes -- which server hardware and OS would work best?

    I want to thank you in advance for your opinion.
    jdjax
    Ph.D. Student
    Aarhus University

  • #2
    Yeah, we all want a decent file system with checksums.
    I haven't had much luck with OpenSolaris on third-party hardware, though that won't be a problem with a machine from Sun. Keep in mind that you'll need a service agreement.

    RAID, in any variation, is no replacement for backups, though.
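    Outside ZFS you can still get end-to-end checksums at the application level; a minimal sketch using sha256sum (directory and file names are hypothetical):

    ```shell
    # Build a checksum manifest for a run directory, then verify it later --
    # this catches silent corruption that RAID alone will not detect.
    mkdir -p /tmp/run_demo
    printf 'ACGTACGT\n' > /tmp/run_demo/reads.txt
    cd /tmp/run_demo
    sha256sum reads.txt > MANIFEST.sha256
    sha256sum -c MANIFEST.sha256   # prints "reads.txt: OK" while the data is intact
    ```

    Re-running the `-c` step after every copy or restore is a cheap way to prove a backup is still good.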

    Don't underestimate the difficulty of getting analysis software written for Linux running on Solaris -- it can be done, but I wouldn't expect it to be trivial.
    Potentially, you can run a virtual Linux on top (via Xen), but don't skimp on the RAM: ZFS supposedly likes a big cache, and you have to statically assign the RAM between the systems.



    • #3
      I would get Linux servers for computation. As finkernagel implied, getting analysis software running on Solaris can be anywhere from a pain to impossible. We have both Solaris and Linux (CentOS/Red Hat) machines, and there is simply some software we cannot run on Solaris; the reverse is true only in a few edge cases.

      The file system is a different matter and, arguably, should be decoupled from the compute machines. For the file system you will want to consider both fast access for computing and longer-term slow storage (as well as archival media). We use both a BlueArc system (expensive at $18K for 7.5 TB, but it handles everything we throw at it) and Sun "thumpers" (a.k.a. 4500s; 48 TB using ZFS, and relatively cheap).


      Personally, unless you are running a really big set of machines, I would let someone else (your IT guys or the "cloud") handle the hardware and base OS. Instead, concentrate on the science and analysis of the NGS data. There are enough headaches in that.



      • #4
        Originally posted by jdjax View Post
        My supervisor has asked me to handle purchasing, setting up, and maintaining the server for this project. The IT director likes Dell servers and has mentioned getting another Dell for this project, but I just cringe at the thought of running Linux on a Dell.
        I personally want a Sun Fire x64 server with Solaris, mainly because of ZFS.

        Considering the RNA-seq analysis software and the storage/backup of hundreds of gigabytes -- which server hardware and OS would work best?

        I want to thank you in advance for your opinion.
        I have direct experience with Nexenta CP 3 + ZFS. The question is: why do you need ZFS? If you need it for dedup, consider that most Illumina data won't be duplicated; only some of your analyses will be. If you need it for compression, consider that most of the data will already be in compressed formats. If you need it for file-system healing, that is great, but RAID 6 may be enough. I do run Nexenta + ZFS for a Galaxy instance, where there is a high chance of having duplicated big text files. ZFS is not the most performant file system (unless you are able to tune it very well). Also, I had to spend additional time building 64-bit apps for NGS.
        We have an HP server + an MSA disk array (RAID 6) + Ubuntu 10.10, and it works great!
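        For what it's worth, the ZFS features debated above are per-dataset properties that you enable explicitly. A hedged sketch, with hypothetical pool, disk, and dataset names:

        ```shell
        # Hypothetical pool/disk/dataset names -- illustrates the ZFS features discussed above.
        zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0   # double-parity vdev, RAID-6-like
        zfs create tank/ngs
        zfs set compression=on tank/ngs   # transparent compression; little gain on already-gzipped data
        zfs set dedup=on tank/ngs         # block-level dedup; its tables want lots of RAM
        zfs get compressratio tank/ngs    # check whether compression is actually paying off
        ```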

        d



        • #5
          Thanks for your input. I appreciate your help.
          jdjax
          Ph.D. Student
          Aarhus University



          • #6
            My 2 cents,

            It seems you have some experience with Unix. I'd suggest you take care of the project yourself. Yes, it is going to be more work for you, but you will have full root access to the hardware. If there is any problem, you can blame yourself.

            It is a pity that most scientific hardware these days is only tested on Windows, I mean, Linux.

            Here is another alternative to the single machine approach:

            Hardware

            + 24U rack -- consider a 42U if you are planning to expand
            + 1 APC UPS (2U)
            + 4 compute machines (1U each): 8 cores, 32 GB RAM (16 GB would be enough), 2 x 1 TB 10k rpm SATA-II drives. People seem to like HP; any other suggestions?

            + 1 Gb network switch (1U) -- what do you guys use for this?
            + 1 storage server (1U): 2 cores, 8 GB RAM
            + 20 TB external disk storage (check suggestions in this thread)
            + 1 power strip
            + something else?

            Software

            + Your favorite Linux distro on the machines
            + GApipeline
            + NFS
            + SAMBA (storage server)
            + ...
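            The NFS and Samba entries above boil down to a couple of config stanzas on the storage server; a hedged sketch (paths, network range, and user name are hypothetical):

            ```
            # /etc/exports -- NFS export for the Linux compute nodes:
            /storage  10.0.0.0/24(rw,async,no_subtree_check)

            # /etc/samba/smb.conf -- share the GAII's Windows control PC writes into:
            [runs]
               path = /storage/runs
               valid users = gaii
               writable = yes
            ```

            After editing, `exportfs -ra` reloads the NFS exports and `smbcontrol all reload-config` reloads Samba.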

            The GAII Windows box dumps data via Samba onto the external storage.
            You could run GERALD on the external storage or move the necessary
            bits (lane by lane) to the local 1 TB disks if you want to speed things up.
            Disable ELAND and get only the GApipeline stats; compute the alignments with BWA.
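            The BWA step above, for single-end GAII reads, would look roughly like this (file names and thread count are hypothetical; `aln`/`samse` is the single-end route):

            ```shell
            # One-time reference preparation, then per-lane single-end alignment.
            bwa index ref.fa                                     # build the BWT index of the reference
            bwa aln -t 8 ref.fa s_1_sequence.txt > s_1.sai       # compute suffix-array coordinates for lane 1
            bwa samse ref.fa s_1.sai s_1_sequence.txt > s_1.sam  # convert them to SAM alignments
            ```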

            Once you have this up and running, explore installing and setting up a job scheduler:
            SGE seems to be the favorite out there (PBS is good too).
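            A minimal SGE submission script for the per-lane alignment could look like this (the parallel environment name and memory limit are site-specific assumptions):

            ```shell
            #!/bin/sh
            # align_lane.sh -- submit with: qsub align_lane.sh s_1_sequence.txt
            #$ -cwd             # run in the submission directory
            #$ -pe smp 8        # request 8 slots on one node ("smp" is a site-specific PE name)
            #$ -l h_vmem=4G     # per-slot memory limit
            bwa aln -t 8 ref.fa "$1" > "$1.sai"
            bwa samse ref.fa "$1.sai" "$1" > "$1.sam"
            ```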
            -drd



            • #7
              All respect to Drio; however, I want to reiterate one of his sentences, changing it to reemphasize my point:

              Originally posted by drio View Post
              It seems you have some experience with Unix. I'd suggest you take care of the project yourself. Yes, it is going to be more work for you, but you will have full root access to the hardware. If there is any problem, you can blame yourself.
              Taking his last sentence, I will rewrite it a bit: "If there is any problem, you can spend hours fixing it by yourself".

              There is no doubt that there is a lot of fun and a lot of control in building and running your own systems. And you can learn a lot of computer-geekiness that will help out in the future. But if you have competent IT staff that listen to your needs and are responsive to fixing problems, then by all means let them do the work and handle the headaches. Concentrate on what you are good at -- NGS analysis.

              There are so many computer-related tasks that we no longer do ourselves -- e.g., running cat-5 or fiber wiring throughout our building; the network routing to get our machines out to the internet; mail and web servers -- why should setting up a compute and disk cluster be any different? Why should we who focus on bioinformatics do the grunt work? Instead let the seriously knowledgeable people spend the time to make everything run smoothly. If you want the experience of networking, building machines, and setting up servers then doing this at home is a low cost and low pressure way of learning -- no one will be yelling at you in the middle of the night to get the cluster back up and running.

              On the other hand if your supervisor is saying "let's set up our own cluster because we can do it better than they can and, by the way, here is the $$$$ to build it properly", well then, just take the money and have fun. Just don't expect to get much sleep.


              Going back to Drio's specifications: I'd recommend two 8-core, 64+ GB machines over four lower-memory machines. A program can always run longer if there is not enough compute power, but rarely can it be made to run at all if there is not enough memory.

              The 42U cabinet seems like overkill to me. I'd stick with 24U, although, as usual, if money is no object then why not 42U? I'd just spend the $$$ elsewhere. Drio's recommendation takes up about 12U. If you do start expanding then you can always buy another rack.

              A proper server will have OOB (out-of-band) management capability. Hook the OOB ports to a separate switch so that the path stays redundant; it doesn't need to be 1 Gb.

              Carefully spec out the UPS (what Drio is calling the APC -- a brand name) so that it will cover your power requirements. Give yourself lots of extra capacity.

              HP, Dell, IBM -- it doesn't matter much as long as you get their server lines. Equally important is to pay up front for a 3-year, or better yet 5-year, service contract. That way you will not have to worry about failures. Plan on five years to obsolescence for your cluster.

              Heating, cooling, power outlets, and noise: while a 24U or 42U rack will easily fit in a lab, the heat given off may surprise you. Putting it into a back corner of the lab may not work; be prepared. Check to make sure that you have adequate, dedicated power outlet(s). Multiple smaller racks can help with the heating/cooling issue since, in theory, you can spread them around the lab.



              • #8
                Good discussion.

                I see your point, and it makes sense. But since this is a single GAII machine and jdjax seems
                to have some Unix and systems skills, this would be a great opportunity for expanding his knowledge while keeping full control of the different elements of the pipeline (informatics only, of course).

                Sorry about the APC; that was the brand of the last UPS I used.

                Can you elaborate more on your network switches? For what he is doing, a typical 1 Gb switch would be
                fine, but I'd like to know what people use when there are more sequencers.
                -drd



                • #9
                  Originally posted by drio View Post
                  Can you elaborate more on your network switches? For what he is doing, a typical 1 Gb switch would be
                  fine, but I'd like to know what people use when there are more sequencers.
                  On the switches? I haven't paid much attention; they are pretty generic. The 454, which does not have a built-in cluster, is plugged into a Linksys 1 Gb switch which then goes into the wall. The SOLiD (in a different room), which does have a vendor-supplied cluster, has some sort of switch for that cluster -- probably a Dell-rebranded one, since the cluster is made of Dell computers. I just plugged the cable into the switch without looking hard at it.


                  The "wall" (Purdue provided networking) is probably Cisco intra- and inter-building.

                  Our compute cluster -- which we built some time ago from bits and pieces -- has a 48 port TrendNet TEG-448WS 1Gb switch for the main traffic and a D-Link DGS-1016D for the OOB traffic.

                  Our purchased-by-us-but-run-by-Purdue-IT shared cluster has 10 Gb switches. These are useful for flinging around the data from a SOLiD run, especially to a BlueArc storage system; 10 Gb is less useful for the 454 runs. At one time I knew the brand name of the 10 Gb switch (Purdue advertised it as the "first academic 10 Gb cluster" and gave out T-shirts listing the vendors to those of us who helped build the cluster; my shirt gave up the ghost a while back).


                  A 1 Gb switch will be good enough for jdjax's project; I doubt 10 Gb would be useful. I also think that almost any brand-name switch will work.



                  • #10
                    Thank you all for your input. There is a cold server room in the basement where the server will be placed. Money is not too much of a problem; we have about $60,000 to spend on this, including service packages.

                    The IT guys only have experience with Windows, so they will not be able to help me with a Linux/Unix OS. Also, the majority of the network and other servers are Dells, so I am not sure how helpful the IT guys will be. IT did mention that they can take care of all the hardware, but the software I will have to handle myself.

                    I have basic knowledge of Solaris and I have worked with Ubuntu on my desktop. I am assuming that Ubuntu Server will be similar, so to save myself from future headaches I can just go with Ubuntu.

                    Most of my experience is with software; the hardware is the one portion of the system where I am lacking. Thank you for all your suggestions. I have been spending a lot of time Googling all the terms listed so I have a better idea what you are referring to.

                    If you have any more suggestions or comments, please post them. Thanks again.
                    jdjax
                    Ph.D. Student
                    Aarhus University



                    • #11
                      Umm... I would appreciate your help again.

                      The IT guy wants to put Red Hat Linux or SUSE Linux on the server, but from looking over this section of the forum there seem to be a lot of problems with these operating systems.

                      What are your thoughts and concerns?

                      Thanks.

                      The IT guy and the server specialist he likes suggested these specs for the server:

                      HP DL585 G7 6176SE 4P 64GB ICE
                      * 4 x 12-core AMD CPUs
                      * 2.3 GHz per core
                      * 104 GB memory
                      * 12 MB L3 cache
                      * 2 x 146 GB 15K SAS drives
                      * P410 Smart Array with 1 GB FBWC
                      * 4 x 1 Gbit NIC ports

                      Thanks again.
                      Last edited by jdjax; 12-14-2010, 05:26 AM. Reason: adding more information
                      jdjax
                      Ph.D. Student
                      Aarhus University



                      • #12
                        Looks good... and expensive?

                        The drives are pretty small. Not that it will be a problem for what you are trying to do right now, but having a local scratch area can be very useful.

                        On the other hand, with 104 GB of RAM, you will have 52 GB of ramdisk (/dev/shm). That's very useful too.
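                        Since /dev/shm is a tmpfs mount backed by RAM (half of physical memory by default on Linux), staging hot intermediate files there skips disk I/O entirely. A small sketch:

                        ```shell
                        # Stage an intermediate file on the RAM-backed tmpfs, use it, then clean up --
                        # whatever is left in /dev/shm keeps occupying memory.
                        df -h /dev/shm                          # see how much ramdisk is available
                        mkdir -p /dev/shm/scratch
                        printf 'intermediate data\n' > /dev/shm/scratch/part.txt
                        cat /dev/shm/scratch/part.txt           # the read comes straight from RAM
                        rm -rf /dev/shm/scratch
                        ```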
                        -drd



                        • #13
                          Just wanted some comments on RAID arrays built from solid-state disks.

                          I got around 547 MB/s of consistent sequential read throughput with two Crucial C300 SSDs in RAID 0.
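                          For anyone trying to reproduce that number, a hedged sketch of the setup and a crude benchmark (device names are hypothetical, and `mdadm --create` destroys existing data on the listed devices):

                          ```shell
                          # Stripe two SSDs into a software RAID 0 array, then time sequential reads.
                          mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
                          sync; echo 3 > /proc/sys/vm/drop_caches        # drop the page cache so reads hit the disks
                          dd if=/dev/md0 of=/dev/null bs=1M count=4096   # dd prints the MB/s figure when it finishes
                          ```

                          Keep in mind that RAID 0 has no redundancy; losing either SSD loses the whole array.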
