  • Advice for setting up a cpu cluster

    Hi,

    We've been working with NGS data on a desktop PC with an AMD Phenom II X6 processor and 16 GB RAM, running Ubuntu Linux. It was easy to put together, but now we are looking to create a simple cluster of nodes. We are not looking to do anything fancy, and would be more than happy to have duplicate towers with the same specs, connected over a local network.
    Our main computations at the moment are localized assembly of genes (AMOS, velvet) and alignments using various software (bwa, bowtie, blast, smalt), and we are OK with limiting any particular analysis to one node.

    We would like to keep the box we've been using, but if we were to create a cluster:

    1. Do we have to buy some kind of special hardware for clusters and set up from scratch? Or just build identical boxes and connect them somehow?

    2. What sort of software should we use to connect the nodes? Given that a lot of NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?

    3. Can the extra nodes be of a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node if we consider MPI?

    We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

    Thanks in advance!

  • #2
    Originally posted by Kennels
    Hi,

    We would like to keep the box we've been using, but if we were to create a cluster:

    1. Do we have to buy some kind of special hardware for clusters and set up from scratch? Or just build identical boxes and connect them somehow?

    No special hardware is needed. You can connect the nodes/computers you buy using Ethernet as the interconnect (there are other options, but since you are probably on a tight budget this will be perfectly fine). Plan to purchase a good-quality switch: not a cheap desktop Ethernet switch, but something beefier.
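To put rough numbers on the interconnect choice, here is a minimal back-of-the-envelope sketch in Python. The 50 GB dataset size and the 0.8 efficiency factor (protocol overhead, disk limits) are illustrative assumptions, not measurements from this thread.

```python
def transfer_seconds(size_gb, link_gbps, efficiency=0.8):
    """Estimate seconds needed to move size_gb gigabytes over a link of link_gbps gigabits/s.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved; 0.8 is a guess, not a measured value.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_gigabits = size_gb * 8  # 1 byte = 8 bits
    return size_gigabits / (link_gbps * efficiency)


# Compare gigabit and 10-gigabit Ethernet for a hypothetical 50 GB dataset.
for label, gbps in [("1 GbE", 1), ("10 GbE", 10)]:
    minutes = transfer_seconds(50, gbps) / 60
    print(f"{label}: about {minutes:.1f} minutes for a 50 GB dataset")
```

Under these assumptions, plain gigabit Ethernet moves the dataset in minutes rather than hours, which is one way to sanity-check that ordinary Ethernet is adequate when each analysis stays on one node.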

    Originally posted by Kennels
    2. What sort of software should we use to connect the nodes? Given that a lot of NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?

    Take a look at Rocks (http://www.rocksclusters.org/wordpress/). It provides the operating system and the queuing software (SGE/PBS) that you would install on your cluster. Plan to spend some time coming up to speed on the finer points of Linux clusters if you have not done this sort of thing before.
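As a rough illustration of how a queuing system fits software that does not use MPI, the sketch below generates one SGE job script per sample, each confined to a single node. The sample names, file paths, and the parallel environment name "smp" are hypothetical; parallel environment names are defined by whoever configures the cluster, and the `bwa mem` command line is only one example of an aligner invocation.

```python
def sge_job_script(sample, reference, reads, threads=6, pe_name="smp"):
    """Return the text of an SGE job script that aligns one sample on one node.

    pe_name is an assumption: the parallel environment must already exist
    in the cluster's SGE configuration under that name.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N bwa_{sample}",           # job name shown in the queue
        "#$ -cwd",                       # run from the submission directory
        f"#$ -pe {pe_name} {threads}",   # request several slots in one parallel environment
        f"bwa mem -t {threads} {reference} {reads} > {sample}.sam",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical samples: one independent job each, no MPI involved.
samples = {"sampleA": "sampleA.fastq", "sampleB": "sampleB.fastq"}
scripts = {
    name: sge_job_script(name, "reference.fa", reads)
    for name, reads in samples.items()
}
print(scripts["sampleA"])
```

Each generated script would then be written to a file and handed to the scheduler with `qsub`; the queue decides which node runs which sample.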

    Originally posted by Kennels
    3. Can the extra nodes be of a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node if we consider MPI?

    We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

    Thanks in advance!

    You can build heterogeneous clusters, though you may want to keep things simple by using identical nodes. You will want some kind of network-attached storage; you could buy a NAS or build one yourself (search for hardware options; FreeNAS, http://www.freenas.org/, is one option for the software). Pay special attention to this component, since your data (and valuable analysis) are going to reside on this storage.

    Plan to have a data backup solution of some kind. If you are going to do this as a serious business then you need to be prepared for some sort of failure (hardware/software) from which you need to be able to recover your cluster and your data.
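One small piece of a backup plan is checking that the copies are actually intact. The following is a minimal sketch, assuming a backup that mirrors the source directory tree; the function names and the demo files are made up for illustration, and this is not a substitute for a real backup tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large sequencing files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unverified_files(source_dir, backup_dir):
    """List files (relative paths) that are missing from, or differ in, the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(str(rel))
    return problems


# Self-contained demonstration using temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "reads.fastq").write_text("@r1\nACGT\n+\nIIII\n")
    Path(src, "contigs.fa").write_text(">c1\nACGTACGT\n")
    Path(dst, "reads.fastq").write_text("@r1\nACGT\n+\nIIII\n")  # intact copy
    missing_or_changed = unverified_files(src, dst)  # contigs.fa was never copied
    print(missing_or_changed)
```

Running a check like this after each backup turns "we have a backup" into something you have tested before a failure forces the question.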

    Finally, before you go overboard, consider overall power requirements. A cluster in a small space can put out significant heat, so give some thought to cooling (if needed).
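The power and heat question can be estimated with simple arithmetic before buying anything. This sketch assumes hypothetical figures (four nodes at 300 W each under load, plus 100 W for the switch and storage, on a 120 V circuit); substitute measured or datasheet values for real planning.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 watt of continuous load is about 3.412 BTU/h


def cluster_power_and_heat(node_count, watts_per_node, overhead_watts=0.0):
    """Return (total watts, heat output in BTU/h) for a small cluster.

    Essentially all electrical power drawn ends up as heat in the room,
    so the same total drives both the circuit and the cooling estimate.
    """
    total_watts = node_count * watts_per_node + overhead_watts
    return total_watts, total_watts * WATTS_TO_BTU_PER_HOUR


def amps_required(total_watts, volts):
    """Current drawn at the given supply voltage (ignoring power factor)."""
    return total_watts / volts


# Hypothetical example values, not measurements.
watts, btu_per_hour = cluster_power_and_heat(4, 300, overhead_watts=100)
print(f"{watts:.0f} W, about {btu_per_hour:.0f} BTU/h, "
      f"{amps_required(watts, 120):.1f} A at 120 V")
```

With these assumed numbers the cluster draws about 1300 W, which is already a large share of a typical household-style circuit and comparable to running a space heater in the room.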

    • #3
      Genomax gave you good advice. For storage, you might consider Gluster, which lets you aggregate storage space from a set of servers into a single filesystem. This might simplify your storage issues and be a cheaper solution.

      Also think about whether you can use fewer machines, each with 2 or 4 multicore processors. Aggregating your disk and memory into fewer machines gives you more resources when a job needs huge amounts of memory and can't be split across nodes.
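The trade-off between many small nodes and a few large ones can be made concrete with a tiny sketch. The node names, memory sizes, and the 24 GB job are hypothetical; the point is only that total memory is the same in both layouts while the largest single job that fits is not.

```python
def nodes_that_fit(job_mem_gb, node_mem_gb):
    """Return the names of nodes with enough RAM to run the whole job on one node."""
    return [name for name, mem in node_mem_gb.items() if mem >= job_mem_gb]


# Two hypothetical layouts with the same 64 GB of total memory.
many_small = {f"node{i}": 16 for i in range(1, 5)}  # 4 nodes x 16 GB
few_large = {"big1": 32, "big2": 32}                # 2 nodes x 32 GB

job_mem_gb = 24  # e.g. an assembly that cannot be split across nodes

print("many small:", nodes_that_fit(job_mem_gb, many_small))
print("few large: ", nodes_that_fit(job_mem_gb, few_large))
```

In the first layout no node can take the job even though the cluster as a whole has plenty of memory; in the second, either machine can.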

      • #4
        Thanks for the replies; they are a great help.

        • #5
          No additional hardware is needed. You could consider installing a Hadoop cluster - it simply involves unpacking some tarballs and setting up some config details. The good thing here is that there are already some bioinformatics frameworks (e.g. Crossbow) that can leverage an underlying Hadoop cluster.
