  • Advice for setting up a cpu cluster

    Hi,

    We've been working with NGS data on a desktop PC with an AMD Phenom II X6 processor and 16 GB RAM, running Ubuntu Linux. This was put together rather easily, but now we are looking to create a simple cluster of nodes. We are not looking to do anything fancy, and would be more than happy to have duplicate towers with the same specs, connected somehow. It will just be a local network.
    Our main computations at the moment are localized assembly of genes (AMOS, velvet) and alignments using various software (bwa, bowtie, blast, smalt), and we are OK with limiting any particular analysis to one node.

    We would like to keep the box we've been using, but if we were to create a cluster:

    1. Do we have to buy some kind of special hardware for clusters and set up from scratch? Or just build identical boxes and connect them somehow?

    2. What sort of software should we use to connect the nodes? Given that a lot of the NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?

    3. Can the extra nodes be of a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node if we consider MPI?

    We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

    Thanks in advance!

  • #2
    Originally posted by Kennels
    Hi,

    We would like to keep the box we've been using, but if we were to create a cluster:

    1. Do we have to buy some kind of special hardware for clusters and set up from scratch? Or just build identical boxes and connect them somehow?
    No special hardware is needed. You can connect the nodes/computers you buy using Ethernet as the interconnect (there are other options, but since you are probably on a tight budget this will be perfectly fine). Plan to purchase a good-quality switch: avoid a cheap desktop Ethernet switch and get something beefier.

    Originally posted by Kennels
    2. What sort of software should we use to connect the nodes? Given that a lot of the NGS software still doesn't support MPI, should we consider MPI, or just some kind of LAN/switch connection between the nodes/towers?
    Take a look at http://www.rocksclusters.org/wordpress/. This would provide the operating system and the queuing software (SGE/PBS) that you will install on your cluster. Plan to spend some time getting up to speed on the finer points of Linux clusters if you have not done this sort of thing before.
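
To give a feel for what working with a queuing system looks like, here is a minimal sketch in Python that builds one SGE `qsub` command per FASTQ file, so each alignment stays on a single node as the original poster is happy to do. The file names, the parallel environment name `smp`, and the thread count are assumptions for illustration; parallel environments are defined by whoever administers the cluster, and the sketch only prints the commands rather than submitting anything.

```python
from pathlib import Path


def build_qsub_commands(fastq_files, reference, threads=6, pe_name="smp"):
    """Return one qsub argument list per FASTQ file (nothing is executed)."""
    commands = []
    for fastq in fastq_files:
        sample = Path(fastq).stem  # "reads_A.fq" -> "reads_A"
        commands.append([
            "qsub",
            "-N", f"bwa_{sample}",         # job name shown by qstat
            "-cwd",                        # run in the submission directory
            "-b", "y",                     # command is a binary, not a job script
            "-pe", pe_name, str(threads),  # ASSUMPTION: a PE with this name exists
                                           # and keeps all slots on one host
            "-o", f"{sample}.sam",         # scheduler writes the job's stdout here
            "-e", f"{sample}.log",         # stderr goes to a separate log file
            "bwa", "mem", "-t", str(threads), reference, fastq,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical input files; print the commands for inspection only.
    for cmd in build_qsub_commands(["reads_A.fq", "reads_B.fq"], "ref.fa"):
        print(" ".join(cmd))
```

Because every job is independent, none of this needs MPI: the scheduler simply hands each job to whichever node has free slots.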

    Originally posted by Kennels
    3. Can the extra nodes be of a different architecture (number of processors, motherboard, amount of RAM, etc.) from the master node if we consider MPI?


    We've started to do some research, but if someone experienced could give some quick advice that would help us greatly!

    Thanks in advance!
    You can build heterogeneous clusters, though you may want to keep things simple by using identical nodes. You will also want some kind of network-attached storage, or you could build a NAS box yourself (search for hardware options; for software, FreeNAS is one choice: http://www.freenas.org/). Pay special attention to this component, since your data (and valuable analysis) are going to reside on this storage.

    Plan to have a data backup solution of some kind. If you are going to do this seriously, you need to be prepared for some sort of failure (hardware or software) from which you must be able to recover your cluster and your data.

    Finally, before you go overboard, consider overall power requirements. A cluster in a small space can put out significant heat, so give some thought to cooling (if needed).
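
A back-of-the-envelope check of power and heat can be sketched in Python as below. The node count, per-node wattage, extra load for switch and storage, and supply voltage are all made-up numbers for illustration; measure your own hardware under load. The only fixed fact used is the conversion of roughly 3.412 BTU/h of heat per watt of electrical load.

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of electrical load is about 3.412 BTU/h of heat


def cluster_power_and_heat(node_count, watts_per_node, extra_watts, volts):
    """Estimate total draw, current on the supply, and heat output."""
    total_watts = node_count * watts_per_node + extra_watts
    return {
        "watts": total_watts,
        "amps": total_watts / volts,  # current drawn at the given voltage
        "btu_per_hour": total_watts * WATTS_TO_BTU_PER_HOUR,
    }


if __name__ == "__main__":
    # ASSUMED figures: 8 towers at 250 W each, 150 W for switch and NAS, 120 V supply.
    estimate = cluster_power_and_heat(8, 250, 150, 120)
    for name, value in estimate.items():
        print(f"{name}: {value:.1f}")
```

Comparing the resulting current with the rating of the circuit the machines will share, and the heat figure with what the room's cooling can remove, shows quickly whether the planned number of nodes is realistic.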



    • #3
      Genomax gave you good advice. For storage, you might consider Gluster, which lets you aggregate storage space from a set of servers into a single filesystem. This might simplify your storage issues and be a cheaper solution.

      Also think about whether you can use fewer machines, each with 2 or 4 multicore processors. Aggregating your disk and memory into fewer machines gives you more resources when a job needs huge amounts of memory and can't be split across nodes.
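
The trade-off can be illustrated with a tiny Python sketch: a job that cannot be split across nodes needs a single machine with enough RAM, no matter how much memory the cluster has in total. The 40 GB job size and the two node layouts are hypothetical numbers chosen only to make the point.

```python
def nodes_that_fit(job_mem_gb, node_mem_gb):
    """Return indexes of nodes with enough RAM to run the job on their own."""
    return [i for i, mem in enumerate(node_mem_gb) if mem >= job_mem_gb]


if __name__ == "__main__":
    job = 40                      # hypothetical assembly needing 40 GB on one node
    many_small = [16, 16, 16, 16] # four towers, 64 GB in total
    one_large = [64]              # one machine with the same total RAM
    print(nodes_that_fit(job, many_small))
    print(nodes_that_fit(job, one_large))
```

Both layouts hold the same total memory, but only the single larger machine can run the job.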



      • #4
        Thanks for the replies; they are a great help.



        • #5
          No additional hardware is needed. You could consider installing a Hadoop cluster - it simply involves unpacking some tarballs and setting up some config details. The good thing here is that there are already some bioinformatics frameworks (e.g. Crossbow) that can leverage an underlying Hadoop cluster.


