  • hpcguy
    replied
    Rocks is fantastic when a group/person/dept is starting out. No bones about it. Fantastic. Roll it out on a single rack in 10 min if you just give it a go. Be up and running apps in 15 min (with data being available). Not much beats this. Even AWS takes more work to configure. I've personally installed it and had a 2 rack cluster up and running from turn on in under 30 minutes and was running batch jobs. But the cluster was NEVER supposed to run another application ever again.

    The problem arises as soon as there is a move into a more intermediate need/area. Rocks does not lend itself to being as flexible as needed for simplicity in advanced work. Moving to stock CentOS, Scientific Linux, RHEL, Ubuntu LTS, etc. becomes a large step that can be intimidating, but long term most folks that I've spoken or worked with look back and say they were glad they made the move.

    I would recommend making the change to something else when you feel Rocks is just too restrictive or you need more than you can find in the normal Rolls, etc.

    Originally posted by quantrix
    Hi hpcguy and Tnab,
    Thanks for the replies. I shall look into pMap right away. It sounds like one possible solution for me to start exploring.

    @hpcguy,
    You say you are not a fan of Rocks. I have had to wrestle with quite a few issues in getting it up to speed due to a combination of factors. However, it is running smoothly now. I was wondering whether I should go ahead and use something like plain CentOS and install other stuff separately. What is your take on this? Do you have a favorite and why? I was also looking into Ubuntu with Kerrighed as one option. (Ubuntu enterprise maybe?)
    Problem is there is not very much out there in terms of leads of how to go about clustering. If at all.


  • hpcguy
    replied
    Howdy. To all the folks that have sent me Private Messages about this: please set up your mailbox so that I can reply. I cannot answer your questions without a way to reach you. Thanks.

    H


  • gmarco
    replied
    Originally posted by hpcguy
    Howdy, I'm new here, but I do parallel for a living. (hpc type)

    I can't speak to some of the things you've asked, but I have installed pMap and bowtie for customers for things such as this. I'd recommend pMap for simplicity. Either way you should be able to get every core on every node working with bowtie in parallel. IO will more than likely be your limiting factor then.

    Here is some information from The Ohio State University College of Medicine I wanted to share with you.




    pMap is MPI based, so if you have an interconnect (eth, ib, quadrics, myri, etc.) and some type of MPI installed you should be good. pMap supports BWA, SOAP, Bowtie, GSNAP, MAQ and RMAP.

    Crossbow is Hadoop based. I can't say I've seen Hadoop on Rocks (not a fan of Rocks myself, but it is an excellent way to start with clusters) but it is possible. I'd be REALLY surprised if no one has ever done it, as there are some rather decent sized clusters out there (TACC, PNNL) using Rocks. I'd search for a Hadoop roll. I'd be willing to bet it's out there.

    hpc

    I suppose pMap will work flawlessly on a Rocks cluster based on SGE, right?
    It supports bowtie; does it also support bowtie2?

    Thanks.


  • taber13
    replied
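    The following is an example of how to run bowtie on multiple nodes. It divides the reads in the .fastq file between jobs (here with bowtie's -s/--skip and -u/--qupto options), then reassembles the .sam output at the end.
    First see how many reads you have (a FASTQ record is four lines, hence the division by 4).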

    "cat yourfile.fastq | echo $((`wc -l`/4))"

    The result was 14901431 reads, so in this case I created two jobs to run on two different nodes of the Rocks cluster, each handling about half of the reads (7450715). I created a few .sh scripts and just keep editing them for each different job. Run "nano bowtie_script_1.sh", then edit as follows:

    #!/bin/bash
    #
    #$ -S /bin/bash
    bowtie -m 1 -S -p 4 -s 0 --qupto 7450715 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq

    The second job has a different start and finish. Split as many times as the number of nodes you want to run on; this example uses 2 nodes.
    Second script: "nano bowtie_script_2.sh", then edit as follows:
    #!/bin/bash
    #
    #$ -S /bin/bash
    bowtie -m 1 -S -p 4 -s 7450715 --qupto 14901431 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq
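The arithmetic behind the split can be sketched in shell for any number of nodes. This is a minimal sketch and not part of the original post: `count_reads` and `reads_per_job` are hypothetical helper names, and it assumes a FASTQ file with exactly four lines per read (no wrapped sequence lines). The per-job size is rounded up so that no read is left over.

```shell
#!/bin/bash
# Hypothetical helpers (illustration only).

# Print the number of reads in a FASTQ file, assuming 4 lines per read.
count_reads() (
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
)

# Print how many reads each job should take: ceil(total / jobs).
# $1 = total number of reads, $2 = number of jobs (nodes)
reads_per_job() (
    total=$1
    jobs=$2
    echo $(( (total + jobs - 1) / jobs ))
)

# The numbers from this thread: 14901431 reads over 2 nodes.
reads_per_job 14901431 2   # prints 7450716
```

The last job then simply takes whatever is left, which may be slightly fewer reads than the others.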

    If you have bowtie installed correctly, you can then submit both jobs:

    qsub bowtie_script_1.sh
    qsub bowtie_script_2.sh

    This will result in two output files in SAM format (## stands for the job ID that SGE appends):

    bowtie_script_1.sh.o##
    bowtie_script_2.sh.o##
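Instead of hand-editing one script per node, the chunks and job scripts could also be generated. The sketch below is an alternative under stated assumptions, not the method used in this post: it physically splits the FASTQ with `split -l` (a multiple of four lines per chunk, so reads stay whole) rather than using -s/--qupto. The names `split_fastq`, `write_job_script` and the `chunk_` prefix are hypothetical, the bowtie options are copied from the scripts above, and `#$ -cwd` is added on the assumption that jobs are submitted from the directory holding the chunks.

```shell
#!/bin/bash
# Hypothetical helpers (illustration only).

# Split a FASTQ file into chunks of a given number of reads.
# $1 = FASTQ file, $2 = reads per chunk, $3 = output prefix (e.g. chunk_)
split_fastq() (
    split -l $(( $2 * 4 )) "$1" "$3"
)

# Write an SGE job script that aligns one chunk with bowtie.
# $1 = chunk file, $2 = bowtie index basename, $3 = script file to write
write_job_script() (
    cat > "$3" <<EOF
#!/bin/bash
#
#\$ -S /bin/bash
#\$ -cwd
bowtie -m 1 -S -p 4 $2 $1
EOF
)

# Example use (commented out so the sketch runs without bowtie or SGE):
# split_fastq yourfile.fastq 7450716 chunk_
# for f in chunk_*; do
#     write_job_script "$f" /share/apps/bowtie-1.0.0/indexes/hg19 "job_$f.sh"
#     qsub "job_$f.sh"
# done
```

Each job's SAM output would again land in a file named after its script, for example job_chunk_aa.sh.o##.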

    You would then need to join the two outputs into one .SAM file, keeping the header lines (those starting with @) from the first file only.

    "cat bowtie_script_1.sh.o## <(grep -v '^@' bowtie_script_2.sh.o##) > merged_sam.sam"
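The same merge extends to any number of job outputs. A minimal sketch, assuming every job used the same index (so all SAM headers are identical) and that the files are passed in read order; `merge_sam` is a hypothetical helper name, not something from bowtie or SGE.

```shell
#!/bin/bash
# Hypothetical helper (illustration only).

# Merge SAM files: keep the first file whole (header + alignments),
# then append only the alignment lines of the remaining files.
# $1 = output file, $2.. = input SAM files in order
merge_sam() (
    out=$1
    shift
    first=$1
    shift
    cat "$first" > "$out"
    for f in "$@"; do
        # Header lines start with '@'; drop them from later files.
        sed '/^@/d' "$f" >> "$out"
    done
)

# Example use (## is the SGE job ID, as above):
# merge_sam merged_sam.sam bowtie_script_1.sh.o## bowtie_script_2.sh.o##
```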

    Installing bowtie:

    To make it available to all of your compute nodes, install it into the /export/apps/ folder on the frontend; Rocks shares that folder with the nodes as /share/apps/.

    Then edit the PATH in "/etc/skel/.bash_profile" to include ":/share/apps/bowtie-1.0.0" (files in /etc/skel are only copied into accounts created afterwards, so existing users need the same change in their own ~/.bash_profile).

    If a job submitted with qsub errors out, it will create an error file in your home directory, which will point you in the right direction.
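For an existing account, the PATH change might look like the lines below when placed in ~/.bash_profile. This is a sketch only: `add_to_path` is a hypothetical helper, and the directory is the one named in this post.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): append a directory to PATH
# unless it is already listed, so repeated logins do not duplicate it.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path /share/apps/bowtie-1.0.0
export PATH
```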

    Good luck.


  • quantrix
    replied
    Hi hpcguy and Tnab,
    Thanks for the replies. I shall look into pMap right away. It sounds like one possible solution for me to start exploring.

    @hpcguy,
    You say you are not a fan of Rocks. I have had to wrestle with quite a few issues in getting it up to speed due to a combination of factors. However, it is running smoothly now. I was wondering whether I should go ahead and use something like plain CentOS and install other stuff separately. What is your take on this? Do you have a favorite and why? I was also looking into Ubuntu with Kerrighed as one option. (Ubuntu enterprise maybe?)
    Problem is there is not very much out there in terms of leads of how to go about clustering. If at all.


  • tnabtaf
    replied
    There is some discussion of running Galaxy on ROCKS in this Galaxy-dev thread from this January.


  • hpcguy
    replied
    Howdy, I'm new here, but I do parallel for a living. (hpc type)

    I can't speak to some of the things you've asked, but I have installed pMap and bowtie for customers for things such as this. I'd recommend pMap for simplicity. Either way you should be able to get every core on every node working with bowtie in parallel. IO will more than likely be your limiting factor then.

    Here is some information from The Ohio State University College of Medicine I wanted to share with you.




    pMap is MPI based, so if you have an interconnect (eth, ib, quadrics, myri, etc.) and some type of MPI installed you should be good. pMap supports BWA, SOAP, Bowtie, GSNAP, MAQ and RMAP.

    Crossbow is Hadoop based. I can't say I've seen Hadoop on Rocks (not a fan of Rocks myself, but it is an excellent way to start with clusters) but it is possible. I'd be REALLY surprised if no one has ever done it, as there are some rather decent sized clusters out there (TACC, PNNL) using Rocks. I'd search for a Hadoop roll. I'd be willing to bet it's out there.

    hpc


  • quantrix
    started a topic Bowtie and Clustering question.

    Bowtie and Clustering question.

    Hi Group,
    I am a relative newbie trying to come up to speed. I managed to assemble a 20-core cluster and am just beginning to figure out how to work with the bioinformatics assembly algorithms. My scenario is this:

    1) I currently have a WES raw data file measuring 5 GB. I have a quality score file which is approximately 12 GB.

    2) I have a four node AMD cluster with 32 GB RAM. I installed and configured Rocks software on the same.

    3) I have been looking into Bowtie to do the analysis on this cluster.

    Some questions that come to mind are as follows:

    1) How and where do I start?

    2) Is it possible to install bowtie on the ROCKS cluster such that I can use the 4 nodes to run the analysis in parallel?

    3) For this single massive file of 5 GB raw reads, how do I go about doing the assembly?

    4) With bowtie, am I restricted to running the analysis on only ONE node?

    5) OR, can I split my raw reads file into four, farm out one piece to each of the nodes for assembly, and then do a final assembly of the four assembled files?

    6) Has anyone installed Galaxy tools on a ROCKS cluster? Could you share your experiences of the same?

    I realize these are very basic and fundamental questions. But I would highly appreciate an answer. Hopefully I will be able to answer these questions on the forum in the near future.
    Regards
    Quantrix
