**hpcguy** · 03-25-2011, 11:40 AM

Howdy, I'm new here, but I do parallel computing for a living (HPC type).

I can't speak to everything you've asked, but I have installed pMap and bowtie for customers for jobs like this. I'd recommend pMap for simplicity. Either way, you should be able to get every core on every node working with bowtie in parallel. At that point, I/O will more than likely be your limiting factor.

pMap (Department of Biomedical Informatics, Ohio State College of Medicine):
http://bmi.osu.edu/hpc/software/pmap/pmap.html

Crossbow: Whole Genome Resequencing Analysis in the Clouds:
http://bowtie-bio.sourceforge.net/crossbow/index.shtml

pMap is MPI-based, so if you have an interconnect (Ethernet, InfiniBand, Quadrics, Myrinet, etc.) and some type of MPI installed, you should be good. pMap supports BWA, SOAP, Bowtie, GSNAP, MAQ and RMAP.

Crossbow is Hadoop-based. I can't say I've seen Hadoop on Rocks (I'm not a fan of Rocks myself, though it is an excellent way to start with clusters), but it is possible. I'd be REALLY surprised if no one has ever done it, as there are some rather decent-sized clusters out there (TACC, PNNL) using Rocks. I'd search for a Hadoop roll; I'd be willing to bet it's out there.

hpc

**tnabtaf** · 03-25-2011, 11:40 AM

There is some discussion of running Galaxy on ROCKS in this Galaxy-dev thread from this January.

**quantrix** · 03-25-2011, 01:16 PM

Hi hpcguy and Tnab,
Thanks for the replies. I shall look into pMap right away. It sounds like one possible solution for me to start exploring.

@hpcguy,
You say you are not a fan of Rocks. I had to wrestle with quite a few issues to get it up to speed, due to a combination of factors, but it is running smoothly now. I was wondering whether I should instead use something like plain CentOS and install everything else separately. What is your take on this? Do you have a favorite, and why? I was also looking into Ubuntu with Kerrighed as one option (Ubuntu Enterprise, maybe?).
The problem is that there are very few leads out there, if any, on how to go about clustering.

**taber13** · 09-10-2013, 01:53 PM

The following is an example of how to run bowtie on multiple nodes. It requires dividing the reads in the .fastq file between jobs, then reassembling the .sam output at the end.
First, see how many reads you have (a FASTQ record is 4 lines):

cat yourfile.fastq | echo $((`wc -l`/4))

The result was 14901431, so in this case I created two jobs to run on two different nodes of the Rocks cluster. I created a few .sh scripts and just keep editing them for each different job. Run "nano bowtie_script_1.sh", then edit it as follows:

#!/bin/bash
#
#$ -S /bin/bash
# Job 1 of 2: skip no reads (-s 0) and stop after the first 7450715 (--qupto).
bowtie -m 1 -S -p 4 -s 0 --qupto 7450715 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq

The second job has a different start and finish. Split the reads into as many jobs as the number of nodes you want to run on; this example uses 2 nodes.
Second script: run "nano bowtie_script_2.sh", then edit it as follows:

#!/bin/bash
#
#$ -S /bin/bash
# Job 2 of 2: skip the 7450715 reads handled by job 1 and align the rest
# (--qupto only acts as an upper bound here).
bowtie -m 1 -S -p 4 -s 7450715 --qupto 14901431 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq
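The numbers in the two scripts above come from simple arithmetic on the read count. As a rough sketch, the skip offset and chunk size for any number of jobs could be computed as follows. The function names `count_reads` and `split_ranges` are made up for illustration and are not part of bowtie or SGE.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of bowtie or SGE.

# Print the number of reads in a FASTQ file (4 lines per record).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# split_ranges TOTAL N
# Print one "skip count" pair per job. Every job gets TOTAL/N reads;
# the last job also takes whatever remainder is left over.
split_ranges() {
    local total="$1" n="$2" base i skip count
    base=$(( total / n ))
    i=0
    while [ "$i" -lt "$n" ]; do
        skip=$(( i * base ))
        if [ "$i" -eq $(( n - 1 )) ]; then
            count=$(( total - skip ))
        else
            count=$base
        fi
        echo "$skip $count"
        i=$(( i + 1 ))
    done
}
```

For the 14901431 reads in this example, `split_ranges 14901431 2` prints `0 7450715` and `7450715 7450716`. The first number of each pair is the value for -s. The bowtie manual describes -u/--qupto as counting reads after the skipped ones, in which case the second number is the value to pass; it is worth confirming that on a small test file with your bowtie version.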

If you have bowtie installed correctly, you can then run the following:

qsub bowtie_script_1.sh
qsub bowtie_script_2.sh
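With more than two chunks, submitting by hand gets tedious. The loop below is a sketch that assumes the scripts are named bowtie_script_N.sh as above. The `submit_all` name and the `QSUB` variable are illustrative additions, the latter so that the loop can be tried as a dry run with `QSUB=echo`. `-cwd` is a standard SGE option that runs the job in the directory you submit from, so the output and error files land there rather than in your home directory.

```shell
#!/bin/bash
# Submit every bowtie_script_*.sh found in a directory (default: current one).
# QSUB is an illustrative override: set QSUB=echo to print instead of submit.
submit_all() {
    local dir="${1:-.}" script
    for script in "$dir"/bowtie_script_*.sh; do
        [ -e "$script" ] || continue    # the glob matched nothing
        "${QSUB:-qsub}" -cwd "$script"
    done
}
```

Running `QSUB=echo submit_all .` first shows which scripts would be submitted without touching the queue.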

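Because the scripts do not name an output file, bowtie writes its SAM output to standard output, which SGE saves in each job's output file (## stands for the job ID):

bowtie_script_1.sh.o##
bowtie_script_2.sh.o##

You then need to join the two outputs into one .SAM file, keeping the header lines (those starting with @) from the first file only:

cat bowtie_script_1.sh.o## <(grep -v '^@' bowtie_script_2.sh.o##) > merged_sam.sam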

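The same merge can be generalized to any number of chunks. This is a minimal sketch, and `merge_sam` is a made-up helper name: it keeps the first file whole, header included, and appends only the alignment lines of the remaining files, in the order given.

```shell
#!/bin/bash
# merge_sam OUT FIRST [REST...]
# Copy FIRST whole (header included), then append the alignment lines
# of every other file with their @-header lines dropped.
merge_sam() {
    local out="$1" first="$2" f
    shift 2
    cat "$first" > "$out"
    for f in "$@"; do
        # grep exits 1 when a file holds only header lines; not an error here
        grep -v '^@' "$f" >> "$out" || true
    done
}
```

Pass the output files in job order (job 1 first) so that the reads stay in the same order as in the original .fastq file.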
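Installing bowtie

To make bowtie available to all of your compute nodes, install it under /export/apps/ on the frontend; Rocks shares that directory with every node as /share/apps/.

Then edit the PATH in "/etc/skel/.bash_profile" to include ":/share/apps/bowtie-1.0.0". That file is only a template that is copied into the home directory of newly created users, so existing users need the same change in their own ~/.bash_profile.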
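The PATH change itself is only a few lines. Here is a sketch of what could be appended to .bash_profile, assuming bowtie was unpacked into /share/apps/bowtie-1.0.0 as described above; the `BOWTIE_DIR` variable name is just for illustration.

```shell
# Directory holding the bowtie executables (illustrative variable name).
BOWTIE_DIR=/share/apps/bowtie-1.0.0

# Append it to PATH only if it is not already there, so that sourcing
# the profile more than once does not keep growing PATH.
case ":$PATH:" in
    *":$BOWTIE_DIR:"*) ;;
    *) PATH="$PATH:$BOWTIE_DIR" ;;
esac
export PATH
```

After logging in again, `command -v bowtie` on a compute node should print the path to the executable.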

If a job submitted with qsub errors out, SGE leaves an error file (named like bowtie_script_1.sh.e##) in your home directory, which will point you in the right direction.
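To look through those files quickly, something like the following could be used. It is only a sketch: `show_job_errors` is a made-up name, and it assumes SGE's default naming of the error file (script name, then .e, then the job ID). Keep in mind that a non-empty file does not necessarily mean failure, since bowtie also prints its end-of-run summary to standard error.

```shell
#!/bin/bash
# Print the last lines of every non-empty SGE error file in a directory
# (default: your home directory). Hypothetical helper name.
show_job_errors() {
    local dir="${1:-$HOME}" f
    for f in "$dir"/*.e[0-9]*; do
        [ -s "$f" ] || continue     # skip empty files and unmatched globs
        echo "== $f"
        tail -n 5 "$f"
    done
}
```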

good luck.

**gmarco** · 09-13-2013, 06:59 AM

I suppose pMap will work flawlessly on a Rocks cluster based on SGE, right?
It supports bowtie; does it also support bowtie2?

Thanks.

**hpcguy** · 10-08-2013, 05:36 AM

Howdy. To all the folks who have sent me private messages about this: please set up your mailbox so that I can reply. I cannot answer your questions without a way to reach you. Thanks.

H

**hpcguy** · 10-08-2013, 05:50 AM

Rocks is fantastic when a group/person/dept is starting out. No bones about it. Fantastic. You can roll it out on a single rack in 10 minutes if you just give it a go, and be up and running apps in 15 minutes (with data available). Not much beats this. Even AWS takes more work to configure. I've personally installed it and had a 2-rack cluster up and running batch jobs in under 30 minutes from power-on. But that cluster was NEVER supposed to run another application ever again.

The problem starts as soon as there is a move into more intermediate needs. Rocks is not flexible enough to keep advanced work simple. Moving to stock CentOS, Scientific Linux, RHEL, Ubuntu LTS, etc. is a large step that can be intimidating, but long term, most folks that I've spoken or worked with look back and say they were glad they made the move.

I would recommend changing to something else when you feel Rocks is just too restrictive, or when you need more than you can find in the normal Rolls, etc.

Bowtie and Clustering question.