
**apfejes** · 02-04-2009, 08:08 AM

I'm probably the wrong person to attempt to answer your question, but as far as I know, we just run each lane through maq one at a time, then use mapmerge to assemble libraries back together. Thus, we often have eight maq jobs (one per lane) running at a time on the cluster for each machine in operation. Again, I'm not the person who submits the jobs, so other people can probably provide more information than I can.

Sequence alignment theoretically belongs to the class of algorithms known as embarrassingly parallel... each sequence could theoretically be aligned by a separate computer and the results then recombined. The question should just be what is the optimal number of reads to align by each instance... and that I don't know. (-:
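To make the "split the reads, align each piece separately" idea concrete, here is a minimal Python sketch that cuts a FASTQ stream into independent chunks. The function name `fastq_chunks` and the `reads_per_chunk` parameter are made up for illustration, and it assumes the common four-lines-per-record FASTQ layout. It does not answer the question of what chunk size is optimal.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_chunks(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk reads.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    it = iter(lines)
    while True:
        # Pull the next block of whole records from the stream.
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("input ended in the middle of a FASTQ record")
        yield chunk
```

Each yielded chunk can be written to its own file and handed to a separate aligner job; since no chunk depends on another, the jobs can run in any order on any node.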

**jperin** · 02-04-2009, 08:14 AM

Hm. The idea of separating lanes is good. I am familiar with most embarrassingly parallel methods for sequence analysis, but was hoping there might be some established methods specifically for NGS that have been developed. I am particularly interested in setting up a few processing pipelines that can be triggered (relatively automatically) and then run across our cluster system, then packaged up for post processing and results delivery.

Tools like the corona pipeline are ideal because they are pre-configured to do so off the bat. MAQ would require some initial configuration and some scripts here and there to accomplish this. I guess a generic tool for parallelizing things may be too much to ask for, but aside from splitting up lanes, or splitting up each individual alignment task, I'm wondering what else might be able to work? Bowtie has methods for splitting up across multiple cores, using the '-p' option, and I would hope that this can somehow be leveraged to cross multiple systems as well. But that's where I start to get lost, and find myself trying to figure out the code at a much lower level, which is going to take me a very long time to solve...

**Ben Langmead** · 02-04-2009, 02:39 PM

Hi jperin,

With respect to Bowtie, the -p option allows you to parallelize Bowtie in the sense of using multiple threads (which are hopefully mapped to multiple processor cores) on a single machine. For parallelizing across machines, I do not really have a pre-fab set of scripts for that. As an aside, I'm currently doing some work on getting Bowtie to work in a Cloud Computing framework, specifically using Hadoop. This would allow Bowtie to be parallelized across any cluster that has Hadoop installed, including Amazon's EC2 service. That's not ready for prime time yet, though.

Thanks,
Ben
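As a rough illustration of combining the two levels of parallelism discussed here (threads within a machine via `-p`, separate jobs across machines), the Python sketch below builds one Bowtie command line per pre-split read file. It is not a script from the Bowtie distribution: the helper name `bowtie_commands`, the index name, the file names, and the `.map` output suffix are all hypothetical, and submitting each command to a cluster scheduler is left out.

```python
from typing import List, Sequence


def bowtie_commands(index: str, read_files: Sequence[str], threads: int) -> List[List[str]]:
    """Build one 'bowtie -p <threads> <index> <reads> <hits>' argv list per read file.

    Each command is independent, so each can be sent to a different node;
    -p only controls the thread count on whichever node runs it.
    """
    if threads < 1:
        raise ValueError("threads must be positive")
    commands = []
    for reads in read_files:
        hits = reads + ".map"  # hypothetical naming for the per-chunk output
        commands.append(["bowtie", "-p", str(threads), index, reads, hits])
    return commands
```

For example, `bowtie_commands("hg18", ["lane1.fq", "lane2.fq"], 4)` gives two command lists that could be passed to `subprocess.run` or wrapped in scheduler submissions, one per node.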

**vruotti** · 02-04-2009, 04:21 PM

MAQ on cluster

A few comments here.
Here is a nice trick posted by Quang.

Hi Victor,
We use "maq fastq2bfq -n 1000000 ..." to split the reads.
....

Q

More here.

http://groups.google.com/group/sge-lifescience-sig/browse_thread/thread/90f3a6f6b501240c
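Putting Quang's split together with the per-piece `maq map` and final `maq mapmerge` steps mentioned earlier in the thread, a workflow might be sketched in Python as below. The helper names are invented for illustration. The chunk file names are passed in rather than guessed, since the exact names `fastq2bfq -n` produces should be checked on your own system. The commands are only built here, not executed or submitted to SGE.

```python
from typing import List, Sequence


def maq_split_command(fastq: str, prefix: str, reads_per_chunk: int = 1000000) -> List[str]:
    """Command that converts FASTQ to BFQ, splitting every reads_per_chunk reads."""
    return ["maq", "fastq2bfq", "-n", str(reads_per_chunk), fastq, prefix]


def maq_map_commands(ref_bfa: str, bfq_chunks: Sequence[str]) -> List[List[str]]:
    """One independent 'maq map out.map ref.bfa reads.bfq' command per chunk."""
    # The '<chunk>.map' output name is a hypothetical convention.
    return [["maq", "map", bfq + ".map", ref_bfa, bfq] for bfq in bfq_chunks]


def maq_merge_command(merged_map: str, chunk_maps: Sequence[str]) -> List[str]:
    """Command that merges the per-chunk alignments back into one map file."""
    if not chunk_maps:
        raise ValueError("nothing to merge")
    return ["maq", "mapmerge", merged_map, *chunk_maps]
```

The map commands are the part that spreads across the cluster; the split has to finish before them and the merge has to wait for all of them, which is the kind of dependency a scheduler's job-hold feature can express.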

**westerman** · 02-05-2009, 06:48 AM

Originally posted by jperin:

Tools like the corona pipeline are ideal because they are pre-configured to do so off the bat. MAQ would require some initial configuration and some scripts here and there to accomplish this. I guess a generic tool for parallelizing things may be too much to ask for, but aside from splitting up lanes, or splitting up each individual alignment task, I'm wondering what else might be able to work?

As far as I know, the Corona pipeline does not do anything fancy. All it does is split up the alignment task by chromosome, with one CPU per 'chromosome' (note that a 'chromosome' could be a single contig/BAC/etc. depending on your organism). If you have a single chromosome, then Corona will only use one CPU.

I could be running Corona lite improperly in which case let me know! But my experience is that Corona does not employ anything more than the same-old-same-old embarrassingly parallel methods.
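The per-'chromosome' splitting described above can be illustrated with a small Python sketch. This is not Corona's code, just an example of the general approach under my reading of the description: the reference FASTA is split into one record per sequence, and the number of tasks that can run at once is capped by the number of records. The names `split_fasta` and `max_parallel_tasks` are hypothetical.

```python
from typing import Dict, Iterable, List


def split_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA record name (chromosome, contig, BAC, ...) to its sequence."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no name")
            name = fields[0]
            if name in parts:
                raise ValueError("duplicate record name: " + name)
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            parts[name].append(line)
    return {n: "".join(p) for n, p in parts.items()}


def max_parallel_tasks(records: Dict[str, str], cpus: int) -> int:
    """With one task per record, extra CPUs beyond the record count sit idle."""
    return min(len(records), cpus)
```

With a single-sequence reference, `max_parallel_tasks` returns 1 no matter how many CPUs are available, which is the limitation noted above; splitting the reads instead of the reference avoids that cap.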

Parallel Processing for Sequence Analysis
