
**Ben Langmead** · 08-09-2010, 04:59 AM

Originally posted by xinwu

Hi all,
I am very interested in Hadoop applications in NGS. The best-known project in the Hadoop community is CloudBurst, but I saw an evaluation of it in the paper "Searching for SNPs with cloud computing". It said, "CloudBurst is capable of reporting all alignments for millions of human short reads in minutes, but does not scale well to human resequencing applications involving billions of reads. Whereas CloudBurst aligns about 1 million short reads per minute on a 24-core cluster, a typical human resequencing project generates billions of reads, requiring more than 100 days of cluster time or a much larger cluster."
Why did it claim that CloudBurst is not scalable? Crossbow, described in this paper, adopted Bowtie instead of CloudBurst for mapping short reads to the reference genome, and I would like to know the reasons. In my opinion, CloudBurst is natively MapReduce while Bowtie is not, so why did the authors reach this conclusion? Is there any solid comparison of these two short-read mapping tools? And if I just want to map short reads to the reference genome, which should I use: CloudBurst, or Crossbow without SOAPsnp (only the map step using Bowtie)? Thanks in advance.

Hi Xinwu,

To be clear, both techniques are "scalable", in the sense that they both make good use of additional CPUs when they are added. (Granted: the authors only show experiments using a few dozen up to a few hundred CPU cores.) The problem with CloudBurst is that it's slower than Bowtie on a comparable number of cores. So the authors (I'm one of them, as is Mike Schatz, the author of CloudBurst) are saying that when CloudBurst is scaled to a *dataset* the size of a human resequencing dataset, it takes longer than researchers are willing to wait. I hope that's more clear. Frankly, we should probably have said "but takes a very long time to finish for" instead of "but does not scale well to".

Ben

**xinwu** · 08-10-2010, 03:06 AM

Hi Ben,

Thanks for the clarification. CloudBurst combines Hadoop and RMAP, so I guess RMAP may be the speed bottleneck. Is it possible to replace RMAP with Bowtie? I mean a Hadoop version of Bowtie for large-scale short-read mapping.

**Ben Langmead** · 08-10-2010, 04:25 AM

In general, it is possible to swap different algorithms into cloud pipelines. In practice this takes some effort since programs' input and output formats might need to be changed, and you must consider whether the tool's memory footprint fits on a particular EC2 instance type, etc.

Thanks,
Ben
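
To illustrate the point about input formats: line-oriented frameworks such as Hadoop Streaming treat each input line as one record, while FASTQ spreads a read over four lines. A minimal sketch of the kind of conversion this can require is below. The function name `fastq_to_tab` and the `name<TAB>sequence<TAB>qualities` layout are assumptions chosen for illustration, not the format any particular pipeline is confirmed to use.

```python
from typing import Iterable, Iterator


def fastq_to_tab(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 'name<TAB>sequence<TAB>qualities' line per 4-line FASTQ record.

    Hypothetical helper for illustration only; assumes unwrapped FASTQ
    (exactly four lines per read).
    """
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Drop the leading '@' so the first field is just the read name.
        yield "\t".join((header[1:], seq, qual))


if __name__ == "__main__":
    import sys

    # Read FASTQ on stdin, write one read per line on stdout.
    for record in fastq_to_tab(sys.stdin):
        print(record)
```

Once each read sits on a single line, the file can be split at any line boundary without cutting a record in half, which is what makes it convenient to hand to independent workers.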

**xinwu** · 09-09-2010, 10:00 PM

Hi Ben,
One more question: Bowtie is based on the BWT, and CloudBurst is based on seed-and-extend, like RMAP. Is it true that seed-and-extend is more sensitive and has fewer limitations (say, it allows gaps and indels) than other algorithms? If the only drawback of the seed-and-extend method is that it is time-consuming, that should be relatively easy to overcome in order to get a more "accurate" or "flexible" result.
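
For readers unfamiliar with the term, here is a toy sketch of the general seed-and-extend idea, assuming mismatches only. It relies on the pigeonhole principle: if a read aligns with at most k mismatches, then at least one of k+1 non-overlapping seeds cut from it must match the reference exactly. This is not CloudBurst's or RMAP's actual code; the function names are hypothetical, and supporting gaps or indels would need a different extension step (for example, dynamic programming).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read: str, reference: str,
                    max_mismatches: int) -> List[Tuple[int, int]]:
    """Return sorted (reference_start, mismatches) for ungapped end-to-end hits."""
    # Pigeonhole: k mismatches cannot touch all k+1 non-overlapping seeds.
    seed_len = len(read) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("read too short for this many mismatches")
    index = build_kmer_index(reference, seed_len)
    hits: Dict[int, int] = {}
    for i in range(max_mismatches + 1):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        # Seed step: exact lookup of the seed in the reference index.
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            # Extend step: compare the whole read at the implied position.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())
```

In this sketch the cost is dominated by how many candidate positions each seed produces: shorter seeds (more allowed mismatches) hit more places in the reference, so more extensions have to be checked.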

**haoyue** · 11-06-2013, 03:09 AM

How to read the CloudBurst source code

Hi Mike Schatz,
I have recently been reading the CloudBurst source code, but it is hard to follow because it has few comments. I would like to understand the implementation details. Could you please help me? Thanks!

CloudBurst VS Bowtie