Hi all,
I am very interested in Hadoop applications in NGS. The best-known project in the Hadoop community is CloudBurst, but I came across an evaluation of it in the paper "Searching for SNPs with cloud computing". It says: "CloudBurst is capable of reporting all alignments for millions of human short reads in minutes, but does not scale well to human resequencing applications involving billions of reads. Whereas CloudBurst aligns about 1 million short reads per minute on a 24-core cluster, a typical human resequencing project generates billions of reads, requiring more than 100 days of cluster time or a much larger cluster".
Why does the paper claim that CloudBurst does not scale? Crossbow, the tool described in this paper, uses Bowtie instead of CloudBurst to map short reads to the reference genome, and I would like to know the reasons. As I understand it, CloudBurst is natively MapReduce-based while Bowtie is not, so why did the authors come to this conclusion? Is there any solid comparison of these two short-read mapping tools? And if I just want to map short reads to the reference genome, which should I use: CloudBurst, or Crossbow without SOAPsnp (only the map step using Bowtie)? Thanks in advance.