  • xinwu
    Member
    • Jul 2010
    • 33

    CloudBurst VS Bowtie

    Hi all,
    I am very interested in Hadoop applications in NGS. The best-known project in the Hadoop community is CloudBurst, but I saw an evaluation of it in the paper "Searching for SNPs with cloud computing". It said: "CloudBurst is capable of reporting all alignments for millions of human short reads in minutes, but does not scale well to human resequencing applications involving billions of reads. Whereas CloudBurst aligns about 1 million short reads per minute on a 24-core cluster, a typical human resequencing project generates billions of reads, requiring more than 100 days of cluster time or a much larger cluster."
    Why does the paper claim that CloudBurst is not scalable? Crossbow, described in that paper, uses Bowtie instead of CloudBurst to map short reads to the reference genome, and I would like to know the reasons. As I understand it, CloudBurst is natively MapReduce-based while Bowtie is not, so why did the authors reach this conclusion? Is there any solid comparison of these two short-read mapping tools? And if I just want to map short reads to the reference genome, which should I use: CloudBurst, or Crossbow without SOAPsnp (only the mapping step, using Bowtie)? Thanks in advance.
    Last edited by xinwu; 08-08-2010, 05:46 PM.
  • Ben Langmead
    Senior Member
    • Sep 2008
    • 200

    #2
    Hi Xinwu,

    To be clear, both techniques are "scalable", in the sense that they both make good use of additional CPUs when they are added. (Granted: the authors only show experiments using a few dozen up to a few hundred CPU cores.) The problem with CloudBurst is that it's slower than Bowtie on a comparable number of cores. So the authors (I'm one of them, as is Mike Schatz, the author of CloudBurst) are saying that when CloudBurst is scaled to a *dataset* the size of a human resequencing dataset, it takes longer than researchers are willing to wait. I hope that's more clear. Frankly, we should probably have said "but takes a very long time to finish for" instead of "but does not scale well to".
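To make the distinction between "scales with more cores" and "finishes in a reasonable time" concrete, here is a minimal sketch. It assumes ideal linear scaling, and the two per-core rates are hypothetical numbers chosen only for illustration; they are not measurements of CloudBurst or Bowtie.

```python
def wall_clock_days(total_reads, reads_per_core_minute, cores):
    """Days needed to align total_reads, assuming throughput grows linearly with cores."""
    reads_per_minute = reads_per_core_minute * cores
    return total_reads / reads_per_minute / (60 * 24)


def speedup(total_reads, reads_per_core_minute, base_cores, more_cores):
    """How many times faster the job finishes when going from base_cores to more_cores."""
    before = wall_clock_days(total_reads, reads_per_core_minute, base_cores)
    after = wall_clock_days(total_reads, reads_per_core_minute, more_cores)
    return before / after


# Hypothetical per-core alignment rates (reads per core per minute).
SLOW_TOOL_RATE = 1_000
FAST_TOOL_RATE = 50_000

# Hypothetical dataset size for a resequencing project.
DATASET_READS = 3_000_000_000

if __name__ == "__main__":
    for name, rate in (("slow tool", SLOW_TOOL_RATE), ("fast tool", FAST_TOOL_RATE)):
        days_24 = wall_clock_days(DATASET_READS, rate, 24)
        days_240 = wall_clock_days(DATASET_READS, rate, 240)
        print(f"{name}: {days_24:.1f} days on 24 cores, {days_240:.1f} days on 240 cores")
```

Both tools get the same speedup from extra cores in this model, so both "scale"; the slower one simply starts from a much longer running time on any fixed number of cores.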

    Ben


    • xinwu
      Member
      • Jul 2010
      • 33

      #3
      Hi Ben,

      Thanks for the clarification. CloudBurst combines Hadoop and RMAP, so I guess RMAP may be the speed bottleneck. Is it possible to replace RMAP with Bowtie? I mean a Hadoop version of Bowtie for large-scale short-read mapping.


      • Ben Langmead
        Senior Member
        • Sep 2008
        • 200

        #4
        In general, it is possible to swap different algorithms into cloud pipelines. In practice this takes some effort since programs' input and output formats might need to be changed, and you must consider whether the tool's memory footprint fits on a particular EC2 instance type, etc.
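As a rough illustration of the two kinds of effort mentioned above, the sketch below shows a format adapter and a memory check. Everything in it is an assumption made for the example: the tab-separated record layout (name, sequence, qualities), the function names, and the headroom value are hypothetical and are not taken from Crossbow, CloudBurst, or any EC2 documentation.

```python
def record_to_fastq(line):
    """Convert one 'name<TAB>sequence<TAB>qualities' record into a FASTQ entry.

    The tab-separated layout is a hypothetical intermediate format of the kind
    a line-oriented pipeline stage might pass along.
    """
    name, seq, qual = line.rstrip("\n").split("\t")
    if len(seq) != len(qual):
        raise ValueError(f"read {name}: sequence and quality lengths differ")
    return f"@{name}\n{seq}\n+\n{qual}\n"


def fits_in_memory(index_gb, instance_ram_gb, headroom_gb=2.0):
    """Check whether an aligner index plus some headroom fits in an instance's RAM.

    headroom_gb is an arbitrary illustrative allowance for the OS and the
    framework's own processes.
    """
    return index_gb + headroom_gb <= instance_ram_gb


if __name__ == "__main__":
    print(record_to_fastq("read1\tACGT\tIIII"), end="")
    print(fits_in_memory(index_gb=3.0, instance_ram_gb=7.0))
```

The point is only that the adapter and the memory check are separate concerns: the first is about how records flow between stages, the second about which machines can run the tool at all.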

        Thanks,
        Ben


        • xinwu
          Member
          • Jul 2010
          • 33

          #5
          Hi Ben,
          One more question: Bowtie is based on the BWT, while CloudBurst uses a seed-and-extend approach like RMAP. Is it true that seed-and-extend is more sensitive and has fewer limitations (for example, it allows gaps and indels) than other algorithms? If the only drawback of the seed-and-extend method is running time, that should be relatively easy to overcome in order to get a more "accurate" or "flexible" result.
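For readers unfamiliar with the term, here is a toy sketch of the seed-and-extend idea. It is not CloudBurst's or RMAP's actual code, and it handles mismatches only (no gaps or indels). It relies on the pigeonhole argument: if a read aligns with at most k mismatches and is cut into k+1 non-overlapping seeds, at least one seed must match the reference exactly. All names are made up for the example.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Map every substring of length seed_len to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def seed_and_extend(read, reference, max_mismatches):
    """Return sorted (start, mismatches) pairs for ungapped alignments of read.

    Seed step: look up exact matches of each of the max_mismatches + 1 seeds.
    Extend step: compare the whole read against the reference at each
    candidate start and keep it if the mismatch count is within the limit.
    """
    seed_len = len(read) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("read is too short for this many mismatches")
    index = build_seed_index(reference, seed_len)
    hits = set()
    for i in range(max_mismatches + 1):
        offset = i * seed_len
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, ()):
            start = pos - offset  # where the full read would begin
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    reference = "AAACCCGGGTTTACGT"
    print(seed_and_extend("CCGAGT", reference, max_mismatches=1))
```

Allowing more mismatches makes the seeds shorter, so each seed hits more places and the extend step has more candidates to check. That is one way to see why this family of methods trades running time for flexibility.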


          • haoyue
            Junior Member
            • Nov 2013
            • 1

            #6
            How to read the CloudBurst source code

            Hi Mike Schatz,
            I have recently been reading the CloudBurst source code, but it is hard to follow because it has very few comments. I would like to understand the details of the implementation. Could you please help me? Thanks!
