  • Crossbow: Genotyping from short reads using cloud computing

    Hi all,

    If you work with large genomes and large sets of short reads, please take a look at Crossbow (http://bowtie-bio.sf.net/crossbow), an open source pipeline that uses cloud computing for whole-genome SNP discovery from short reads. Crossbow combines Bowtie and SOAPsnp under the umbrella of Hadoop. Hadoop handles all data movement and large distributed sorts (e.g. between alignment and SNP calling), and provides storage redundancy and fault tolerance. In our experiments, Crossbow aligned Illumina reads and called accurate SNPs (99% concordance with a BeadChip assay) from over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $100 on a 40-node, 320-core Hadoop cluster rented from Amazon's EC2 utility computing service.
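To illustrate why a sort sits between alignment and SNP calling, here is a toy sketch that uses ordinary shell tools in place of Hadoop. It is not taken from the Crossbow scripts: the three-column input (chromosome, position, observed base) and all function names are made up for illustration, and real Bowtie output and SOAPsnp input look different.

```shell
# Toy stand-in for aligner output: one line per aligned base,
# as "chromosome position base". This format is hypothetical.
toy_alignments() {
  printf '%s\n' \
    'chr2 7 G' \
    'chr1 15 A' \
    'chr1 3 T' \
    'chr1 15 C' \
    'chr1 15 A'
}

# Sort step: bring all observations for the same position together,
# ordered by chromosome, then numerically by position, then by base.
sort_by_position() {
  LC_ALL=C sort -k1,1 -k2,2n -k3,3
}

# Reduce step: collapse each position into one line that lists
# every base observed there. Relies on the input being sorted.
pile_up() {
  awk '{
    key = $1 " " $2
    if (key != prev) {
      if (NR > 1) print prev, bases
      prev = key; bases = $3
    } else {
      bases = bases $3
    }
  } END { if (NR > 0) print prev, bases }'
}

toy_alignments | sort_by_position | pile_up
```

This prints `chr1 3 T`, `chr1 15 AAC` and `chr2 7 G`, one per line. Once the records are grouped by position, a caller only needs to look at one group at a time, which is what lets the work be spread across many machines.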

    Crossbow is distributed with driver scripts for running either on a local cluster or on a cluster rented through Amazon EC2. Crossbow also includes scripts that automatically preprocess and copy large datasets into Amazon S3. Both EC2 and S3 are accessible to anyone with an AWS account (and a credit card), giving the user full control over computers and storage rented over the Internet on a pay-as-you-go basis.
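As a back-of-the-envelope check on the EC2 figures quoted in this post (40 nodes, 320 cores, 3 hours, about $100), the arithmetic below works out the implied node-hours, core-hours and cost per node-hour. The rate is only what those rounded figures imply; it is not an Amazon price.

```shell
# Figures quoted in the post; the dollar amount is approximate.
nodes=40
cores=320
hours=3
dollars=100

node_hours=$((nodes * hours))
core_hours=$((cores * hours))
cores_per_node=$((cores / nodes))

# Shell arithmetic is integer-only, so use awk for the division.
rate=$(awk -v d="$dollars" -v nh="$node_hours" 'BEGIN { printf "%.2f", d / nh }')

echo "node-hours: $node_hours"
echo "core-hours: $core_hours"
echo "cores per node: $cores_per_node"
echo "approx. dollars per node-hour: $rate"
```

That comes to 120 node-hours, 960 core-hours, 8 cores per node, and roughly 0.83 dollars per node-hour.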

    As of this posting, Crossbow is preliminary software (witness: the version number starts with a 0), though we are actively maintaining and extending it.

    If you're looking for how to get started, first read through the "Checklist for Preparing to Run on Amazon Web Services" in the MANUAL file, then read through the TUTORIAL (which currently just points to the C. elegans example).

    Crossbow was written by me (Ben Langmead, Johns Hopkins University) and Michael C. Schatz at the University of Maryland.

    Thanks!
    Ben and Mike

  • #2
    This sounds great!



    • #3
      Awesome! Any plans to include indel calling? Also, does this run on any Hadoop-enabled cluster, or just on the Amazon cloud?



      • #4
        Originally posted by nilshomer
        Awesome! Any plans to include indel calling? Also, does this run on any Hadoop-enabled cluster, or just on the Amazon cloud?
        Hey Nils,

        Yes, we'd love to include indel calling; we'd love to include anything else that fits! And I think there are a lot of other things (indels, SV detection) that could fit.

        And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first; once we confirmed a sane result, we ran the full experiment on EC2, where we could grab hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that the script will need some tweaking before it works on your cluster since, unlike with EC2, we don't know ahead of time what your filesystem and settings will look like.
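Before touching the driver script, it may help to confirm that requirements (a) and (b) hold on each cluster machine. The sketch below is not part of Crossbow; it only checks whether some commands are on PATH. The command names `bowtie`, `soapsnp` and `hadoop` are assumptions about how the tools are installed on your system, so adjust them as needed.

```shell
# Report which of the named commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The command names below are assumptions; change them to match
# whatever your Bowtie, SOAPsnp and Hadoop installs provide.
check_tools bowtie soapsnp hadoop ||
  echo "Install the missing tools, or add them to PATH, before running the local driver script."
```

Finding a command on PATH says nothing about whether it is the right version or built for the node's OS, so this is only a first sanity check.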

        Ben



        • #5
          The bad: How do you perform indel calling without modelling indels during alignment? Without proper identification of such variants (among others), a whole-genome resequencing analysis is incomplete. Also, since only Bowtie is currently supported, platforms like ABI SOLiD are not supported.

          The good:
          The authors of Crossbow have done an amazing job giving a proof of concept of running well-known tools (Bowtie and SOAPsnp) on the cloud. A potential next step would be to generalize Crossbow to support any aligner, variant caller, or other analysis tool. Given this type of general framework, Crossbow would solve the practical computational problem of human whole-genome resequencing using the tools that the user deems most suitable or powerful. The onus would not be on the Crossbow authors to write this support; instead, the author of any such tool could contribute it to Crossbow themselves.

          How hard would it be to get other aligners, variant callers, or other tools to work in Crossbow? Do they have to model the workflow of Bowtie/SOAPsnp?

          I envision having a workflow where you align the reads, then run many variant callers (SNP/indel, reassembly, structural variants, others...), then other analysis (assessing the potential for the SNPs to cause protein coding changes etc.), and many more processes that both branch and merge.
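The branch-and-merge shape described above can be sketched in a few lines of shell. This is only an illustration of the workflow pattern, not how Crossbow is organized (Crossbow runs its stages under Hadoop), and every stage name here is a hypothetical placeholder that just echoes text instead of wrapping a real tool.

```shell
# Hypothetical stages; in practice each would wrap a real tool.
align_reads() { echo "alignments for $1"; }
call_snps()   { echo "SNP calls from: $(cat "$1")"; }
call_indels() { echo "indel calls from: $(cat "$1")"; }
call_sv()     { echo "SV calls from: $(cat "$1")"; }

run_workflow() {
  work=$(mktemp -d) || return 1

  # Stage 1: align once.
  align_reads "$1" > "$work/aln.txt"

  # Stage 2 (branch): run the callers concurrently on the same alignments.
  call_snps   "$work/aln.txt" > "$work/snps.txt" &
  call_indels "$work/aln.txt" > "$work/indels.txt" &
  call_sv     "$work/aln.txt" > "$work/sv.txt" &
  wait   # block until every branch has finished

  # Stage 3 (merge): combine the per-caller results in a fixed order.
  cat "$work/snps.txt" "$work/indels.txt" "$work/sv.txt"
  rm -rf "$work"
}

run_workflow sample1
```

The point of the sketch is that the expensive alignment happens once, and each caller only needs to agree on the format of the shared alignment file.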

          Thanks for your contribution. I look forward to watching, and potentially contributing to, the evolution of Crossbow.



          • #6
            Hi all,

            The Crossbow paper, "Searching for SNPs with cloud computing", came out in provisional form today. Take a look if you're interested.

            Thanks,
            Ben



            • #7
              Bowtie is extremely impressive! However, I cannot get it to run in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?



              • #8
                I'm looking forward to trying Crossbow!

                It's satisfying to find other people using and promoting cloud computing for bioinformatics analyses.

                Congratulations!



                • #9
                  Does Crossbow only analyze data from whole genome DNA sequencing?

                  Is it applicable to mRNA sequencing?



                  • #10
                    Congrats! Sounds very cool!



                    • #11
                      Originally posted by lilithdog
                      Bowtie is extremely impressive! However, I cannot get it to run in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?
                      Do you mean you want to run bowtie?
                      First, cd into the directory that contains the bowtie executable:
                      Code:
                      cd /PATH/TO/BOWTIE/DIR/
                      Then execute bowtie:
                      Code:
                      ./bowtie
                      Xi Wang
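The distinction behind this answer is that `cd` only changes directory and never starts a program. The sketch below demonstrates both ways of launching an executable, using a throwaway script called `demo_tool` (a made-up stand-in for the bowtie binary, so the example can run without Bowtie installed).

```shell
# Build a throwaway directory containing a stand-in executable.
# "demo_tool" is a made-up name used in place of the real bowtie binary.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "demo_tool ran"\n' > "$demo_dir/demo_tool"
chmod +x "$demo_dir/demo_tool"

# Option 1: change into the directory, then run with an explicit ./ prefix.
from_dir=$( cd "$demo_dir" && ./demo_tool )

# Option 2: add the directory to PATH so the bare name works from anywhere.
PATH="$PATH:$demo_dir"
from_path=$(demo_tool)

echo "$from_dir"
echo "$from_path"
rm -rf "$demo_dir"
```

Both lines print `demo_tool ran`. For the real tool, the equivalent of option 2 would be adding the Bowtie directory to PATH in your shell startup file.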



                      • #12
                        Crossbow on a local cluster

                        Hi,
                        Has anybody tried Crossbow on their local cluster?
                        I want to try the same. Any insight or experience would be appreciated.

                        Thanks
                        ~Vix



                        • #13
                          Sample dataset for Crossbow on a local cluster

                          Originally posted by Ben Langmead
                          And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.

                          Ben
                          Hi Ben,
                          I want to try Crossbow on my local Hadoop-enabled cluster. Can you share the data you used for the "small version of the experiment" on your local Hadoop cluster? I am ending up with various errors when using other read data.

                          With thanks,
                          Vix



                          • #14
                            Does Crossbow Produce Standard Bowtie Results?

                            Hey all,
                            Crossbow looks like a fantastic program. At this point I am just looking for a way to run Bowtie in parallel on an EC2 cluster. Does Crossbow have an option for running only Bowtie? If not, does Crossbow produce Bowtie output that I can access? Any insight or suggestions on other software that could accomplish this would be great.
                            Dan



                            • #15
                              Hi all,
                              My question is the same as Dan326's. CloudBurst uses the RMAP algorithm with Hadoop, so Dan's question can be summarized as: how can I run CloudBurst with the Bowtie algorithm rather than RMAP? As the Bowtie paper indicates, Bowtie is much faster than other mapping tools, so combining it with Hadoop should give the quickest solution so far. Correct me if I am wrong. I am also trying to find this kind of short-read mapping solution.
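One generic way to parallelize any read aligner, independent of Crossbow or CloudBurst, is to split the reads into chunks and align each chunk separately. The sketch below shows only the splitting step. It is not a Crossbow option, the function name `split_fastq` is made up, and it assumes the common FASTQ layout of exactly four lines per read.

```shell
# Split a FASTQ file into chunks holding a fixed number of reads.
# Assumes 4 lines per read (no wrapped sequence or quality lines).
split_fastq() {
  fastq=$1; reads_per_chunk=$2; prefix=$3
  split -l $((reads_per_chunk * 4)) "$fastq" "$prefix"
}

# Tiny made-up input: three reads.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCA\n+\nIIII\n' > "$demo/reads.fq"

# Two reads per chunk gives chunk_aa (2 reads) and chunk_ab (1 read).
split_fastq "$demo/reads.fq" 2 "$demo/chunk_"
chunk_count=$(ls "$demo"/chunk_* | wc -l | tr -d ' ')
first_chunk_lines=$(wc -l < "$demo/chunk_aa" | tr -d ' ')

echo "chunks: $chunk_count"
echo "lines in first chunk: $first_chunk_lines"
rm -rf "$demo"
```

Each chunk could then be handed to a separate aligner process or machine, with the per-chunk outputs concatenated afterwards. Whether that is faster in practice depends on how long it takes to load the index on each worker.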
