Crossbow: Genotyping from short reads using cloud computing

  • Crossbow: Genotyping from short reads using cloud computing

    Hi all,

    If you work with large genomes and large sets of short reads, please take a look at Crossbow (http://bowtie-bio.sf.net/crossbow), an open-source pipeline that uses cloud computing for whole-genome SNP discovery from short reads. Crossbow combines Bowtie and SOAPsnp under the umbrella of Hadoop. Hadoop handles all data movement and large distributed sorts (e.g. between alignment and SNP calling), and provides storage redundancy and fault tolerance. In experiments, we observe that Crossbow aligns Illumina reads and calls accurate SNPs (99% concordance with a BeadChip assay) from over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $100 using a 40-node, 320-core Hadoop cluster rented from Amazon's EC2 utility computing service.

    Crossbow is distributed with driver scripts for running either on a local cluster or on a cluster rented through Amazon EC2. Crossbow also includes scripts that automatically preprocess and copy large datasets into Amazon S3. Both EC2 and S3 are accessible to anyone with an AWS account (and a credit card), giving the user full control over computers and storage rented over the Internet on a pay-as-you-go basis.

    As of this posting, Crossbow is preliminary software (witness: the version number starts with a 0), though we are actively maintaining and extending it.

    If you're looking for how to get started, first read through the "Checklist for Preparing to Run on Amazon Web Services" in the MANUAL file, then read through the TUTORIAL (which currently just points to the C. elegans example).

    Crossbow is written by myself (Ben Langmead, Johns Hopkins University) and Michael C. Schatz at University of Maryland.

    Thanks!
    Ben and Mike
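
The EC2 figures quoted in the announcement (40 nodes, 320 cores, 3 hours, about $100) can be broken down into per-node and per-core rates. The sketch below only does that arithmetic; the derived rates are rough estimates based on the rounded numbers in the post, not Amazon's published prices, and the function name `ec2_summary` is made up for illustration.

```shell
#!/bin/sh
# Figures copied from the announcement; everything derived from them
# is an estimate, since "about $100" and "3 hours" are rounded.
nodes=40
cores=320
hours=3
dollars=100

# Print derived quantities using awk for the floating-point division.
ec2_summary() {
    awk -v n="$nodes" -v c="$cores" -v h="$hours" -v d="$dollars" 'BEGIN {
        printf "cores per node:    %d\n", c / n
        printf "node-hours:        %d\n", n * h
        printf "core-hours:        %d\n", c * h
        printf "USD per node-hour: %.2f\n", d / (n * h)
        printf "USD per core-hour: %.3f\n", d / (c * h)
    }'
}

ec2_summary
```

Changing the four variables at the top lets you redo the estimate for a different cluster size or run time.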

  • #2
    This sounds great!


    • #3
      Awesome! Any plans to include indel calling? Also, does this run on any Hadoop-enabled cluster, or only on the Amazon cloud?


      • #4
        Originally posted by nilshomer
        Awesome! Any plans to include indel calling? Also, does this run on any Hadoop-enabled cluster, or only on the Amazon cloud?
        Hey Nils,

        Yes, we'd love to include indel calling; we'd love to include anything else that fits! And I think there are a lot of other things (indels, SV detection) that could fit.

        And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.

        Ben
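
Conditions (a) and (b) above can be checked before touching the driver script. This is a minimal sketch, not part of Crossbow: the function name `check_tools` is hypothetical, and the binary names `bowtie`, `soapsnp`, and `hadoop` are assumptions about how the tools are installed on your machines. It only inspects the machine it runs on, so it would need to be run on each node.

```shell
#!/bin/sh
# Hypothetical preflight check: report which of the named tools are on
# the PATH of this machine. The tool names passed in are assumptions.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing tools (0 means all present).
    return "$missing"
}

check_tools bowtie soapsnp hadoop ||
    echo "Install the missing tools before editing local/crossbow.pl"
```

Condition (c), adapting the settings in local/crossbow.pl to your filesystem, still has to be done by hand.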


        • #5
          The bad: How do you perform indel calling without modelling indels during alignment? Without proper identification of such variants (among others) whole-genome resequencing is not performed. Also, since only bowtie is currently supported, platforms like ABI SOLiD are not supported.

          The good:
          The authors of Crossbow have done an amazing job giving a proof of concept of running well-known tools (Bowtie and SOAPsnp) on the cloud. A potential next step would be to generalize Crossbow to support any aligner, variant caller, or other analysis tool. Given this type of general framework, Crossbow would solve the practical computational problem of human whole-genome resequencing using the tools that the user deems most suitable or powerful. The onus would not be on the Crossbow authors to write this support, but to enable any author of such a tool to contribute to Crossbow by writing the support themselves.

          How hard would it be to get other aligners, variant callers, or other tools to work in Crossbow? Do they have to model the workflow of Bowtie/SOAPsnp?

          I envision having a workflow where you align the reads, then run many variant callers (SNP/indel, reassembly, structural variants, others...), then other analysis (assessing the potential for the SNPs to cause protein coding changes etc.), and many more processes that both branch and merge.

          Thanks for your contribution and I look forward to watching, and potentially contributing myself, to the evolution of crossbow.


          • #6
            Hi all,

            The Crossbow paper, "Searching for SNPs with cloud computing", came out in provisional form today. Take a look if you're interested.

            Thanks,
            Ben


            • #7
              Bowtie is extremely impressive! However, I cannot get it to run in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?


              • #8
                I'm looking forward to trying Crossbow!

                It's satisfying to find others using and promoting cloud computing for bioinformatics analyses.

                Congratulations!


                • #9
                  Does Crossbow only analyze data from whole-genome DNA sequencing?

                  Is it applicable to mRNA sequencing?


                  • #10
                    Congrats! Sounds very cool!


                    • #11
                      Originally posted by lilithdog
                      Bowtie is extremely impressive! However, I cannot get it to run in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?
                      Do you mean you want to run bowtie?
                      First, cd to the directory that contains the bowtie executable:
                      Code:
                      cd /PATH/TO/BOWTIE/DIR/
                      Then execute bowtie:
                      Code:
                      ./bowtie
                      Xi Wang
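
The distinction in the answer above is that cd only changes the working directory and never starts a program; the executable has to be invoked by its path. A small sketch of a wrapper that does this from any directory follows. `BOWTIE_DIR` and `run_bowtie` are hypothetical names, and the default location `$HOME/bowtie` is only a placeholder for wherever you unpacked Bowtie.

```shell
#!/bin/sh
# BOWTIE_DIR is a placeholder: point it at the directory that holds the
# bowtie executable on your system.
BOWTIE_DIR="${BOWTIE_DIR:-$HOME/bowtie}"

# Run bowtie by its full path, passing along any arguments, and report
# a clear error if the file is missing or not marked executable.
run_bowtie() {
    if [ -x "$BOWTIE_DIR/bowtie" ]; then
        "$BOWTIE_DIR/bowtie" "$@"
    else
        echo "no executable bowtie found in $BOWTIE_DIR" >&2
        return 1
    fi
}
```

If the file exists but the wrapper still reports an error, the execute permission may be missing; `chmod +x` on the bowtie file would fix that. Adding the directory to `PATH` is an alternative that lets you type `bowtie` directly.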


                      • #12
                        crossbow on local cluster

                        Hi,
                        Has anybody tried Crossbow on their local cluster?
                        I want to try the same. Any insight or experience would be appreciated.

                        Thanks
                        ~Vix


                        • #13
                          Sample dataset for crossbow on local cluster

                          Originally posted by Ben Langmead
                          And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.

                          Ben
                          Hi Ben,
                          I want to try Crossbow on my local Hadoop-enabled cluster. Could you share the data you used for the "small version of the experiment on our local Hadoop cluster"? I am running into various errors when using other read data.

                          With thanks,
                          Vix


                          • #14
                            Does Crossbow Produce Standard Bowtie Results?

                            Hey all,
                            Crossbow looks like a fantastic program. At this point I am just looking for a way to run Bowtie in parallel on an EC2 cluster. Does Crossbow have an option for running only Bowtie? If not, does Crossbow produce Bowtie output files that I can access? Any insight or suggestions on other software that may accomplish this would be great.
                            Dan


                            • #15
                              Hi all,
                              My question is the same as Dan326's. CloudBurst uses the RMAP algorithm with Hadoop, so Dan's question can be summarized as: how can I run CloudBurst using the Bowtie algorithm rather than RMAP? As the Bowtie paper indicates, Bowtie is much faster than other mapping tools, so combining it with Hadoop should give the quickest solution so far. Correct me if I am wrong. I am also trying to find this kind of short-read mapping solution.
