Does Crossbow Produce Standard Bowtie Results?
Hey all,
Crossbow looks like a fantastic program. At this point I am just looking for a way to run Bowtie in parallel on an EC2 cluster. Does Crossbow have an option for running only Bowtie? If not, does Crossbow produce Bowtie output that I can access? Any insight, or suggestions for other software that could accomplish this, would be great.
Dan
-
Sample dataset for crossbow on local cluster
Originally posted by Ben Langmead:
And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.
Ben
I want to try Crossbow on my local Hadoop-enabled cluster. Could you share the data you used for the "small version of the experiment" on your local Hadoop cluster? I keep running into various errors when I use other read data.
With thanks,
Vix
-
crossbow on local cluster
Hi,
Has anybody tried Crossbow on their local cluster?
I want to do the same. Any insight or experience would be appreciated.
Thanks
~Vix
-
Originally posted by lilithdog:
bowtie is extremely impressive! However i cannot get it to load to my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm in Ubuntu, but i know this works fine on a Mac. Any suggestions for such a dumb question?
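Note that cd only changes the directory; it does not run anything. First, cd into the directory that contains the bowtie executable:
Code: cd /PATH/TO/BOWTIE/DIR/
Then run the executable from there:
Code: ./bowtie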
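The same advice as a small sketch, for anyone else hitting the "command not found" wall. The helper name run_in_dir is made up for illustration, and /PATH/TO/BOWTIE/DIR/ is still a placeholder for wherever you unpacked Bowtie.

```shell
#!/bin/sh
# run_in_dir DIR PROGRAM [ARGS...]
# Hypothetical helper: change into DIR, check that PROGRAM is an executable
# file there, and run it as ./PROGRAM. The current directory is normally not
# on PATH, which is why a bare "bowtie" is not found.
run_in_dir() {
    dir=$1
    prog=$2
    shift 2
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$dir" || exit 1
        if [ ! -f "./$prog" ] || [ ! -x "./$prog" ]; then
            echo "no executable named $prog in $dir" >&2
            exit 1
        fi
        "./$prog" "$@"
    )
}

# Usage (placeholder path, replace with your own):
# run_in_dir /PATH/TO/BOWTIE/DIR/ bowtie
```

If the check fails even though the file is there, the execute bit may be missing, or the binary may have been built for a different OS.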
-
Does Crossbow only analyze data from whole-genome DNA sequencing?
Is it also applicable to mRNA sequencing?
-
I'm looking forward to trying Crossbow!
It's satisfying to find other people using and promoting cloud computing for bioinformatics analyses.
Congratulations!
-
Bowtie is extremely impressive! However, I cannot get it to run in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?
-
Hi all,
The Crossbow paper, "Searching for SNPs with cloud computing", came out in provisional form today. Take a look if you're interested.
Thanks,
Ben
-
The bad: How do you perform indel calling without modelling indels during alignment? Without proper identification of such variants (among others), a whole-genome resequencing analysis is incomplete. Also, since only Bowtie is currently supported, platforms like ABI SOLiD are not supported.
The good:
The authors of Crossbow have done an amazing job giving a proof of concept of running well-known tools (Bowtie and SoapSNP) on the cloud. A potential next step would be to generalize Crossbow to support any aligner, variant caller, or other analysis tool. Given this type of general framework, Crossbow would solve the practical computational problem of human whole-genome resequencing using the tools that the user deems most suitable or powerful. The onus would not be on the Crossbow authors to write this support, but to enable any author of such a tool to contribute to Crossbow by writing support themselves.
How hard would it be to get other aligners, variant callers, or other tools to work in Crossbow? Do they have to model the workflow of Bowtie/SoapSNP?
I envision a workflow where you align the reads, then run many variant callers (SNP/indel, reassembly, structural variants, others...), then other analyses (assessing the potential for the SNPs to cause protein-coding changes, etc.), and many more processes that both branch and merge.
Thanks for your contribution. I look forward to watching, and potentially contributing to, the evolution of Crossbow.
-
Originally posted by nilshomer:
Awesome! Any plans to include indel calling? Also, does this run on a hadoop enabled cluster or just for the Amazon cloud?
Yes, we'd love to include indel calling; we'd love to include anything else that fits! And I think there are a lot of other things (indels, SV detection) that could fit.
And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.
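Before tweaking local/crossbow.pl, it can help to confirm that prerequisites (a) and (b) are actually met on a cluster node. A rough sketch follows. The command names bowtie, soapsnp, and hadoop are assumptions about how the binaries are named and whether they sit on PATH on your machines, and missing_tools is a hypothetical helper, not part of Crossbow.

```shell
#!/bin/sh
# missing_tools NAME...
# Print each NAME that cannot be found as a command on PATH, one per line.
# Returns 0 if everything was found, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage on a cluster node (tool names are assumptions, adjust to your install):
# missing_tools bowtie soapsnp hadoop || echo "the tools listed above are missing"
```

This only checks that the commands can be found; it says nothing about whether the binaries were built for the node's OS or whether Hadoop is configured correctly.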
Ben
-
-
Crossbow: Genotyping from short reads using cloud computing
Hi all,
If you work with large genomes and large sets of short reads, please take a look at Crossbow (http://bowtie-bio.sf.net/crossbow), an open source pipeline leveraging cloud computing for whole genome SNP discovery from short reads. Crossbow combines Bowtie and SoapSNP, under the umbrella of Hadoop. Hadoop handles all data movement and large distributed sorts (e.g. between alignment and SNP calling), and provides storage redundancy and fault tolerance. In experiments, we observe that Crossbow aligns Illumina reads and calls accurate SNPs (99% concordance with a BeadChip assay) from over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $100 using a 40-node, 320-core Hadoop cluster rented from Amazon's EC2 utility computing service.
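To make the role of the distributed sort concrete, here is a toy single-machine sketch of the same map, sort, reduce shape, using a Unix pipe in place of Hadoop. Everything in it is illustrative: the tab-separated "chromosome, position, base" records are made up to stand in for alignments, and the last step just counts bases per position instead of running SoapSNP. It is not Crossbow's actual code or record format.

```shell
#!/bin/sh
# Stand-in for the alignment step's output: one aligned base per line as
# "chromosome <TAB> position <TAB> base" (made-up records, not Bowtie output).
emit_aligned_bases() {
    printf 'chr2\t15\tG\n'
    printf 'chr1\t100\tA\n'
    printf 'chr1\t100\tC\n'
    printf 'chr2\t15\tG\n'
    printf 'chr1\t100\tA\n'
}

# Sort step: bring all records for the same chromosome and position together.
# This is the part Hadoop performs across machines in the real pipeline.
sort_by_locus() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Reduce step: stream over the sorted records and print base counts per locus.
# A real SNP caller would consume each group here instead of counting.
count_bases_per_locus() {
    awk -F '\t' '
        function emit_locus() {
            if (key != "")
                printf "%s\tA=%d\tC=%d\tG=%d\tT=%d\n", key, n["A"], n["C"], n["G"], n["T"]
        }
        { k = $1 ":" $2 }
        k != key { emit_locus(); key = k; split("", n) }  # new locus: flush, reset counts
        { n[$3]++ }
        END { emit_locus() }
    '
}

emit_aligned_bases | sort_by_locus | count_bases_per_locus
# prints (tab-separated):
# chr1:100  A=2  C=1  G=0  T=0
# chr2:15   A=0  C=0  G=2  T=0
```

The reduce step only works because the sort has already grouped records by locus, which is why the sort between alignment and SNP calling matters at genome scale.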
Crossbow is distributed with driver scripts for running either on a local cluster or on a cluster rented through Amazon EC2. Crossbow also includes scripts that automatically preprocess and copy large datasets into Amazon S3. Both EC2 and S3 are accessible to anyone with an AWS account (and a credit card), giving the user full control over computers and storage rented over the Internet on a pay-as-you-go basis.
As of this posting, Crossbow is preliminary software (witness: the version number starts with a 0), though we are actively maintaining and extending it.
If you're looking for how to get started, first read through the "Checklist for Preparing to Run on Amazon Web Services" in the MANUAL file, then read through the TUTORIAL (which currently just points to the C. elegans example).
Crossbow is written by myself (Ben Langmead, Johns Hopkins University) and Michael C. Schatz at University of Maryland.
Thanks!
Ben and Mike