  • Dan326
    replied
    Does Crossbow Produce Standard Bowtie Results?

    Hey all,
    Crossbow looks like a fantastic program. At this point I am just looking for a way to run Bowtie in parallel on an EC2 cluster. Does Crossbow have an option for just running Bowtie? If not, does Crossbow produce Bowtie output that I can access? Any insight or suggestions on other software that could accomplish this would be great.
    Dan

  • VIX_Z
    replied
    Sample dataset for crossbow on local cluster

    Originally posted by Ben Langmead
    And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.

    Ben
    Hi Ben,
    I want to try Crossbow on my local Hadoop-enabled cluster. Can you share the data you used for the "small version of the experiment on our local Hadoop cluster"? I keep running into various errors when using other read data.

    With thanks,
    Vix

  • VIX_Z
    replied
    crossbow on local cluster

    Hi,
    Has anybody tried Crossbow on their local cluster?
    I want to try the same. Any insight and experience will be appreciated.

    Thanks
    ~Vix

  • Xi Wang
    replied
    Originally posted by lilithdog
    bowtie is extremely impressive! However, I cannot get it to load in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?
    Do you mean you want to run bowtie?
    First, cd into the directory that contains the bowtie executable:
    Code:
    cd /PATH/TO/BOWTIE/DIR/
    Then execute bowtie:
    Code:
    ./bowtie
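
The two steps above can also be wrapped in a small check, which helps tell "wrong directory" apart from "file is not executable". This is only a sketch: `locate_bowtie` is a hypothetical helper name, not part of Bowtie, and the directory passed to it is an assumption about where the download was unpacked.

```shell
# Hypothetical helper (not part of Bowtie): print the path to the bowtie
# executable inside the given directory, or fail with a message on stderr.
locate_bowtie() {
    dir="$1"
    # -f: regular file (rules out a directory named bowtie); -x: executable bit set
    if [ -f "$dir/bowtie" ] && [ -x "$dir/bowtie" ]; then
        printf '%s\n' "$dir/bowtie"
    else
        printf 'no executable file named bowtie in %s\n' "$dir" >&2
        return 1
    fi
}
```

For example, `exe=$(locate_bowtie /PATH/TO/BOWTIE/DIR) && "$exe" --version` runs the program by its full path, so no `cd` is needed. Note that `cd bowtie` only changes directory; it never starts a program.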

  • jlmlj
    replied
    Congrats! Sounds very cool!

  • jiwu2573
    replied
    Does Crossbow only analyze data from whole genome DNA sequencing?

    Is it applicable to mRNA sequencing?

  • mmanrique
    replied
    I'm looking forward to trying Crossbow!

    It's satisfying to find other people using and promoting cloud computing for bioinformatics analyses.

    Congratulations!

  • lilithdog
    replied
    bowtie is extremely impressive! However, I cannot get it to load in my terminal. I am using $ cd bowtie and $ cd ./bowtie, but the commands are unrecognized. I'm on Ubuntu, but I know this works fine on a Mac. Any suggestions for such a dumb question?

  • Ben Langmead
    replied
    Hi all,

    The Crossbow paper, "Searching for SNPs with cloud computing", came out in provisional form today. Take a look if you're interested.

    Thanks,
    Ben

  • nilshomer
    replied
    The bad: How do you perform indel calling without modelling indels during alignment? Without proper identification of such variants (among others) whole-genome resequencing is not performed. Also, since only bowtie is currently supported, platforms like ABI SOLiD are not supported.

    The good:
    The authors of Crossbow have done an amazing job giving a proof-of-concept of running well-known tools (bowtie and SoapSNP) on the cloud. A potential next step would be to generalize crossbow to support any aligner, variant caller, or other analysis tool. Given this type of general framework, crossbow would solve the practical computational problem of human whole-genome re-sequencing using the tools that the user deems most suitable/powerful. The onus would not be on the crossbow authors to write this support, but to enable any author of such a tool to contribute to crossbow by writing support themselves.

    How hard would it be to get other aligners, variant callers, or other tools to work in crossbow? Do they have to model the workflow of bowtie/SoapSNP?

    I envision having a workflow where you align the reads, then run many variant callers (SNP/indel, reassembly, structural variants, others...), then other analysis (assessing the potential for the SNPs to cause protein coding changes etc.), and many more processes that both branch and merge.

    Thanks for your contribution and I look forward to watching, and potentially contributing myself, to the evolution of crossbow.

  • Ben Langmead
    replied
    Originally posted by nilshomer
    Awesome! Any plans to include indel calling? Also, does this run on a Hadoop-enabled cluster or just on the Amazon cloud?
    Hey Nils,

    Yes, we'd love to include indel calling; we'd love to include anything else that fits! And I think there are a lot of other things (indels, SV detection) that could fit.

    And yes, Crossbow can be run on a non-EC2 cluster as long as (a) you have working Bowtie and SOAPsnp binaries for the cluster machines' OS, (b) Hadoop is installed, and (c) you don't mind tweaking some settings in the driver script. For our experiments, we generally tried a small version of the experiment on our local Hadoop cluster first, then, once we confirmed a sane result, we ran the full experiment on EC2 where we could grab up hundreds of cores. The script we used to run on the local cluster is included in the download (local/crossbow.pl). Note that that script will need some tweaking before it works on your cluster, since, unlike with EC2, we don't know what your filesystem and settings will look like ahead of time.

    Ben
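
The prerequisites listed above suggest a quick pre-flight check before editing local/crossbow.pl. A minimal sketch, assuming the tools are expected on the PATH under the names `hadoop`, `bowtie`, and `soapsnp`; those names, and the helper `missing_tools`, are assumptions made here and may differ on your cluster.

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command can be resolved
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names; adjust to match your installation:
# missing_tools hadoop bowtie soapsnp
```

An empty result means every named command was found. This says nothing about whether the binaries were built for the cluster machines' OS, so it is worth running on a worker node as well as on the head node.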

  • nilshomer
    replied
    Originally posted by Ben Langmead
    Hi all,

    If you work with large genomes and large sets of short reads, please take a look at Crossbow (http://bowtie-bio.sf.net/crossbow), an open source pipeline leveraging cloud computing for whole genome SNP discovery from short reads. Crossbow combines Bowtie and SoapSNP, under the umbrella of Hadoop. Hadoop handles all data movement and large distributed sorts (e.g. between alignment and SNP calling), and provides storage redundancy and fault tolerance. In experiments, we observe that Crossbow aligns Illumina reads and calls accurate SNPs (99% concordance with a BeadChip assay) from over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $100 using a 40-node, 320-core Hadoop cluster rented from Amazon's EC2 utility computing service.

    Crossbow is distributed with driver scripts for running either on a local cluster or on a cluster rented through Amazon EC2. Crossbow also includes scripts that automatically preprocess and copy large datasets into Amazon S3. Both EC2 and S3 are accessible to anyone with an AWS account (and a credit card), giving the user full control over computers and storage rented over the Internet on a pay-as-you-go basis.

    As of this posting, Crossbow is preliminary software (witness: the version number starts with a 0), though we are actively maintaining and extending it.

    If you're looking for how to get started, first read through the "Checklist for Preparing to Run on Amazon Web Services" in the MANUAL file, then read through the TUTORIAL (which currently just points to the C. elegans example).

    Crossbow is written by myself (Ben Langmead, Johns Hopkins University) and Michael C. Schatz at University of Maryland.

    Thanks!
    Ben and Mike
    Awesome! Any plans to include indel calling? Also, does this run on a Hadoop-enabled cluster or just on the Amazon cloud?

  • lh3
    replied
    This sounds great!

  • Crossbow: Genotyping from short reads using cloud computing

    Hi all,

    If you work with large genomes and large sets of short reads, please take a look at Crossbow (http://bowtie-bio.sf.net/crossbow), an open source pipeline leveraging cloud computing for whole genome SNP discovery from short reads. Crossbow combines Bowtie and SoapSNP, under the umbrella of Hadoop. Hadoop handles all data movement and large distributed sorts (e.g. between alignment and SNP calling), and provides storage redundancy and fault tolerance. In experiments, we observe that Crossbow aligns Illumina reads and calls accurate SNPs (99% concordance with a BeadChip assay) from over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $100 using a 40-node, 320-core Hadoop cluster rented from Amazon's EC2 utility computing service.

    Crossbow is distributed with driver scripts for running either on a local cluster or on a cluster rented through Amazon EC2. Crossbow also includes scripts that automatically preprocess and copy large datasets into Amazon S3. Both EC2 and S3 are accessible to anyone with an AWS account (and a credit card), giving the user full control over computers and storage rented over the Internet on a pay-as-you-go basis.

    As of this posting, Crossbow is preliminary software (witness: the version number starts with a 0), though we are actively maintaining and extending it.

    If you're looking for how to get started, first read through the "Checklist for Preparing to Run on Amazon Web Services" in the MANUAL file, then read through the TUTORIAL (which currently just points to the C. elegans example).

    Crossbow is written by myself (Ben Langmead, Johns Hopkins University) and Michael C. Schatz at University of Maryland.

    Thanks!
    Ben and Mike
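
The sort that Hadoop performs between alignment and SNP calling can be pictured on a single machine with ordinary shell tools. The sketch below is an analogy only, not Crossbow's code: the three-column record format (chromosome, position, base) and the function name `depth_by_position` are made up for illustration. Records are sorted by genomic position so that everything covering one site arrives together, and the "caller" stage here does nothing more than count them.

```shell
# Toy stand-in for the sort-then-call stages (not Crossbow's implementation).
# Input on stdin: tab-separated chromosome, position, base (an assumed format).
depth_by_position() {
    tab=$(printf '\t')
    # Sort by chromosome, then numerically by position, so that records
    # for the same site are adjacent when the next stage reads them.
    sort -t "$tab" -k1,1 -k2,2n |
    awk -F '\t' '
        { key = $1 ":" $2
          if (key != prev) { if (NR > 1) print prev, n; prev = key; n = 0 }
          n++ }
        END { if (NR > 0) print prev, n }'
}
```

Feeding it a few unsorted records, e.g. `printf 'chr2\t5\tA\nchr1\t9\tG\nchr1\t3\tT\nchr1\t9\tC\n' | depth_by_position`, prints one line per covered site with the number of records at that site. In a real cluster the sort is distributed across nodes and a SNP caller replaces the counting step.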
