  • Ben Langmead
    Senior Member
    • Sep 2008
    • 200

    Bowtie, an ultrafast, memory-efficient, open source short read aligner

    Hello all,

    If you work with large genomes and large sets of short reads, please
    take a look at Bowtie (http://bowtie-bio.sf.net), a new open source
    short read aligner written by myself and Cole Trapnell at the
    University of Maryland. Bowtie is an ultrafast, memory-efficient short
    read aligner. It aligns short reads to the human genome at a rate of 25
    million reads per hour on a typical workstation with 2 gigabytes of
    memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep
    its memory footprint small: about 1.3 GB for the human genome. It
    supports alignment policies equivalent to Maq and SOAP, but at much
    greater speeds.
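To illustrate why a Burrows-Wheeler index allows fast lookups in a compact structure, here is a toy sketch of exact matching by backward search. It is not Bowtie's implementation: the function names `build_bw_index` and `find_exact` are hypothetical, the index is built with a naive suffix sort that only suits tiny inputs, and mismatches are not handled.

```python
def build_bw_index(text):
    """Build a toy Burrows-Wheeler index of `text` (naive suffix sort)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    # Suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around at 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in the text strictly smaller than c.
    first = {c: sum(1 for ch in text if ch < c) for c in set(text)}
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in first}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first, occ


def find_exact(read, index):
    """Return sorted start positions of exact occurrences of `read`."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for c in reversed(read):  # extend the match one base to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_bw_index("ACGTACGTTAGC")
print(find_exact("ACGT", index))
print(find_exact("TAG", index))
```

Each base of the read costs a constant number of table lookups regardless of genome size, which is the property that makes this family of indexes attractive for short reads.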

    As a denizen of these forums, you probably appreciate that there are
    now many, many short read aligners to choose from. Our goal with
    Bowtie was to exploit an algorithmic "sweet spot" to bring ultrafast
    read alignment to typical desktop computers. These days, a typical
    desktop has 2 or 4 gigabytes of RAM and multiple (2 or 4) processor
    cores. I recently used Bowtie on my own 4-core, 2 GB desktop to align
    14.3x coverage worth of Illumina/Solexa reads from the 1000-Genomes
    project to the human genome in a single overnight run (14 hours). This is
    significantly faster than both Eland and ZOOM, and makes it much easier
    and faster to extract biological evidence from these huge datasets.

    Here is a brief feature list, but if you are interested then please
    check our site regularly because Bowtie is actively being developed and
    maintained:
    • Extremely fast!
    • Specify any number of parallel search threads with -p (uses pthreads) to exploit multiple processor cores
    • Small index: for human, memory footprint is ~1.3GB (with -z option), size on disk is ~2.2GB
    • Pre-built indexes available from website: http://bowtie-bio.sf.net
      • Human, chimp, dog, mouse, rat, chicken, A. thaliana, fruit fly, etc.
    • Input formats: FASTA, FASTQ, FASTQ w/ Solexa quals, raw, command-line
    • Includes tool to convert Bowtie output to a Maq .map file so that you can use Bowtie's output with, e.g., 'maq assemble' and 'maq cns2cnp'
    • Use -n option to activate a Maq-like policy
      • N (set with -n) mismatches allowed in first L (set with -l) bases
      • Sum of quality values at mismatched positions may not exceed E (set with -e)
    • Use -v option to activate a SOAP-like policy
      • V (set with -v) mismatches allowed in the whole alignment
      • Quality values are ignored
    • Flexible reporting:
      • Use -k to report K alignments
      • Use -a to report all alignments
      • Use --best to guarantee that the alignment(s) reported are "best" in terms of # of mismatches
      • These come at a cost to speed! See manual for details.
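The two alignment policies in the list above can be sketched as simple acceptance tests on an ungapped read/reference pair. This is only an illustration of the rules as worded in the list, not Bowtie's or Maq's actual code: the function names are hypothetical, the parameter values in the example are arbitrary, and details such as quality rounding are ignored.

```python
def mismatch_positions(read, ref):
    """Offsets at which an ungapped alignment of read to ref disagrees."""
    return [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]


def passes_maq_like(read, quals, ref, *, n, l, e):
    """-n style policy: at most n mismatches in the first l bases, and the
    sum of quality values at all mismatched positions must not exceed e."""
    mm = mismatch_positions(read, ref)
    seed_mismatches = sum(1 for i in mm if i < l)
    quality_sum = sum(quals[i] for i in mm)
    return seed_mismatches <= n and quality_sum <= e


def passes_soap_like(read, ref, *, v):
    """-v style policy: at most v mismatches overall; qualities are ignored."""
    return len(mismatch_positions(read, ref)) <= v


read = "ACGTACGTAC"
ref = "ACGTTCGTAA"   # differs from the read at offsets 4 and 9
quals = [30] * len(read)

print(passes_maq_like(read, quals, ref, n=1, l=5, e=70))
print(passes_maq_like(read, quals, ref, n=1, l=5, e=50))
print(passes_soap_like(read, ref, v=1))
```

In this example the first check accepts the alignment (one mismatch inside the first 5 bases, quality sum 60), the second rejects it because the quality sum exceeds 50, and the third rejects it because there are two mismatches in total.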

    As mentioned in the "Software packages for next gen sequence analysis"
    thread, Bowtie does not yet support paired-end alignment or indels.
    Both features are very much on our to-do list, though, so please keep
    an eye out for new versions over the coming months.

    Thanks very much!
    Ben Langmead
  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    #2
    Nice work Ben. Happy to have you here! Any plans for colorspace?


    • Ben Langmead
      Senior Member
      • Sep 2008
      • 200

      #3
      Hi ECO. We've talked through how we would add colorspace support, and it's conceptually pretty simple. It is work, though! Right now, we consider indel and paired-end support the two biggest missing pieces.

      Is ABI support valuable to you? We're always interested to hear what features people want.

      Thanks,
      Ben


      • ECO
        --Site Admin--
        • Oct 2007
        • 1360

        #4
        Good to hear it's on the feature list somewhere!

        It's definitely in my interest to have fast, cutting-edge tools that support colorspace. I'm drooling at 35x faster than Maq.


        • new300
          Member
          • Mar 2008
          • 50

          #5
          What license is it released under?


          • Ben Langmead
            Senior Member
            • Sep 2008
            • 200

            #6
            It's released under the Artistic License, which is free and lacks a reciprocity clause (the thing that scares some people about the GPL).


            • dmamartin
              Junior Member
              • Jul 2008
              • 4

              #7
              OK, so I downloaded ZOOM this week, having seen the paper in Bioinformatics, and found that for my purposes it is much faster than vmatch.
              I rewrote my scripts and started data processing, and then came across your announcement above.

              There are some programs which claim a massive speedup that is only detectable by using sophisticated benchmarks, or carefully designed datasets. So I used the first chunk of my analysis to benchmark as that would be realistic for my purposes.

              I'm looking for matches where the oligo can have up to 2 mismatches and may match up to 4 times per chromosome. I'm not using quality scores, as I have already prefiltered the data by quality, so I have mixed-length input data.

              20K sequences vs human chr1 is the benchmark test. All performed on the same hardware which is (I think) a quad core 8GB RAM machine reading and writing to a fibrechannel connected disk array.

              vmatch - 240 mins or thereabouts.
              ZOOM - 23 mins
              Bowtie - 20 seconds.
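For scale, the three timings above work out to the following ratios. This is a small worked calculation using only the figures quoted in this post, with the vmatch time taken as exactly 240 minutes even though it is given as approximate.

```python
# Wall-clock times from the benchmark above, converted to seconds.
timings_seconds = {
    "vmatch": 240 * 60,  # "240 mins or thereabouts", so approximate
    "ZOOM": 23 * 60,
    "Bowtie": 20,
}

# How many times longer each tool took than Bowtie on this dataset.
ratios = {
    name: seconds / timings_seconds["Bowtie"]
    for name, seconds in timings_seconds.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x the Bowtie run time")
```

That is roughly 720x for vmatch and 69x for ZOOM on this particular 20K-sequence, single-chromosome test.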

              I'll be sending you the medical bill for my bruised jaw. No longer can I stall my collaborators by telling them that the analysis is still running and they should leave me to my coffee..

              ..d


              • dmamartin
                Junior Member
                • Jul 2008
                • 4

                #8
                Originally posted by dmamartin:
                Bowtie - 20 seconds.
                That is with --best -k 100, not the most speedy of searches.
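A search like this one could be assembled along the following lines. This is a hedged sketch: only the option names (-v, -n, -l, -e, -k, -a, --best, -p) come from the feature list in the first post. The `-v 2` setting is an assumption based on "up to 2 mismatches" with qualities unused, the index and reads file names are placeholders, the order of the positional arguments is assumed, and any input-format option is left out because the thread does not name one. The helper `bowtie_args` is hypothetical and only builds the argument list without running anything.

```python
def bowtie_args(index, reads, *, v=None, n=None, l=None, e=None,
                k=None, report_all=False, best=False, threads=None):
    """Build an argument list for a bowtie run (nothing is executed)."""
    if v is not None and n is not None:
        raise ValueError("choose either the -v policy or the -n policy")
    args = ["bowtie"]
    # Each entry pairs an option from the feature list with its value.
    for flag, value in (("-v", v), ("-n", n), ("-l", l), ("-e", e),
                        ("-k", k), ("-p", threads)):
        if value is not None:
            args += [flag, str(value)]
    if report_all:
        args.append("-a")
    if best:
        args.append("--best")
    # Assumed positional order: index name first, then the reads file.
    return args + [index, reads]


# Placeholder names; the real index and input file are not given in the post.
cmd = bowtie_args("chr1_index", "oligos.fa", v=2, k=100, best=True)
print(" ".join(cmd))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` later without shell quoting issues.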


                • zee
                  NGS specialist
                  • Apr 2008
                  • 249

                  #9
                  Dmamartin,

                  Could you perhaps report the results of your mapping benchmark with novoalign (www.novocraft.com)? It will be interesting to see how it performs on your reads in terms of speed and any other metrics e.g. specificity/sensitivity.

                  Bowtie is really good. I tried it out and it gets the job done in an incredibly short time, so that's a huge benefit. Building an index of the human genome with bowtie-index took almost 4 hours (2.4 GHz Xeon, 32 GB RAM), but that's only a one-off cost, and I can see how the BW method shows superiority in alignment seeding.
                  We could probably adapt it in later versions if there is a major differential on short read alignment performance.


                  • dmamartin
                    Junior Member
                    • Jul 2008
                    • 4

                    #10
                    We'll see what we can do. As I have no need to rerun the analysis in the immediate future, it may take a short while to get around to it, but we will definitely add it to the benchmark test one of my colleagues will be doing (in a more elegant and rigorous manner than my quick and dirty run).

                    ..d


                    • Chipper
                      Senior Member
                      • Mar 2008
                      • 323

                      #11
                      Ok, 9 million reads in less than 2 minutes???

                      And this with reads of different lengths, which I think no other program allows. I did not believe it at first, but the alignments seem to be valid. Amazing stuff.


                      • zee
                        NGS specialist
                        • Apr 2008
                        • 249

                        #12
                        Originally posted by Chipper:
                        Ok, 9 million reads in less than 2 minutes???

                        And this with reads of different lengths, which I think no other program allows. I did not believe it at first, but the alignments seem to be valid. Amazing stuff.
                        Novoalign does variable length reads for both single and paired-end runs.


                        • ECO
                          --Site Admin--
                          • Oct 2007
                          • 1360

                          #13
                          Added Bowtie.


                          • Chipper
                            Senior Member
                            • Mar 2008
                            • 323

                            #14
                            Originally posted by zee:
                            Novoalign does variable length reads for both single and paired-end runs.
                            Thanks, now I know better. I tried it now and it seems to work well, just not as fast.


                            • zee
                              NGS specialist
                              • Apr 2008
                              • 249

                              #15
                              Like Eland, Bowtie is exceptional because it is fast and has many of the desirable features we want out of a short read aligner. The Burrows-Wheeler index is one of the most efficient methods for rapid K-mer searching. In the future I think we will see more of these efficient techniques being used for solving the problem of high-throughput mapping.

                              I feel as though the standard should be that we align them faster than we can sequence them.
