Bowtie, an ultrafast, memory-efficient, open source short read aligner

  • Bowtie, an ultrafast, memory-efficient, open source short read aligner

    Hello all,

    If you work with large genomes and large sets of short reads, please
    take a look at Bowtie, a new open source
    short read aligner written by myself and Cole Trapnell at the
    University of Maryland. Bowtie is an ultrafast, memory-efficient short
    read aligner. It aligns short reads to the human genome at a rate of 25
    million reads per hour on a typical workstation with 2 gigabytes of
    memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep
    its memory footprint small: about 1.3 GB for the human genome. It
    supports alignment policies equivalent to Maq and SOAP, but at much
    greater speeds.
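
To make the Burrows-Wheeler idea concrete, here is a minimal sketch of exact-match backward search over a toy FM index. It is not Bowtie's implementation: the function names (`build_fm_index`, `find_exact`) are hypothetical, the tables are stored uncompressed, and mismatches are not handled. A real index samples these tables to keep memory small.

```python
def build_fm_index(text):
    """Build a toy FM index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array construction; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c] = number of characters in the text strictly smaller than c.
    first = {}
    total = 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first, occ


def find_exact(pattern, sa, first, occ):
    """Return sorted start positions of exact matches of pattern in the text."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


genome = "ACGTACGTTACG"
sa, first, occ = build_fm_index(genome)
print(find_exact("ACG", sa, first, occ))
```

Each pattern character costs two table lookups regardless of genome size, which is why this style of index suits looking up many short reads against one large reference.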

    As a denizen of these forums, you probably appreciate that there are
    now many, many short read aligners to choose from. Our goal with
    Bowtie was to exploit an algorithmic "sweet spot" to bring ultrafast
    read alignment to typical desktop computers. These days, a typical
    desktop has 2 or 4 gigabytes of RAM and multiple (2 or 4) processor
    cores. I recently used Bowtie on my own 4-core, 2 GB desktop to align
    14.3x coverage worth of Illumina/Solexa reads from the 1000-Genomes
    project to the human genome in a single overnight (14 hours). This is
    significantly faster than both Eland and ZOOM, and makes it much easier
    and faster to extract biological evidence from these huge datasets.

    Here is a brief feature list, but if you are interested then please
    check our site regularly because Bowtie is actively being developed and
    • Extremely fast!
    • Specify any number of parallel search threads with -p (uses pthreads) to exploit multiple processor cores
    • Small index: for human, memory footprint is ~1.3GB (with -z option), size on disk is ~2.2GB
    • Pre-built indexes available from website:
      • Human, chimp, dog, mouse, rat, chicken, A. thaliana, fruit fly, etc.
    • Input formats: FASTA, FASTQ, FASTQ w/ Solexa quals, raw, command-line
    • Includes tool to convert Bowtie output to a Maq .map file so that you can use Bowtie's output with, e.g., 'maq assemble' and 'maq cns2cnp'
    • Use -n option to activate a Maq-like policy
      • N (set with -n) mismatches allowed in first L (set with -l) bases
      • Sum of quality values at mismatched positions may not exceed E (set with -e)
    • Use -v option to activate a SOAP-like policy
      • V (set with -v) mismatches allowed in the whole alignment
      • Quality values are ignored
    • Flexible reporting:
      • Use -k to report K alignments
      • Use -a to report all alignments
      • Use --best to guarantee that the alignment(s) reported are "best" in terms of # of mismatches
      • These come at a cost to speed! See manual for details.
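
The two alignment policies in the list above can be sketched as simple acceptance checks on an ungapped read-versus-reference comparison. This is only an illustration of the rules as described, not Bowtie's code: the function names are hypothetical, and the default values for `n`, `l`, `e` and `v` are placeholders chosen for the example, so check the manual for the real defaults.

```python
def mismatch_positions(read, ref):
    """Positions where an ungapped read differs from the reference window."""
    return [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]


def passes_maq_like(read, quals, ref, n=2, l=28, e=70):
    """Maq-like policy (-n): at most n mismatches in the first l bases, and
    the summed quality at all mismatched positions may not exceed e."""
    mm = mismatch_positions(read, ref)
    seed_mismatches = sum(1 for i in mm if i < l)
    quality_sum = sum(quals[i] for i in mm)
    return seed_mismatches <= n and quality_sum <= e


def passes_soap_like(read, ref, v=2):
    """SOAP-like policy (-v): at most v mismatches overall; qualities ignored."""
    return len(mismatch_positions(read, ref)) <= v


read = "ACGTACGT"
ref = "ACGAACGA"          # differs from the read at positions 3 and 7
quals = [30] * len(read)  # made-up Phred-style qualities for illustration

print(passes_maq_like(read, quals, ref, n=1, l=4, e=70))  # one seed mismatch, sum 60
print(passes_soap_like(read, ref, v=1))                   # two mismatches overall
```

Note how the same pair of sequences can pass one policy and fail the other: the Maq-like check only counts mismatches inside the seed but also weighs qualities, while the SOAP-like check counts every mismatch and ignores qualities.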

    As mentioned in the "Software packages for next gen sequence analysis"
    thread, Bowtie does not yet support paired-end alignment or indels.
    Both features are very much on our to-do list, though, so please keep
    an eye out for new versions over the coming months.

    Thanks very much!
    Ben Langmead

  • #2
    Nice work Ben. Happy to have you here! Any plans for colorspace?


    • #3
      Hi ECO. We've talked through how we would add colorspace support, and it's conceptually pretty simple. It is work, though! Right now, we consider indel and paired-end support the two biggest missing pieces.

      Is ABI support valuable to you? We're always interested to hear what features people want.



      • #4
        Good to hear it's on the feature list somewhere!

        It's definitely in my interest to have fast cutting edge tools that support colorspace. I'm drooling at 35x faster than maq.


        • #5
          What license is it released under?


          • #6
            It's released under the Artistic License, which is free and lacks a reciprocity clause (the thing that scares some people about the GPL).


            • #7
              OK, so I have downloaded ZOOM this week having seen the paper in Bioinformatics and found that for my purposes it is much faster than vmatch.
              I rewrote my scripts and started data processing, and then came across your announcement above.

              There are some programs which claim a massive speedup that is only detectable by using sophisticated benchmarks, or carefully designed datasets. So I used the first chunk of my analysis to benchmark as that would be realistic for my purposes.

              I'm looking for matches where the oligo can have up to 2 mismatches and may match up to 4 times per chromosome. I'm not using quality scores as I have already prefiltered the data by quality so have mixed length input data.

              20K sequences vs human chr1 is the benchmark test. All performed on the same hardware which is (I think) a quad core 8GB RAM machine reading and writing to a fibrechannel connected disk array.

              vmatch - 240 mins or thereabouts.
              ZOOM - 23 mins
              Bowtie - 20 seconds.

              I'll be sending you the medical bill for my bruised jaw. No longer can I stall my collaborators by telling them that the analysis is still running and they should leave me to my coffee.
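
For a sense of scale, the three timings quoted in this post can be turned into speedup ratios. This is a small sketch using only the approximate figures given above (one run each, on one dataset); the names `timings_seconds` and `speedup_over` are made up for the example.

```python
# Approximate wall-clock times from the post, converted to seconds.
timings_seconds = {
    "vmatch": 240 * 60,  # "240 mins or thereabouts"
    "ZOOM": 23 * 60,     # 23 minutes
    "Bowtie": 20,        # 20 seconds
}


def speedup_over(baseline, tool="Bowtie", timings=timings_seconds):
    """How many times faster `tool` ran than `baseline` on this benchmark."""
    return timings[baseline] / timings[tool]


for name in ("vmatch", "ZOOM"):
    print(f"Bowtie vs {name}: {speedup_over(name):.0f}x")
```

On these numbers Bowtie comes out roughly 720x faster than vmatch and 69x faster than ZOOM for this particular 20K-read, chr1-only test.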



              • #8
                Originally posted by dmamartin
                Bowtie - 20 seconds.
                That is with --best -k 100, not the most speedy of searches.


                • #9

                  Could you perhaps report the results of your mapping benchmark with novoalign? It will be interesting to see how it performs on your reads in terms of speed and any other metrics, e.g. specificity/sensitivity.

                  Bowtie is really good. I tried it out and it gets the job done in an incredibly short time, so that's a huge benefit. Building an index of the human genome with bowtie-index took almost 4 hours (2.4 GHz Xeon, 32 GB RAM), but that's only a one-off cost, and I can see how the BW method shows superiority in alignment seeding.
                  We could probably adapt it in later versions if there is a major differential on short read alignment performance.


                  • #10
                    We'll see what we can do. Having no need in the immediate future to rerun the analysis, it may take a short while to get around to it, but we will definitely add it to the benchmark test one of my colleagues will be doing (in a more elegant and rigorous manner than my quick and dirty run).



                    • #11
                      Ok, 9 million reads in less than 2 minutes???

                      And this with reads of different lengths, which I think no other program allows. I did not believe it at first, but the alignments seem to be valid. Amazing stuff.


                      • #12
                        Originally posted by Chipper
                        Ok, 9 million reads in less than 2 minutes???

                        And this with reads of different lengths, which I think no other program allows. I did not believe it at first, but the alignments seem to be valid. Amazing stuff.
                        Novoalign does variable length reads for both single and paired-end runs.


                        • #13
                          Added Bowtie.


                          • #14
                            Originally posted by zee
                            Novoalign does variable length reads for both single and paired-end runs.
                            Thanks, now I know better. I tried it now and it seems to work well, just not as fast.


                            • #15
                              Like Eland, Bowtie is exceptional because it is fast and has many of the desirable features we want out of a short read aligner. The Burrows-Wheeler index is one of the most efficient methods for rapid K-mer searching. In the future I think we will see more of these efficient techniques being used for solving the problem of high-throughput mapping.

                              I feel as though the standard should be that we align reads faster than we can sequence them.