Has anyone compared the speed of BWA and Bowtie 2? How about the accuracy for both point mutations and indels?
-
Updated to bowtie2-beta3 and added timing. If you wonder why the sensitivity in the plot is different from that in the bowtie2 poster, that is because 1) bwa-short is indeed not very sensitive on real single-end data without trimming; bwa-sw is much better; 2) that poster is counting all alignments, but I am counting "unique" alignments only. Bowtie2 can map many reads, but it has difficulty distinguishing good hits from bad ones and thus gives many good hits low mapping quality. Beta3 is much better than beta2 in this respect, but still not perfect.
Basically bowtie2 chooses a nice balance point where it is the fastest without much loss of accuracy in comparison to others, but for variant calling on Illumina data, novoalign/smalt/bwa/gsnap may still be the mappers of choice. Things may change in the future of course. Bowtie2 is still in beta, while bwa and bwa-sw are mature (i.e. not many improvements can be made).
Last edited by lh3; 11-05-2011, 08:14 AM.
-
In fact, we have done extensive comparisons of Bowtie2 versus both BWA and BWA-SW. Across multiple parameter settings for both tools, we found that Bowtie2 is (a) faster and (b) more sensitive than both programs. We tested it on 2,000,000 human reads, paired and unpaired, from an Illumina HiSeq instrument. I would note that the test by user lh3 (Heng Li, the author of BWA) used only simulated reads, and only 200,000 of them. Our tests were larger and more realistic.
We have detailed figures that Ben Langmead just presented at the Genome Informatics conference. I can't post the figures here, which contain dozens of experiments, but I will just post a few points showing performance using the default settings of Bowtie2 and BWA (and SOAP2):
Aligner   Options             Running time   % reads aligned   Mem (GB)
Bowtie2   --sensitive         11m:17s        96.94%            2.3
BWA       -k 2 -l 32 -o 1     30m:52s        91.80%            2.4
SOAP2     -l 256 -v 5 -g 0    5m:08s         84.43%            5.3
As you can see, Bowtie2 aligned about 5 percentage points more of the reads than BWA, and was nearly 3 times faster.
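For anyone who wants to recompute the Bowtie2 vs BWA comparison from the table, here is a minimal sketch. The numbers are copied from the table; the helper name `to_seconds` and the `runs` dictionary are just illustrative choices, not part of any tool.

```python
def to_seconds(stamp: str) -> int:
    """Convert a running time written like '11m:17s' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))

# (running time, % reads aligned), copied from the table in this post.
runs = {
    "Bowtie2": ("11m:17s", 96.94),
    "BWA": ("30m:52s", 91.80),
    "SOAP2": ("5m:08s", 84.43),
}

bt2_time, bt2_pct = runs["Bowtie2"]
bwa_time, bwa_pct = runs["BWA"]

speedup = to_seconds(bwa_time) / to_seconds(bt2_time)  # ratio of wall-clock times
gain = bt2_pct - bwa_pct                               # difference in percentage points

print(f"Bowtie2 vs BWA: {speedup:.2f}x faster, {gain:.2f} points more reads aligned")
```

This works out to roughly a 2.7x speedup and a gain of about 5.1 percentage points in reads aligned.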
We also compared Bowtie2 to BWA-SW on Ion Torrent and 454 reads, which contain many indels. Bowtie2 was superior to BWA-SW on both speed and sensitivity for a wide range of parameter settings of both programs.
We also compared the accuracy of both BWA and Bowtie2 on human reads in a simulation using 3 million paired and unpaired 75 bp Illumina reads, simulated so we knew the "truth". Note that this is 30 times more data than lh3's simulated results on his website. Our findings were that Bowtie2 aligned approximately 3% more reads correctly from unpaired reads, and approximately 1% more reads correctly from paired reads. This test used default parameters of both programs.
Thus in our tests, Bowtie2 is faster, more sensitive, and more accurate than BWA across a wide range of parameter settings.
Last edited by salzberg; 11-05-2011, 08:33 AM.
-
@salzberg
You still avoid talking about "unique" alignments. For a seeding strategy like bowtie2's, it is trivial to find a hit. But as I said, a key flaw in bowtie2 as well as bowtie1 is that sometimes it is unable to distinguish unique hits from repetitive hits and thus gives low mapping quality to unique hits. It is more sensitive to a hit, but not sensitive to a unique hit. Also, for 100bp single-end reads, the bowtie2 equivalent is really bwa-sw, not bwa-short; for paired-end reads, bwa-short will gain a lot of sensitivity and be much more accurate. Users like 1000g/sanger/broad also enable trimming on real data, though this seems unfair to bowtie2, and bowtie2 should still outperform in terms of overall sensitivity.
I believe I am usually fair in all benchmarks even involving my own programs. In my benchmark, bwa/bwa-sw is clearly not the best and I am not hiding that at all. I am not trying to make bowtie2 worse.
Perhaps the different result on simulated data is only because the simulation is different. I would love to see a ROC curve, which in my view is the most informative plot revealing the overall accuracy (sensitivity vs. specificity) of a mapper. In your post, you were only talking about sensitivity, not specificity.
Last edited by lh3; 11-05-2011, 09:08 AM.
-
Originally posted by lh3:
>>@salzberg
>>You still avoid talking about "unique" alignments.
This is an interesting subtlety. In my experience comparing BWA, bowtie1, and GSNAP using BWA's wg_sim and wg_sim_eval, there was a significant penalty for multimaps, since each alternative mapping was considered a miss, and BWA had a devious algorithm which cut multimaps off after 11 hits and simply reported the read as too ambiguous. However, when the evaluation code was rewritten to count a multimap as only one miss (rather than multiple misses), BWA was still superior to bowtie1 or GSNAP. GSNAP in particular was bad about reporting multimaps.
I am not sure that there is any special application that requires a very sensitive aligner with lots of false positives.
-
@lh3:
>>I believe I am usually fair in all benchmarks even involving my own programs. In my
>>benchmark, bwa/bwa-sw is clearly not the best and I am not hiding that at all. I am not
>>trying to make bowtie2 worse.
I understand that you believe you were being fair. But a single test using 100,000 error-free reads is rather unrealistic. Our tests on real data showed very different results from yours. Our tests on simulated data (not error-free, though) also showed very different results, so I'm not sure how you measured false positives. Given that there are billions of real reads now available, I think there's no reason not to do tests on real data as well.
The notion of "correct" mapping for multi-reads is a subtle one that many users don't care about: i.e., finding just the right mapping for a read that maps to 10, 100, or 1000 places doesn't really matter for most applications, even if it is possible to find such a mapping. My guess is that other than repetitive reads, all the aligners generally get the mappings right - and then the issue is whether they can find a mapping if the reads have errors and polymorphisms, which is what users do care about.
-
Originally posted by salzberg:
>>@lh3:
>>My guess is that other than repetitive reads, all the aligners generally get the mappings right
I disagree. If you look at hash-based aligners, there are certain patterns of indels, mismatches and errors where they won't find the right result even if it is unique. For example, if the word size is 15 and there are two mismatches 10 bases apart in a 50mer, the hash won't return the region at all. Likewise, for longer reads the number of mismatches is likely to be higher and the suffix array search will terminate before finding the ideal match.
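To make the seeding argument concrete, here is a rough sketch of when an exact-match word can still anchor a read. It assumes substitutions only and words taken at a fixed stride along the read; real aligners differ in how and where they seed, and the function name and parameters are hypothetical.

```python
def has_clean_word(read_len, word_size, mismatch_positions, stride=1):
    """Return True if some word of length word_size, starting at a multiple
    of stride, contains none of the 0-based mismatch positions."""
    bad = set(mismatch_positions)
    for start in range(0, read_len - word_size + 1, stride):
        if not any(pos in bad for pos in range(start, start + word_size)):
            return True
    return False

# Three mismatches spaced so that no error-free stretch reaches 15 bases:
# every 15-base word overlaps at least one of them.
print(has_clean_word(50, 15, [12, 25, 38]))  # False

# A single mismatch in the middle leaves long clean flanks to seed from.
print(has_clean_word(50, 15, [25]))          # True
```

Whether a given read is found therefore depends on the word size, the stride, and exactly where the differences fall, not only on how many there are.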
Comment
-
-
I never do simulation with error-free reads. The reads in my simulation contain variants, which is equivalent to a 1% SNP+INDEL error rate. 100k reads are enough for investigating specificity around 0.01% - we still have 100 wrong mappings, so the variance is pretty small. Also, I have run simulations for tens of millions of reads. The relative performance of novoalign, bwa and bwa-sw always stays the same. I also wanted to use real data, but it is hard to evaluate specificity on real data because there is no ground truth. One of the viable measurements is described in the bwa-sw paper, but it is quite complicated to apply in practice to multiple mappers.
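The sample-size argument can be checked with a quick calculation. This is only a sketch, assuming each read is mapped wrongly independently with a fixed probability (a binomial model); the function name is made up for illustration.

```python
import math

def wrong_mapping_stats(n_reads, error_rate):
    """Expected number of wrong mappings and its relative standard deviation,
    assuming each read is wrong independently with probability error_rate."""
    expected = n_reads * error_rate
    std = math.sqrt(n_reads * error_rate * (1.0 - error_rate))
    return expected, std / expected

for rate in (0.001, 0.0001):
    expected, rel_std = wrong_mapping_stats(100_000, rate)
    print(f"rate={rate:.2%}: ~{expected:.0f} wrong mappings, relative std {rel_std:.0%}")
```

Under this model, 100 wrong mappings out of 100k reads corresponds to an error rate of 0.1%, with a relative spread of about 10%. At 0.01% the expected count is 10 and the relative spread is about 32%.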
Nearly all aligners use heuristics. Few of them can guarantee to find the best hit even if the top hit is clearly (i.e. in all sensible scoring schemes) better than other hits. Here are several examples. In the following table, each line consists of bowtie2 position, bowtie2 XM:XO:XG, correct position, bwa samse XM:XO:XG and bwa-sw AS:XS (these examples also prove that my simulation is not error-free):
bowtie2 pos    bowtie2 XM:XO:XG   correct pos    bwa samse XM:XO:XG   bwa-sw AS:XS
9:134616048    7:0:0              1:12746267     2:0:0                (bwa-sw wrong)
17:5319362     7:1:1              1:28924148     1:1:1                88:77
X:70135101     7:0:0              1:185975011    2:0:0                76:72
1:153251402    4:1:1              2:116348184    2:1:1                85:77
19:42604275    8:0:0              5:178218515    3:0:0                (bwa-sw wrong)
4:260872       6:1:1              7:129633785    0:1:1                92:76
None of these reads has multiple hits, but you can see that bowtie2 misses the optimal position and chooses a position with more mismatches/gaps. I am not using these examples to argue bwa is more accurate -- I can of course find examples where bowtie2 does a better job than bwa -- what I want to argue is that even for "unique" hits, different mappers give different answers. Finding the "unique" hits is a really hard task. We cannot assume all mappers are created with the same specificity. The ROC curve has shown this already.
As to the differences between your and my evaluations, I think they mainly come from two aspects: 1) for sensitivity, I am only counting hits with mapping quality greater than a cutoff of 0-3 (depending on the mapper), but you are counting all hits including mapQ=0 hits; 2) I am evaluating specificity, while all your measurements are essentially sensitivity. Your conclusion is not inconsistent with mine. We just have different focuses. If I followed your philosophy, I am sure I would come to your conclusion with my 100k SE/PE reads/pairs, but I believe specificity and sensitivity of hits clearly having optimal positions are more important to accuracy-critical applications like variant calling and the discovery of structural variations.
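The two ways of counting can be contrasted on the same set of alignments. The sketch below is illustrative only: the data are invented toy values, and `evaluate` with its `mapq_cutoff` parameter is a hypothetical helper, not part of any mapper or evaluation script.

```python
def evaluate(alignments, total_reads, mapq_cutoff=None):
    """alignments: list of (mapq, is_correct) for reads the mapper placed.
    If mapq_cutoff is None, count every hit; otherwise count only hits with
    mapq > mapq_cutoff. Returns (fraction of reads counted as mapped,
    fraction of the counted hits that are wrong)."""
    if mapq_cutoff is None:
        kept = [ok for _, ok in alignments]
    else:
        kept = [ok for mapq, ok in alignments if mapq > mapq_cutoff]
    wrong = sum(1 for ok in kept if not ok)
    mapped_fraction = len(kept) / total_reads
    error_rate = wrong / len(kept) if kept else 0.0
    return mapped_fraction, error_rate

# Toy data, invented for illustration: 100 reads, 98 of them placed.
toy = [(40, True)] * 90 + [(0, True)] * 5 + [(0, False)] * 3

print(evaluate(toy, total_reads=100))                 # all hits, mapQ=0 included
print(evaluate(toy, total_reads=100, mapq_cutoff=0))  # only hits with mapQ > 0
```

On this toy set, counting all hits gives the higher mapped fraction but also includes every wrong placement, while the mapQ filter trades some mapped reads for a lower error rate.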
EDIT: genericforms reminds me that there is still a question about what accuracy is enough. I do not know the definite answer. It is possible that the difference between two mappers is so subtle that we do not observe differences in SNP/INDEL calls from real data, though my very limited experience seems to suggest the contrary. I could be wrong on this point.
Last edited by lh3; 11-06-2011, 08:59 AM.
-
We are going to try a comparison as well. When we compare mappers on the basis of proper read placement we plot TP/FP and we do this for different MapQs.
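A minimal sketch of that kind of tabulation, assuming each simulated read comes with its MapQ and a flag saying whether it was placed correctly. The function name and the toy input are made up for illustration.

```python
def mapq_curve(alignments):
    """alignments: list of (mapq, is_correct).
    Returns [(cutoff, true_positives, false_positives)], one entry per
    distinct MapQ, counting alignments with mapq >= cutoff and moving
    from the strictest cutoff to the most permissive."""
    curve = []
    tp = fp = 0
    by_mapq = sorted(alignments, key=lambda a: a[0], reverse=True)
    i = 0
    while i < len(by_mapq):
        cutoff = by_mapq[i][0]
        # Accumulate every alignment that shares this MapQ value.
        while i < len(by_mapq) and by_mapq[i][0] == cutoff:
            if by_mapq[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((cutoff, tp, fp))
    return curve

# Toy input, invented for illustration.
toy = [(40, True), (40, True), (20, True), (20, False), (0, False), (0, True)]
for cutoff, tp, fp in mapq_curve(toy):
    print(cutoff, tp, fp)
```

Plotting the TP column against the FP column for each mapper gives one curve per mapper, so they can be compared at equal numbers of false placements.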
I agree with Heng Li that users will be interested in recall rates for point mutations as well as indels of different sizes. So we will explicitly examine this as well. I will let you guys know what we find.
-
Originally posted by lh3:
>>EDIT: genericforms reminds me that there is still a question about what accuracy is enough. I do not know the definite answer. It is possible that the difference between two mappers is so subtle that we do not observe differences in SNP/INDEL calls from real data, though my very limited experience seems to suggest the contrary. I could be wrong on this point.
A bit of a naive question. In the context of a scientific project, mapping is merely a single step, and errors compound over multiple steps: from DNA/RNA collection to library prep to sequencing, base calling, mapping, SNP calling and so on. It is pertinent that every step be as accurate as possible so as not to impose limitations on subsequent experiments, computations, analysis, interpretations etc. Unfortunately, it appears that mapping is far behind the state of the art of accuracy in sequencing technologies.
-
So Bowtie is definitely faster and we are able to reproduce the sensitivity gain; however, if you account for false positives, BWA clearly wins out. We simulated a fly genome (120 Mb) at 15X coverage with 100bp reads. There was a 0.1% mutation rate, including 10% indels. The indels ranged from 1 to 10 bases.
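For scale, the simulation parameters translate into roughly the following numbers. This is back-of-the-envelope arithmetic only, and it assumes that "10% indels" means 10% of the simulated mutations are indels.

```python
genome_size = 120_000_000  # bases, the simulated fly genome
coverage = 15              # 15X
read_length = 100          # bp
mutation_rate = 0.001      # 0.1% of positions mutated
indel_fraction = 0.10      # assumption: 10% of the mutations are indels

n_reads = genome_size * coverage // read_length
n_mutations = round(genome_size * mutation_rate)
n_indels = round(n_mutations * indel_fraction)

print(f"reads: {n_reads:,}")
print(f"mutations: {n_mutations:,} (of which indels: {n_indels:,})")
```

That is about 18 million reads covering around 120,000 simulated mutations, some 12,000 of them indels under the stated assumption.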
So salzberg, it would be helpful if you could try to reproduce what we have done with your 2 million human reads. Tell me if you find a similar result.