Hi Heng,
I appreciate your clarifications, which are helpful.
I do want to mention that you are using "specificity" where I am pretty sure you mean "precision". (This is a widespread problem in the field - but I'm trying to correct it where I can.) E.g., you wrote: "If a mapper maps more correct reads but also much more wrong reads, it is still a mapper with low specificity." The definition of specificity is:
TN/(TN+FP), where TN is the number of true negatives and FP is the number of false positives.
A "true negative" in the short-read alignment world is not very well defined, but we could define it as not aligning a read that doesn't belong to the genome at all. In any case, that's not what you mean.
Precision is defined as TP/(TP+FP). So I think you mean "precision" in what you are describing.
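To make the distinction concrete, here is a minimal sketch of the three ratios in Python. The `Counts` container and the two sets of counts are hypothetical and chosen only for illustration; they are not results from any benchmark in this thread, and how one assigns a misplaced read to FP or FN is itself an assumption.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one mapper (hypothetical container)."""
    tp: int  # reads aligned to their true origin
    fp: int  # reads aligned, but reported at a wrong location
    tn: int  # reads correctly left unaligned (e.g. contaminants)
    fn: int  # reads that should have been aligned but were not


def precision(c: Counts) -> float:
    # Fraction of reported alignments that are correct.
    return c.tp / (c.tp + c.fp)


def specificity(c: Counts) -> float:
    # Fraction of true negatives among all actual negatives.
    return c.tn / (c.tn + c.fp)


def sensitivity(c: Counts) -> float:
    # Fraction of actual positives that were aligned correctly (recall).
    return c.tp / (c.tp + c.fn)


# Made-up counts for two mappers over the same 1000 reads.
mapper_a = Counts(tp=900, fp=10, tn=50, fn=40)
mapper_b = Counts(tp=930, fp=40, tn=20, fn=10)

for name, c in (("A", mapper_a), ("B", mapper_b)):
    print(name, round(precision(c), 4), round(specificity(c), 4),
          round(sensitivity(c), 4))
```

In this toy case mapper B places more reads correctly (higher sensitivity) but also reports more wrong placements, so its precision is lower than mapper A's.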
We know that Bowtie2 is not perfect - far from it! But we think it is a substantial improvement over Bowtie1. Ben Langmead has already made some changes (just this past week) to improve Bowtie2's accuracy. We'll keep at it.
-
Originally posted by salzberg:
We have submitted our results in a paper which is in the peer review process right now. I encourage both of you to do the same. Un-refereed claims on this forum are little more than anecdotes (which is true of my comments too, of course, so I won't be posting any more).
-
Steven, the sentence following "error free" explains it: "Although reads are error free, many reads cannot be perfectly mapped to the reference genome due to the presence of variations." (This sentence was in the very first version of that webpage.)
Perhaps you are still mixing up overall sensitivity with sensitivity to unique hits and specificity. That is probably my fault for not explaining it clearly. As many others are also reading this thread, I will try to do better. I only compare bwa-sw and bwa-short to avoid sensitive issues.
I have known for a long time that on single-end 100bp real data, bwa-sw almost always correctly maps more reads than bwa-short. However, as bwa-sw does not have sufficient power to distinguish a good hit from a bad one, it has to assign low mapping quality to a lot of perfectly "unique" hits to avoid giving too many high-quality false alignments. The effect is that if we run a SNP caller, we sometimes call more correct SNPs from the bwa-short alignment than from bwa-sw, although bwa-sw maps many more reads. To this end, sensitivity is only meaningful to real applications when the mapper has the ability to disambiguate good and bad hits. Bwa-sw is much more sensitive than bwa-short overall, but not always more sensitive for real applications (EDIT: bwa-sw may have better specificity for 100bp SE data, though).
For variant calling, sensitivity is actually not the major concern. We have already dropped several percent of reads in repetitive regions and filtered tens of percent of reads with the Illumina pipeline, so it does not hurt too much if we have a marginally higher false negative rate. Sensitivity is even less of a concern given deep sequencing, because the coverage compensates for the alignments missed due to excessive sequencing errors. In contrast, specificity is much more important, especially given that mapping errors tend to be recurrent: if we wrongly map one read, we are likely to wrongly map other reads in the same region affected by the same true variants. Mere sequencing coverage may not help greatly to correct wrong variant calls caused by mapping errors. To me it is critical to evaluate specificity, which you have not talked about much in your posts. Note that to evaluate specificity, we have to count the fraction of reads misplaced out of mapped reads. The overall number of correctly mapped reads has little to do with specificity. If a mapper maps more correct reads but also much more wrong reads, it is still a mapper with low specificity. Take bwa-sw and bwa-short as an example again. If reads have low-quality tails, bwa-sw can even map more reads than bwa-short given paired-end reads, but I know for sure that bwa-short will greatly outperform bwa-sw in terms of specificity because bwa-sw does not use the pairing information to correct wrong alignments while bwa-short does.
Again, as I revisited the whole thread, I think we are just focusing on different measurements. We are both correct on the measurements we are interested in. Genericforms actually confirms both of us.
IMHO, being peer-reviewed does not always mean being more correct. If I really wanted to write a paper on this evaluation, I am sure that with my track record I could get it published, but this would not make me more correct than you or others. My previous evaluations of maq/bwa/bwa-sw were all flawed if I look back (I thought the evaluations were the best possible at the time of writing the manuscripts, but I was wrong), but they were all accepted. My review on alignment algorithms uses a similar ROC plot, and it was peer-reviewed and published, too.
Actually, 1000g used a similar procedure to evaluate read mappers about 2 years ago. I was not involved except for suggesting measurements (simulation, evaluation and program running were all done by others). In some ways, this is better than peer review in that the measurement has been reviewed by many more people. Also, in my benchmark, the whole procedure is open source and every command line is given. Everyone can try it themselves to check whether I am biased, wrong or lying. Many published papers do not have this level of reproducibility.
Given that I always think you are correct on the measurements you are using, I will stop posting, too. This discussion is very helpful to me. Thank you.
Last edited by lh3; 11-07-2011, 02:17 PM. Reason: Correct grammatical errors; mention illumina pipeline
-
@lh3 (Heng Li): you wrote above, "I never do simulation with error free reads." Yet you wrote on your webpage that you "simulate error free reads from the diploid genome." That is why I pointed out that you used error-free reads - you said so yourself.
@genericforms: you assert without proof that BWA "clearly wins out" if you account for false positives. Our results contradict this. We simulated both sequencing error (using the ART simulator v1.1.5) and the results of variation between individuals in our experiments, using 3 million paired-end reads. Bowtie2 assigned more reads to their true point of origin than BWA.
We have submitted our results in a paper which is in the peer review process right now. I encourage both of you to do the same. Un-refereed claims on this forum are little more than anecdotes (which is true of my comments too, of course, so I won't be posting any more).
Meanwhile I encourage everyone to try Bowtie2, which in our experiments has demonstrated unparalleled speed, sensitivity, and accuracy.
-
So Bowtie is definitely faster and we are able to reproduce the sensitivity gain; however, if you account for false positives, BWA clearly wins out. We simulated a fly genome (120 Mb) at 15X coverage with 100bp reads. There was a 0.1% mutation rate, including 10% indels. The indels ranged from 1 to 10 bases.
So salzberg, what would be helpful is if you could try to reproduce what we have done with your 2 million human reads. Tell me if you find a similar result.
-
Originally posted by lh3:
EDIT: genericforms reminds me that there is still a question about what accuracy is enough. I do not know the definite answer. It is possible that the difference between two mappers is so subtle that we do not observe differences in SNP/INDEL calls from real data, though my very limited experience seems to suggest the contrary. I could be wrong on this point.
-
We are going to try a comparison as well. When we compare mappers on the basis of proper read placement, we plot TP/FP, and we do this for different MapQs.
I agree with Heng Li that users will be interested in recall rates for point mutations as well as indels of different sizes. So we will explicitly examine this as well. I will let you guys know what we find.
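The TP/FP-by-MapQ comparison described above can be sketched as follows. This is only an illustration under assumptions: the input format (one `(mapq, is_correct)` pair per aligned read), the function name and the toy reads are all hypothetical, and deciding whether a read is "correct" requires simulated data with known origins.

```python
from typing import Iterable, List, Tuple

# One aligned read: (mapping quality, placed at its true origin?).
# This input format is an assumption made for the sketch.
Read = Tuple[int, bool]


def counts_by_threshold(reads: Iterable[Read],
                        thresholds: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (threshold, TP, FP) counting only reads with mapq >= threshold."""
    reads = list(reads)
    rows = []
    for q in thresholds:
        kept = [ok for mapq, ok in reads if mapq >= q]
        tp = sum(kept)        # correctly placed reads that pass the cutoff
        fp = len(kept) - tp   # misplaced reads that pass the cutoff
        rows.append((q, tp, fp))
    return rows


# Toy data, not taken from any real run.
toy_reads = [(60, True), (60, True), (40, True), (30, False),
             (10, True), (3, False), (0, False), (0, True)]

rows = counts_by_threshold(toy_reads, [0, 10, 30, 60])
for q, tp, fp in rows:
    print(f"mapq>={q}: TP={tp} FP={fp}")
```

Plotting TP against FP across the thresholds gives one curve per mapper, which makes it visible whether extra aligned reads come at the cost of extra misplaced ones.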
-
I never do simulation with error-free reads. The reads in my simulation contain variants, which is equivalent to 1% SNP+INDEL error rate. 100k reads are enough for investigating specificity around 0.01% - we still have 100 wrong mappings, so the variance is pretty small. Also, I have run simulations for tens of millions of reads. The relative performance of novoalign, bwa and bwa-sw always stays the same. I also wanted to use real data, but it is hard to evaluate specificity on real data because there is no ground truth. One of the viable measurements is described in the bwa-sw paper, but it is quite complicated to apply in practice to multiple mappers.
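The counting argument can be checked with a small sketch, assuming each read is wrongly mapped independently with the same probability (a binomial model, which is an assumption and not something stated in the post). The function names are hypothetical. Under this model 100,000 reads give about 100 expected wrong mappings at a 0.1% error rate and about 10 at 0.01%.

```python
import math


def expected_wrong(n_reads: int, error_rate: float) -> float:
    """Expected number of wrongly mapped reads under a binomial model."""
    return n_reads * error_rate


def relative_sd(n_reads: int, error_rate: float) -> float:
    """Standard deviation of the wrong-mapping count divided by its mean."""
    mean = n_reads * error_rate
    sd = math.sqrt(n_reads * error_rate * (1.0 - error_rate))
    return sd / mean


for rate in (1e-3, 1e-4):
    print(f"rate={rate:g}: expected wrong={expected_wrong(100_000, rate):.1f}, "
          f"relative SD={relative_sd(100_000, rate):.3f}")
```

The relative spread shrinks roughly as one over the square root of the expected count, which is why a count near 100 is considerably more stable than a count near 10.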
Nearly all aligners use heuristics. Few of them can guarantee to find the best hit even if the top hit is clearly (i.e. in all sensible scoring schemes) better than other hits. Here are several examples. In the following table, each line consists of bowtie2 position, bowtie2 XM:XO:XG, correct position, bwa samse XM:XO:XG and bwa-sw AS:XS (these examples also prove that my simulation is not error-free):
bowtie2 pos    bowtie2 XM:XO:XG   correct pos    bwa XM:XO:XG   bwa-sw AS:XS
9:134616048    7:0:0              1:12746267     2:0:0          (bwa-sw wrong)
17:5319362     7:1:1              1:28924148     1:1:1          88:77
X:70135101     7:0:0              1:185975011    2:0:0          76:72
1:153251402    4:1:1              2:116348184    2:1:1          85:77
19:42604275    8:0:0              5:178218515    3:0:0          (bwa-sw wrong)
4:260872       6:1:1              7:129633785    0:1:1          92:76
None of these reads has multiple hits, but you can see that bowtie2 misses the optimal position and chooses a position with more mismatches/gaps. I am not using these examples to argue bwa is more accurate -- I can of course find examples where bowtie2 does a better job than bwa -- what I want to argue is that even for "unique" hits, different mappers give different answers. Finding the "unique" hits is a really hard task. We cannot assume all mappers are created with the same specificity. The ROC curve has shown this already.
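The comparison in the table can be reproduced with a short sketch. The rows are copied from the table above; the helper name is hypothetical, and reading XM as the mismatch count, XO as gap opens and XG as gap extensions follows the usual meaning of these SAM tags in both tools.

```python
# (bowtie2 position, bowtie2 XM:XO:XG, correct position, bwa samse XM:XO:XG)
rows = [
    ("9:134616048", "7:0:0", "1:12746267", "2:0:0"),
    ("17:5319362", "7:1:1", "1:28924148", "1:1:1"),
    ("X:70135101", "7:0:0", "1:185975011", "2:0:0"),
    ("1:153251402", "4:1:1", "2:116348184", "2:1:1"),
    ("19:42604275", "8:0:0", "5:178218515", "3:0:0"),
    ("4:260872", "6:1:1", "7:129633785", "0:1:1"),
]


def parse_tags(field: str) -> tuple:
    """Split 'XM:XO:XG' into (mismatches, gap opens, gap extensions)."""
    xm, xo, xg = (int(x) for x in field.split(":"))
    return xm, xo, xg


surplus = []
for bt2_pos, bt2_tags, true_pos, bwa_tags in rows:
    bt2_xm, bt2_xo, bt2_xg = parse_tags(bt2_tags)
    bwa_xm, bwa_xo, bwa_xg = parse_tags(bwa_tags)
    # Extra mismatches in the bowtie2 alignment relative to the bwa one.
    surplus.append(bt2_xm - bwa_xm)
    print(f"{bt2_pos} vs {true_pos}: +{bt2_xm - bwa_xm} mismatches, "
          f"same gaps: {(bt2_xo, bt2_xg) == (bwa_xo, bwa_xg)}")
```

For every row the gap fields agree and only the mismatch count differs, so the reported bowtie2 position is strictly worse than the true position under any scoring scheme that penalizes mismatches.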
As to the differences between your evaluation and mine, I think they mainly come from two aspects: 1) for sensitivity, I am only counting hits with mapping quality greater than 0-3 (depending on mappers), but you are counting all hits including mapQ=0 hits; 2) I am evaluating specificity, while all your measurements are essentially sensitivity. Your conclusion is not inconsistent with mine. We just have different focuses. If I followed your philosophy, I am sure I would come to your conclusion with my 100k SE/PE reads/pairs, but I believe specificity and sensitivity of hits clearly having optimal positions are more important to accuracy-critical applications like variant calling and the discovery of structural variations.
EDIT: genericforms reminds me that there is still a question about what accuracy is enough. I do not know the definite answer. It is possible that the difference between two mappers is so subtle that we do not observe differences in SNP/INDEL calls from real data, though my very limited experience seems to suggest the contrary. I could be wrong on this point.
Last edited by lh3; 11-06-2011, 08:59 AM.
-
Originally posted by salzberg:
@lh3:
My guess is that other than repetitive reads, all the aligners generally get the mappings right
-
@lh3:
>>I believe I am usually fair in all benchmarks even involving my own programs. In my
>>benchmark, bwa/bwa-sw is clearly not the best and I am not hiding that at all. I am not
>>trying to make bowtie2 worse.
I understand that you believe you were being fair. But a single test using 100,000 error-free reads is rather unrealistic. Our tests on real data showed very different results from yours. Our tests on simulated data (not error-free, though) also showed very different results, so I'm not sure how you measured false positives. Given that there are billions of real reads now available, I think there's no reason not to do tests on real data as well.
The notion of "correct" mapping for multi-reads is a subtle one that many users don't care about: i.e., finding just the right mapping for a read that maps to 10, 100, or 1000 places doesn't really matter for most applications, even if it is possible to find such a mapping. My guess is that other than repetitive reads, all the aligners generally get the mappings right - and then the issue is whether they can find a mapping if the reads have errors and polymorphisms, which is what users do care about.
-
Originally posted by lh3:
@salzberg
You still avoid talking about "unique" alignments.
I am not sure that there is any special application that requires a very sensitive aligner, with lots of false positives.
-
@salzberg
You still avoid talking about "unique" alignments. For a seeding strategy like bowtie2's, it is trivial to find a hit. But as I said, a key flaw in bowtie2 as well as bowtie1 is that sometimes it is unable to distinguish unique hits from repetitive hits and thus gives low mapping quality to unique hits. It is more sensitive to a hit, but not sensitive to a unique hit. Also, for 100bp single-end reads, the bowtie2 equivalent is really bwa-sw, not bwa-short; for paired-end reads, BWA-short will gain a lot of sensitivity and be much more accurate. Users like 1000g/sanger/broad also enable trimming on real data, though this seems unfair to bowtie2, and bowtie2 should still outperform in terms of overall sensitivity.
I believe I am usually fair in all benchmarks even involving my own programs. In my benchmark, bwa/bwa-sw is clearly not the best and I am not hiding that at all. I am not trying to make bowtie2 worse.
Perhaps the different result on simulated data is only because the simulation is different. I would love to see a ROC curve, which in my view is the most informative plot revealing the overall accuracy (sensitivity vs. specificity) of a mapper. In your post, you were only talking about sensitivity, not specificity.
Last edited by lh3; 11-05-2011, 09:08 AM.
-
In fact, we have done extensive comparisons of Bowtie2 versus both BWA and BWA-SW. Across multiple parameter settings for both tools, we found that Bowtie2 is (a) faster and (b) more sensitive than both programs. We tested it on 2,000,000 human reads, paired and unpaired, from an Illumina HiSeq instrument. I would note that the test by user lh3 (Heng Li, the author of BWA) used only simulated reads, and only 200,000 of them. Our tests were larger and more realistic.
We have detailed figures that Ben Langmead just presented at the Genome Informatics conference. I can't post the figures here, which contain dozens of experiments, but I will just post a few points showing performance using the default settings of Bowtie2 and BWA (and SOAP2):
Aligner   Options            Running time   % reads aligned   Mem (GB)
Bowtie2   --sensitive        11m:17s        96.94%            2.3
BWA       -k 2 -l 32 -o 1    30m:52s        91.80%            2.4
SOAP2     -l 256 -v 5 -g 0   5m:08s         84.43%            5.3
As you can see, Bowtie2 aligned about 5 percentage points more of the reads than BWA, and was nearly 3 times faster.
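The two headline numbers follow directly from the table. Here is a minimal sketch of the arithmetic; the helper name is hypothetical and the inputs are the running times and alignment rates listed above.

```python
def to_seconds(mmss: str) -> int:
    """Convert a time written like '11m:17s' to seconds."""
    minutes, seconds = mmss.replace("m", "").replace("s", "").split(":")
    return int(minutes) * 60 + int(seconds)


bowtie2_time = to_seconds("11m:17s")  # 677 seconds
bwa_time = to_seconds("30m:52s")      # 1852 seconds

speedup = bwa_time / bowtie2_time     # how many times faster Bowtie2 ran
aligned_diff = 96.94 - 91.80          # difference in percentage points

print(f"speedup: {speedup:.2f}x, extra reads aligned: {aligned_diff:.2f} points")
```

This gives a speedup of about 2.74x and a difference of 5.14 percentage points in reads aligned.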
We also compared Bowtie2 to BWA-SW on Ion Torrent and 454 reads, which contain many indels. Bowtie2 was superior to BWA-SW on both speed and sensitivity for a wide range of parameter settings of both programs.
We also compared the accuracy of both BWA and Bowtie on human reads in a simulation using 3 million paired and unpaired 75 bp Illumina reads, simulated so we knew the "truth". Note that this is 30 times more data than lh3's simulated results on his website. Our findings were that Bowtie2 aligned approximately 3% more reads correctly from unpaired reads, and approximately 1% more reads correctly from paired reads. This test used default parameters of both programs.
Thus in our tests, Bowtie2 is faster, more sensitive, and more accurate than BWA across a wide range of parameter settings.
Last edited by salzberg; 11-05-2011, 08:33 AM.
-
Updated to bowtie2-beta3 and added timing. If you wonder why the sensitivity in the plot is different from that in the bowtie2 poster, that is because 1) bwa-short is indeed not very sensitive on real single-end data without trimming, while bwa-sw is much better; 2) that poster counts all alignments, but I am counting "unique" alignments only. Bowtie2 can map many reads, but it has difficulty distinguishing good and bad hits and thus gives many good hits low mapping quality. Beta3 is much better than beta2 on this point, but still not perfect.
Basically, bowtie2 chooses a nice balance point where it is the fastest without much loss of accuracy in comparison to others, but for variant calling on Illumina data, novoalign/smalt/bwa/gsnap may still be the mapper of choice. Things may change in the future, of course. Bowtie2 is still in beta, while bwa and bwa-sw are mature (i.e. not many improvements can be made).
Last edited by lh3; 11-05-2011, 08:14 AM.