-
Looking at those ROC curves, it appears to me that Novoalign is the best mapper in the specified simulation that was run with respect to sensitivity and specificity. Is this a correct interpretation?
-
Originally posted by rskr: I disagree. If you look at hash-based aligners, there are certain patterns of indels, mismatches and errors where they won't find the right result even if it is unique. For example, if the word size is 15 and there are two mismatches 10 bases apart in a 50mer, the hash won't return the region at all. Likewise, for longer reads the number of mismatches is likely to be higher, and the suffix array search will terminate before finding the ideal match.
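Whether a given mismatch pattern defeats a hash lookup depends on where the mismatches fall and on how the aligner samples words from the read. As a minimal sketch for checking a pattern by hand, the snippet below lists the word start offsets that remain error-free. It assumes the read is seeded with every overlapping word of length k; the function name `exact_seed_starts` is hypothetical, and real aligners may sample fewer offsets or require several seed hits.

```python
from typing import Iterable, List


def exact_seed_starts(read_len: int, k: int, mismatch_positions: Iterable[int]) -> List[int]:
    """Return 0-based start offsets of length-k words that contain no mismatch.

    Assumption: every overlapping word in the read is a candidate seed.
    """
    bad = set(mismatch_positions)
    starts = []
    for s in range(read_len - k + 1):
        # The word covers read positions s .. s + k - 1.
        if not any(p in bad for p in range(s, s + k)):
            starts.append(s)
    return starts


# Two mismatches 10 bases apart in a 50mer, word size 15.
print(exact_seed_starts(50, 15, [20, 30]))
# A denser pattern: one mismatch every 10 bases.
print(exact_seed_starts(50, 15, [14, 24, 34, 44]))
```

An empty list means no exact word of that size survives, so a pure exact-word lookup would have nothing to anchor on for that read.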
-
Each line consists of the mapping quality threshold, the number of mapped reads with mapQ no less than the first column, and the number of mismapped reads among them. It does not show the number of reads with mapQ=0. If we include mapQ=0 mappings, the sensitivity of bwa is also good for simulated data, but on single-end real data, the low-quality tail on reads makes bwa much worse. This is what Steven and Ben have observed. This is also why it is recommended to enable trimming when using bwa.
BWA always gives mapQ 0 to repetitive hits, but other mappers (gsnap, bowtie2 and novoalign) may give mapQ<=3 to repetitive hits. This is theoretically correct. I may further set a mapQ threshold of 1-4 when plotting.
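The three-column format described above can be read with a few lines of Python. This is only a sketch: the names `RocPoint`, `parse_roc` and `error_rate` are made up for illustration and are not part of wgsim_eval.pl, and the sample lines are copied from the .roc tails posted in this thread.

```python
from typing import List, NamedTuple


class RocPoint(NamedTuple):
    min_mapq: int   # mapping quality threshold (column 1)
    mapped: int     # reads with mapQ >= min_mapq (column 2)
    mismapped: int  # of those reads, how many were placed wrongly (column 3)


def parse_roc(text: str) -> List[RocPoint]:
    """Parse .roc text, skipping anything that is not three integers."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            points.append(RocPoint(*(int(f) for f in fields)))
        except ValueError:
            # e.g. the "==> bwa.roc <==" banner lines printed by tail
            continue
    return points


def error_rate(p: RocPoint) -> float:
    """Fraction of mapped reads at this threshold that are mismapped."""
    return p.mismapped / p.mapped if p.mapped else 0.0


sample = """==> bowtie2.roc <==
4 180250 40
3 187273 578
1 199331 5877
==> bwa.roc <==
4 192681 117
1 192762 119
"""

for p in parse_roc(sample):
    print(p.min_mapq, p.mapped, p.mismapped, f"{error_rate(p):.5f}")
```

Because both counts are cumulative, each line is one point on the curve: lowering the threshold can only add mapped reads and mismapped reads.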
-
@nickloman
The output of the wgsim_eval.pl program looks a bit like the data below. Bowtie 1 always gives a mapping score of 255 (column 1). I'm guessing that bowtie 2 has many FPs at a mapping score of 1 (column 3 where column 1 == 1), but cumulatively finds more TPs across all mapping scores (column 2 where column 1 == 1). I was also wondering about the exact meaning of the output of the wgsim_eval.pl script.
% tail *.roc
==> bowtie2.roc <==
14 172922 11
13 172925 12
12 177943 27
11 177945 28
10 179990 37
9 179995 40
4 180250 40
3 187273 578
2 187324 580
1 199331 5877
==> bowtie.roc <==
255 86206 1740
==> bwa.roc <==
10 192354 72
9 192560 107
8 192595 107
7 192628 110
6 192652 115
5 192669 116
4 192681 117
3 192731 117
2 192741 118
1 192762 119
Chris
-
Hi Brent - that would make sense - varying minimum mapping quality thresholds and seeing the result. It would be nice if those values were also plotted on the graph somehow.
-
nickloman, I believe the thing that's changing in the figures for the other mappers is the mapping quality. GSNAP, bowtie and (apparently) soap2 do not calculate the mapping quality so there is nothing to vary to get a line.
-
Originally posted by lh3: Knowing the average FNR/FPR is not enough. This is where the ROC curve shows its power. It gives the full spectrum of the accuracy.
Sorry if this is a dumb question!
-
I don't particularly wish to get drawn into a mapper war, and I'll say here that I haven't benchmarked these tools to compare. However thinking more downstream I think averaged sensitivity and specificity metrics aren't sufficient to show the whole story.
I agree with Heng that the quality of the mapping score is very important for some forms of analysis. Furthermore, I'd go on to say the variance of depth is important too. E.g., imagine we have two aligners that can map 95% and 90% of the data respectively. The one mapping 95% maps well to 95% of the genome and atrociously to 5% of the genome, while the one mapping 90% maps across the entire genome in a relatively uniform manner. I think most would feel happier with the 90% sensitivity aligner.
So say we have 100mers of a simulated genome with X% of SNPs. We can algorithmically produce 100x depth by starting a new 100mer on every position in the genome, and then give them appropriate real-looking quality profiles with error rates from real data, etc. (So as real as can be, but with a perfectly uniform distribution and known mapping locations.)
Then we can plot the depth distribution. How many sites are there where a particular combination of SNPs or errors has caused a dip in coverage? Given we're almost always looking for very specific locations, often around discrepancies, this is perhaps a key metric in analysis.
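A rough sketch of that depth check, under stated assumptions: alignments are given as (0-based start, read length) pairs on a single linear sequence, and the helper names `depth_profile` and `low_coverage_sites` and the 50% cutoff are illustrative choices, not something taken from any aligner or tool in this thread.

```python
from typing import Iterable, List, Tuple


def depth_profile(genome_len: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-base depth from (start, length) alignments, using a difference array."""
    diff = [0] * (genome_len + 1)
    for start, length in alignments:
        if start < 0 or start >= genome_len:
            continue  # ignore alignments that start outside the sequence
        end = min(start + length, genome_len)
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def low_coverage_sites(depth: List[int], expected: float, fraction: float = 0.5) -> List[int]:
    """Positions whose depth falls below fraction * expected."""
    return [i for i, d in enumerate(depth) if d < expected * fraction]


# Toy example: 5-mers started at every position of a 20 bp sequence,
# with the reads starting at 6, 7 and 8 left unmapped to mimic a dip.
read_len, genome_len = 5, 20
starts = [s for s in range(genome_len - read_len + 1) if s not in (6, 7, 8)]
depth = depth_profile(genome_len, [(s, read_len) for s in starts])
print(depth)
print(low_coverage_sites(depth, expected=read_len))
```

With a linear sequence the first and last few bases are always under-covered, so those edge positions should be masked before counting dips caused by the aligner.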
-
For a single mapper, it is true that the more reads it maps, the higher its FPR. But when you compare two mappers, it is possible for one mapper to both map more reads and have a lower FPR. Then that is the better one.
-
So an algorithm that has a high sensitivity is likely to have a low specificity? I don't think these terms mean much outside of a hospital type test. What we want is accuracy.
-
Yes, it was my mistake to use the wrong term. Sorry for the confusion. To clarify, I mean we want to achieve a low false positive rate (this should be right).
Bowtie2 is definitely a substantial improvement over bowtie1 in almost every aspect, and I can really see the encouraging improvement in terms of low FPR between beta2 and beta3, all in the right direction. If you also focus your development on low FPR, you will probably gain further improvement. This will be good for everyone.