Regarding ONT's new technology, I think I heard the following over the last few days (please correct me if I've misheard):
- Error rate is 4% today, targeting 1% by release.
- Predominant error is a deletion.
- Output (e.g., from MinION) is in fastq format.
4% of what? Given an error model including deletions, one possible interpretation is that there is a 4% chance that a given base will be missed. ONT, is that what you mean?
More to the point: What observable aspects of the process are used in computing quality scores? We've been shown (sort of) how current traces can be converted into basecalls. What residue of that information is used in generating confidence estimates? If we knew that, we could better understand what use we can make of the scores.
In any case, fastq format doesn't seem appropriate. This isn't the Illumina error model, where the Q score quantifies the likelihood that the basecall at a given position is wrong. If I see a score of Q14 at some position in an ONT fastq file, does that mean there's a 4% chance that (1) the base was called wrong, (2) a base was missed just after this position, or (3) a base was missed just before it? How do I encode the case of miscall-plus-two-deletions at a given position?
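For reference, here is the standard Phred arithmetic behind that Q14 figure, as a minimal Python sketch. It assumes the usual Sanger-style Phred+33 encoding (I don't know that ONT's files use it), and the record and function names are made up for illustration.

```python
import math

def phred_to_error_prob(q):
    """Standard Phred definition: Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Inverse of the above: error probability to Q score."""
    return -10 * math.log10(p)

def decode_quality_string(qual, offset=33):
    """Decode a fastq quality line, assuming Phred+33 (an assumption here)."""
    return [ord(c) - offset for c in qual]

# A made-up four-line fastq record, for illustration only.
record = ["@read1", "ACGT", "+", "//5I"]
seq, qual = record[1], record[3]
scores = decode_quality_string(qual)

# The format carries exactly one score per *called* base.
assert len(scores) == len(seq)

for base, q in zip(seq, scores):
    print(base, q, round(phred_to_error_prob(q), 4))

# A 4% error rate sits just under Q14; the 1% target is Q20.
print(round(error_prob_to_phred(0.04), 2), round(error_prob_to_phred(0.01), 2))
```

Since there is one score per called base and nothing else, a base that was never called has no slot of its own. Any deletion information would have to be folded into the score of a neighboring base, which is exactly the ambiguity in question.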
--TS