Hi Colin
OK, thanks for bringing that to my attention.
Yes, Phred Q1 translates to just over 20% probability of the called base being correct. In the absence of further information, the natural assumption is that the three non-called bases are equiprobable, but that then means for a Q1 base the three non-called bases are each more likely than the called base - this can mess up your stats! It probably doesn't matter so much what the Q-value of an 'N' is set to, but I guess they are being set to Q2 for consistency.
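To make the Q1 arithmetic concrete, here is a minimal Python sketch. It assumes the standard Phred definition (error probability p = 10^(-Q/10)) and the equiprobable split across the three non-called bases mentioned above; the function names are just illustrative.

```python
def phred_error_prob(q):
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_probs(q):
    """Return (P(called base is correct), P(each non-called base)).

    Assumption: the error probability is split equally across the
    three non-called bases.
    """
    p_err = phred_error_prob(q)
    return 1 - p_err, p_err / 3


for q in (1, 2, 5, 10):
    p_called, p_other = base_probs(q)
    print(f"Q{q}: called={p_called:.3f}  each non-called={p_other:.3f}")
```

Under those assumptions the crossover sits at an error probability of 0.75 (roughly Q1.25), so Q1 falls on the wrong side of it and Q2 does not.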
Personally I've tended to find that if the error probability is high enough for the divergence of the scoring schemes to be an issue, then the base is probably best ignored for many purposes.
There are certainly pluses and minuses to both scoring schemes. The original reason for going with the 'Solexa' log-odds scheme was that, unlike the Phred scheme, it naturally extends to a 4-values-per-base scoring scheme. We've ended up using only a single value per base, but I know some folks in the community remain keen on having more than one quality value (qv) per base.
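For anyone wanting to see where the two schemes part company, here is a rough Python sketch. It assumes the usual definitions: Phred as -10*log10(p) and the Solexa log-odds score as -10*log10(p / (1 - p)), where p is the error probability. The function names are made up for illustration.

```python
import math


def phred_from_prob(p_err):
    """Phred score: -10 * log10(p)."""
    return -10 * math.log10(p_err)


def solexa_from_prob(p_err):
    """Solexa log-odds score: -10 * log10(p / (1 - p))."""
    return -10 * math.log10(p_err / (1 - p_err))


def solexa_to_phred(q_solexa):
    """Convert a Solexa log-odds score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


for p_err in (0.5, 0.2, 0.1, 0.01, 0.001):
    print(
        f"p_err={p_err}: Phred={phred_from_prob(p_err):.2f}  "
        f"Solexa={solexa_from_prob(p_err):.2f}"
    )
```

The two scores are nearly identical once the error probability is small, and only drift apart noticeably for the low-quality bases discussed above.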
Cheers
Tony