'lo everyone,
I've been working with entirely too many different quality scores over the last few weeks (Sanger/Solexa/Illumina) and am trying to get a bit of a handle on best practices. For a normal re-sequencing project there seem to be a fair number of steps where filtering can occur:
* on the sequence level (number of Ns in the read, polynucleotide sequences etc)
* on the sequence quality level (minimum average FASTQ score for a given read)
* on the alignment level (quality of the alignment for a read)
* and finally on the SNP call level (confidence in the call -- and I've yet to understand the difference between consensus quality, SNP quality and RMS mapping quality in SAMTools)
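To make the first two levels concrete, here is a rough Python sketch of a per-read filter. The function names and all thresholds (`max_n`, `max_homopolymer`, `min_mean_quality`) are made up for illustration and are not recommended values; the quality offset of 33 assumes Sanger-encoded FASTQ (older Illumina/Solexa files use 64).

```python
def mean_phred(quality_string, offset=33):
    """Mean quality score of a FASTQ quality string.

    offset=33 assumes Sanger encoding; use 64 for older Illumina/Solexa files.
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def longest_homopolymer(sequence):
    """Length of the longest run of a single repeated base."""
    longest = 0
    run = 0
    previous = None
    for base in sequence:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


def passes_read_filters(sequence, quality_string,
                        max_n=2, max_homopolymer=10,
                        min_mean_quality=20, offset=33):
    """Return True if the read survives all three (hypothetical) filters."""
    sequence = sequence.upper()
    # Sequence level: too many uncalled bases.
    if sequence.count("N") > max_n:
        return False
    # Sequence level: long single-base runs.
    if longest_homopolymer(sequence) > max_homopolymer:
        return False
    # Quality level: minimum average score over the read.
    return mean_phred(quality_string, offset) >= min_mean_quality
```

Note that averaging the scores directly, as done here, is not the same as averaging the underlying error probabilities; which of the two is more appropriate is part of the question.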
Details are of course going to be project dependent, and I can come up with rough filter values by converting the score to a Phred probability and deciding on the false positives I'm willing to accept, but are there rough guidelines for any of these?
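For reference, the conversion I mean is the standard Phred relation Q = -10 * log10(p), plus the odds-based definition used by the old Solexa scores, Q_solexa = -10 * log10(p / (1 - p)). A minimal sketch (the function names are just mine):

```python
import math


def phred_to_error_prob(q):
    """Error probability p for a Phred score Q, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def solexa_to_phred(q_solexa):
    """Convert an old Solexa score to a Phred score.

    Solexa scores are defined on the odds p / (1 - p) rather than on p,
    so the two scales differ noticeably at low quality and converge at
    high quality.
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

So a score of 20 corresponds to a 1% error probability, and a score of 15 to roughly 3%.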
For example, so far we've been using a lower boundary of 15 for the alignment quality (as provided by the SAM/Pileup format), but that is more or less empirical. I haven't been able to find a discussion or review on these topics, but probably just missed them.
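In case it helps to see what that cutoff amounts to, this is a rough sketch of the same filter applied to SAM text, where the mapping quality is the fifth tab-separated column. The function name is hypothetical and `min_mapq=15` is just our empirical value, not a recommendation. As far as I know `samtools view -q 15` applies the same kind of threshold.

```python
def filter_sam_by_mapq(lines, min_mapq=15):
    """Yield SAM header lines plus alignments with MAPQ >= min_mapq.

    `lines` is any iterable of SAM text lines. Column 5 (index 4) of an
    alignment line holds the mapping quality.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Not a complete alignment record (11 mandatory fields); skip it.
            continue
        if int(fields[4]) >= min_mapq:
            yield line
```

One caveat: a MAPQ of 255 means the mapping quality is unavailable, so such reads pass this filter without actually being confidently placed.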
Cheers, Oliver