I'm just starting to look at MiSeq data from pooled resequencing samples (a specific gene from parasitic field isolates), and I'm interested in the frequency of specific rare variants in that gene. Using "GATK -T DepthOfCoverage", I've gotten base counts at the positions I care about and calculated frequencies from them.
From published testing, it seems that the MiSeq has an error rate of 0.1/100bp (about 0.1% per base), which is right at the frequency I am seeing for my variants (e.g. 3 out of 3155 reads, or roughly 0.095%). If I filter to "minBaseQuality 30" when generating the counts, I lose the variant-supporting reads for many of the samples. Since I'm right at the edge of the error rate, what calculation should I use to determine whether or not to trust the frequencies that I am seeing?
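To make the numbers concrete, here is a minimal sketch of one possible calculation: a one-sided binomial test asking how likely it is to see at least 3 variant reads out of 3155 from sequencing error alone. It assumes errors are independent across reads and occur at a flat 0.1% per base, and the second call further assumes an error is equally likely to produce any of the three other bases. These are simplifying assumptions, not properties of the MiSeq, and the function name `binomial_upper_tail` is just an illustrative helper.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    # Sum the probabilities of 0 .. k-1 successes and take the complement.
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower


variant_reads = 3        # reads carrying the variant base
depth = 3155             # total reads covering the position
per_base_error = 0.001   # ASSUMPTION: flat 0.1% error rate (0.1 per 100 bp)

observed_freq = variant_reads / depth

# Chance of >= 3 erroneous reads if any miscall counts as "the variant".
p_any = binomial_upper_tail(variant_reads, depth, per_base_error)

# ASSUMPTION: errors split evenly among the three alternative bases,
# so only a third of miscalls would mimic this specific variant.
p_specific = binomial_upper_tail(variant_reads, depth, per_base_error / 3)

print(f"observed frequency: {observed_freq:.5f}")
print(f"P(>= {variant_reads} error reads), any miscall:      {p_any:.3f}")
print(f"P(>= {variant_reads} error reads), specific miscall: {p_specific:.3f}")
```

A large tail probability would mean the observed count is what error alone is expected to produce at that depth. Real error rates vary with sequence context, read position, and base quality, so a per-site rate estimated from a control sample would be a better input than the flat value used here.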