I am working on a project focused on identifying rare variants (say, observed allele frequency < 1%). We sequenced one gene from a single pool of 500 patients using Solexa.
I am interested in one site where the read counts for A, C, G (reference allele), and T are 6, 106, 10727, and 1, respectively. Our in-house algorithm gives a p-value of 5.23 × 10^-20 for the C/G site. The observed minor allele (C) frequency at the site is 0.00969.
Given a sequencing error rate of, say, 1% and 1000 alleles in the pool (from 500 individuals), how can I tell whether the C reads (read count 106) are due to sequencing error? In other words, can we define a read-count cutoff for the alternative allele (the non-reference allele with the highest read count, here C), such that if the alternative allele's read count at a given site exceeds the cutoff, we can be confident it is not due to sequencing error?
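To make the question concrete, here is a rough sketch of the kind of cutoff I have in mind, not our in-house algorithm. It assumes that, at a site with no true variant, the number of erroneous C reads follows a binomial distribution over the total depth. Two error models are assumed for illustration: a worst case where the full 1% error rate produces C, and an even split where each of the three wrong bases gets 1%/3. The function names and the significance level `alpha` are hypothetical, and the sketch ignores base qualities and the sampling variance introduced by pooling.

```python
import math

def log_binom_pmf(k, n, p):
    # log P(X = k) for X ~ Binomial(n, p), computed in log space for stability
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    # P(X >= k): chance of seeing at least k error reads out of n
    if k <= 0:
        return 1.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)

def min_count_cutoff(n, p, alpha):
    # Smallest read count k such that P(X >= k) <= alpha under the error model
    tail = 0.0
    for k in range(n, -1, -1):
        tail += math.exp(log_binom_pmf(k, n, p))
        if tail > alpha:
            return k + 1
    return 0

# Read counts from the site in question: A, C, G (reference), T
counts = {"A": 6, "C": 106, "G": 10727, "T": 1}
depth = sum(counts.values())
alt_reads = counts["C"]

error_rate = 0.01   # assumed overall per-base sequencing error rate
alpha = 1e-3        # assumed per-site significance level (hypothetical choice)

# Worst case: every sequencing error at this site yields a C
p_worst = binom_upper_tail(alt_reads, depth, error_rate)
cutoff_worst = min_count_cutoff(depth, error_rate, alpha)

# Even split: errors are spread equally over the three non-reference bases
p_even = binom_upper_tail(alt_reads, depth, error_rate / 3)
cutoff_even = min_count_cutoff(depth, error_rate / 3, alpha)

print("depth:", depth)
print("worst-case model: tail probability", p_worst, "cutoff", cutoff_worst)
print("even-split model: tail probability", p_even, "cutoff", cutoff_even)
```

Under these assumptions the conclusion depends heavily on the error model: 106 C reads are close to the expected number of errors if all errors produce C, but far above expectation if errors are split evenly. So the cutoff seems to be only as reliable as the estimate of the site-specific error rate.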
Thanks,