I'm working with metagenomic data and am considering using the UnifiedGenotyper from GATK to call SNPs. A common problem with most variant-calling tools I've tried is that they try to assign a diploid genotype to my data, when what I really want is to consider allele frequencies directly at SNP loci.
It looks like I can use the 'AD' values in the per-sample columns to get the actual number of occurrences of the REF and ALT alleles, but I see that they do not always add up to the 'DP' value (the depth at the locus). Is the difference due to occurrences of additional alleles beyond the REF and the ALT?
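To make concrete what I mean, here is a minimal sketch of the calculation I have in mind. It assumes the FORMAT column lists both 'AD' and 'DP' and that neither is missing ('.'); the function name `allele_fractions` and the example sample string are made up for illustration and are not real output from my data.

```python
def allele_fractions(format_field, sample_field):
    """Compute per-allele read fractions from one VCF sample column.

    Assumes AD and DP are both present and numeric (no '.' placeholders).
    """
    # Pair each FORMAT key with the matching value in the sample column.
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))

    # AD holds one read count per allele: REF first, then each ALT.
    ad = [int(x) for x in fields["AD"].split(",")]
    dp = int(fields["DP"])

    total = sum(ad)
    fractions = [count / total if total else 0.0 for count in ad]

    return {
        "ad": ad,
        "dp": dp,
        # Reads counted in DP but not assigned to any allele in AD.
        "unaccounted": dp - total,
        "fractions": fractions,
    }


# Hypothetical sample column: 30 REF reads, 10 ALT reads, depth 42.
result = allele_fractions("GT:AD:DP:GQ:PL", "0/1:30,10:42:99:255,0,255")
print(result["fractions"], result["unaccounted"])
```

The "unaccounted" value is exactly the AD-versus-DP gap I am asking about; the sketch only measures it and does not explain where it comes from.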
Also, will there be some bias in the detected loci towards sites whose allele counts fit diploid genotype frequencies, i.e. loci where it's either 100% REF, 100% ALT, or about 50/50 REF/ALT? In my metagenomic samples I'm really looking at non-clonal populations of bacteria, so I would think it's possible to see something like 75% REF and 25% ALT (ALT == next most common allele). Do you think the UnifiedGenotyper would score that locus with lower confidence (or possibly not report it) because it does not look like a typical diploid genotype?