Hi, I've been having a little trouble (and confusion) trying to get the right set of calls out of the GATK's UnifiedGenotyper. I'm doing an analysis of known and novel SNPs, and I'd like to get calls for both:
- observed variants across my 50 samples (including novel ones)
- known variant sites (even though all samples may be homozygous reference) for which there is sufficient read coverage, etc.
The no-call vs. homozygous-reference difference is a really big deal for my analysis. When I run with -out_mode EMIT_VARIANTS_ONLY, I only get calls for sites where my samples vary, and when I run with -out_mode EMIT_ALL_CONFIDENT_SITES, I get a huge .vcf file with calls everywhere I had decent coverage. Is there a way to tell UnifiedGenotyper to call specific sites, e.g. the ones listed in the --dbsnp rod file? Or do I need to run with EMIT_ALL_CONFIDENT_SITES and filter by hand (which I'd like to avoid)?
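For concreteness, the by-hand filtering I'd like to avoid would look roughly like the sketch below. It is only an illustration, not anything GATK provides: the function names `load_known_sites` and `filter_calls` are made up, it assumes both inputs are plain tab-delimited VCF lines, and it assumes homozygous-reference records in the EMIT_ALL_CONFIDENT_SITES output carry `.` in the ALT column.

```python
def load_known_sites(vcf_lines):
    """Collect (chrom, pos) pairs from the known-sites VCF lines (hypothetical helper)."""
    sites = set()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def filter_calls(call_lines, known_sites):
    """Keep headers, every variant call, and reference calls at known sites."""
    kept = []
    for line in call_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # Assumption: ALT of "." means no alternate allele was called here.
        is_variant = alt != "."
        if is_variant or (chrom, pos) in known_sites:
            kept.append(line)
    return kept
```

In practice I'd read the lines from the dbSNP file and the EMIT_ALL_CONFIDENT_SITES output and write the kept lines to a new .vcf, but that still means generating the huge intermediate file first.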