I have VCF files generated by GATK's HaplotypeCaller, one file for each of 20 individuals. These VCF files will be combined into a multi-sample gVCF for joint genotyping with GATK's GenotypeGVCFs, producing a vcf.gz file that includes all variable positions across the individuals.
I would like to set a filter to remove certain variants. The tricky part is that this is not a global filter: the filtering threshold should be set separately for each individual. Specifically, I'm looking to exclude any genotype (variant) call *within an individual* that has more than 4 times the average read depth of *that individual*.
How do I achieve such filtering? Can this be done on the combined VCF file (or even the variants VCF file), or do I have to filter the individual VCF files before combining them into one?
And how do I implement this filter? I cannot think of any tool that lets me filter out positions with excessively high read depth, particularly when the threshold depends on each individual's genome-wide average.
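To make the intended rule concrete, here is a minimal Python sketch of the logic on a toy multi-sample VCF. It is only an illustration under several assumptions: the sample names `S1`/`S2`, the toy records, and the helper functions `sample_depths`, `depth_thresholds`, and `flag_calls` are all hypothetical, the per-genotype depth is taken from the `DP` key in the FORMAT column, and the "average" is computed over the sites present in the file rather than over the whole genome (a true genome-wide mean would have to come from the alignments instead).

```python
# Toy two-sample VCF (hypothetical data); fields are tab-separated.
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t0/0:30",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:10\t0/1:30",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1/1:10\t0/1:30",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:DP\t0/0:10\t1/1:30",
    "chr1\t500\t.\tA\tT\t50\tPASS\t.\tGT:DP\t0/1:200\t0/1:40",
]


def sample_depths(vcf_lines):
    """Return (sample_names, rows); each row holds one DP per sample (None if missing)."""
    names, rows = [], []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            names = cols[9:]  # sample columns start after FORMAT
            continue
        keys = cols[8].split(":")
        dp_idx = keys.index("DP") if "DP" in keys else None
        row = []
        for call in cols[9:]:
            vals = call.split(":")
            dp = vals[dp_idx] if dp_idx is not None and dp_idx < len(vals) else "."
            row.append(int(dp) if dp.isdigit() else None)
        rows.append(row)
    return names, rows


def depth_thresholds(names, rows, factor=4.0):
    """Per-sample cutoff: factor times that sample's mean DP over the sites in the file."""
    cutoffs = {}
    for i, name in enumerate(names):
        dps = [r[i] for r in rows if r[i] is not None]
        cutoffs[name] = factor * sum(dps) / len(dps) if dps else float("inf")
    return cutoffs


def flag_calls(names, rows, cutoffs):
    """Return (row_index, sample) pairs whose DP exceeds that sample's own cutoff."""
    return [
        (j, names[i])
        for j, row in enumerate(rows)
        for i, dp in enumerate(row)
        if dp is not None and dp > cutoffs[names[i]]
    ]


names, rows = sample_depths(TOY_VCF)
cutoffs = depth_thresholds(names, rows)
flagged = flag_calls(names, rows, cutoffs)
print(cutoffs)  # {'S1': 192.0, 'S2': 128.0}
print(flagged)  # [(4, 'S1')]
```

In this toy example only the genotype of `S1` at the last site is flagged, while the call for `S2` at the same position is kept, which is the per-individual behaviour described above. Note that including the outlier itself in the mean raises the cutoff, so a mean computed from a separate source may be preferable.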
Thank you for your help!