Hello,
I used GATK to call variants.
I want to filter the vcf file based on genotype quality.
If a genotype has a GQ<20, I would like the corresponding genotype to be replaced by missing data (./.).
I used GATK to filter:
Code:
java -Xmx20g -jar GenomeAnalysisTK.jar \
  -T VariantFiltration \
  -R ../ref.fasta \
  -V vcf_snp_f1.vcf \
  -o vcf_snp_f11.vcf \
  --genotypeFilterExpression "GQ <= 20" \
  --genotypeFilterName "LowGQ"
I obtain a VCF where the genotype fields look like this. If GQ>20, the filter field shows PASS:
Code:
0/1:2,3:5:PASS:44:95,0,44
But if GQ<20, it shows LowGQ instead:
Code:
0/0:2,0:2:LowGQ:6:0,6,68
The problem is that almost every SNP has a LowGQ genotype in one or more individuals, so I can't filter with "grep -v LowGQ" without discarding nearly all SNPs.
I would prefer only the corresponding genotype to be replaced by "./." (missing).
Is there a tool to do that, or do I have to write a bash script?
Thanks a lot for your help,
Muriel
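
To make the requested transformation concrete, here is a minimal sketch in Python, not a tested pipeline. It assumes an uncompressed, tab-delimited VCF whose FORMAT column lists both GT and GQ, as in the genotype fields shown above. The function names mask_low_gq_line and mask_vcf and the min_gq parameter are hypothetical, chosen only for illustration. The comparison is strict (GQ < 20), following the wording of the question, whereas the GATK command above uses "GQ <= 20".

```python
def mask_low_gq_line(line, min_gq=20):
    """Return a VCF line with GT set to ./. for every sample whose GQ < min_gq."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to edit
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP:FT:GQ:PL
    if "GT" not in keys or "GQ" not in keys:
        return line
    gt_i, gq_i = keys.index("GT"), keys.index("GQ")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if gq_i >= len(values):
            continue  # trailing sub-fields were omitted for this sample
        gq = values[gq_i]
        if gq != "." and float(gq) < min_gq:
            values[gt_i] = "./."  # only GT is changed; other sub-fields are kept
            fields[i] = ":".join(values)
    return "\t".join(fields) + "\n"


def mask_vcf(in_path, out_path, min_gq=20):
    """Copy in_path to out_path, masking low-GQ genotypes line by line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(mask_low_gq_line(line, min_gq))
```

With this sketch, mask_vcf("vcf_snp_f11.vcf", "vcf_snp_masked.vcf") would write a copy in which 0/0:2,0:2:LowGQ:6:0,6,68 becomes ./.:2,0:2:LowGQ:6:0,6,68 (the output file name is made up for the example). Genotypes with GQ of 20 or more are left untouched.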