I have whole-genome sequencing data from several different sources, and the sequencing depth varies greatly among samples. I joint-called SNPs across all samples.
When I filtered the SNPs after calling, I found that genotype quality (GQ) is strongly influenced by sequencing depth.
For example, I set GQ >= 30 as my threshold and treated any genotype below it as missing. After the subsequent max-missing filtration, only very few SNPs were retained, too few for the downstream analyses. Even when I lowered the threshold to GQ >= 20, the number of remaining SNPs was still inadequate.
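Roughly, the filter behaves like the sketch below. This is a toy Python illustration, not my actual pipeline: the function name `retained_sites`, the GQ matrix, and the 0.75 call-rate cutoff are all made up for the example, and I am assuming the max-missing filter means "minimum fraction of non-missing genotypes per site".

```python
from typing import List, Optional

# Rows are SNPs, columns are samples; None marks a genotype that was already missing.
GQMatrix = List[List[Optional[int]]]


def retained_sites(gq: GQMatrix, min_gq: int = 30, min_call_rate: float = 0.8) -> List[int]:
    """Return indices of SNPs whose call rate is still >= min_call_rate
    after genotypes with GQ < min_gq are treated as missing."""
    kept = []
    for i, row in enumerate(gq):
        # A genotype counts as called only if it exists and passes the GQ threshold.
        called = sum(1 for q in row if q is not None and q >= min_gq)
        if row and called / len(row) >= min_call_rate:
            kept.append(i)
    return kept


# Hypothetical toy data: 3 SNPs x 4 samples; the last two columns mimic low-depth samples.
toy_gq = [
    [99, 60, 35, 31],
    [99, 45, 12, 8],
    [80, 33, 25, None],
]

print(retained_sites(toy_gq, min_gq=30, min_call_rate=0.75))
print(retained_sites(toy_gq, min_gq=20, min_call_rate=0.75))
```

On this toy matrix only the first SNP survives at GQ >= 30, and lowering the threshold to 20 rescues just one more, which mirrors what I see on the real data: the low-depth samples drag the per-site call rate down.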
So I calculated, for each individual, the proportion of variants whose genotype meets the threshold. The higher an individual's sequencing depth, the higher the proportion of its genotypes that pass.
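The per-individual check can be sketched like this. Again, this is only a toy Python illustration under my own assumptions: the function name `per_sample_pass_rate`, the sample names, the depths, and the GQ values are all invented, not taken from my data.

```python
from typing import Dict, List, Optional


def per_sample_pass_rate(gq: List[List[Optional[int]]], min_gq: int = 30) -> List[float]:
    """For each sample (column), return the fraction of SNPs (rows)
    whose genotype is present and has GQ >= min_gq."""
    if not gq:
        return []
    n_sites = len(gq)
    n_samples = len(gq[0])
    rates = []
    for j in range(n_samples):
        passed = sum(1 for row in gq if row[j] is not None and row[j] >= min_gq)
        rates.append(passed / n_sites)
    return rates


# Hypothetical toy data: 3 SNPs x 4 samples, with made-up mean depths per sample.
toy_gq = [
    [99, 60, 35, 31],
    [99, 45, 12, 8],
    [80, 33, 25, None],
]
toy_depth: Dict[str, float] = {"S1": 35.0, "S2": 18.0, "S3": 7.0, "S4": 5.0}

rates = per_sample_pass_rate(toy_gq, min_gq=30)
for (name, depth), rate in zip(toy_depth.items(), rates):
    print(f"{name}\tmean_depth={depth:.0f}x\tpass_rate={rate:.2f}")
```

Listing the pass rate next to the mean depth of each sample makes the trend easy to see: in the toy data the two deeper samples pass at every site, while the two shallow ones pass at only a third of them.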
How can I fix this problem? Can I just skip the GQ filtration? Would that affect later analyses, such as population structure analysis, demographic analysis, or selective sweep identification?
By the way, the species I study is not a model species, so VQSR in GATK is not usable. I had already applied hard filtering before the GQ filtration.