Hi,
I was given some VCF files created by CASAVA from whole-genome sequencing reads. I am new to handling polymorphism data extracted from sequencing runs, so I would like to understand some issues around filtering and converting the data for downstream association analysis. Please note that I have one VCF file per sample in the study population.
1- Should I filter variants based on the "QUAL" and "GQ" fields? As far as I can tell from the data, these fields plus "DP" (sequencing depth) are the only quality measures I have for the genotypes.
2- Since the VCF files obviously contain no REF/REF genotypes, should I treat variants that fail the filter(s) from the previous step as REF/REF or as MISSING?
3- How can I use imputation (if I should use it at all) to improve the genotype calls?
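To make questions 1 and 2 concrete, here is a rough sketch of the kind of per-record filter I have in mind. It assumes a single-sample VCF data line with GT, GQ and DP in the FORMAT column. The thresholds and the function name `classify_record` are placeholders I made up for illustration, not recommended values, and setting failing records to missing ("./.") is only one of the two options I am asking about.

```python
# Hypothetical thresholds, chosen only for illustration.
MIN_QUAL = 30.0
MIN_GQ = 20.0
MIN_DP = 10.0


def as_number(value):
    """Convert a VCF field to float; return None for '.', empty or absent values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def classify_record(line, min_qual=MIN_QUAL, min_gq=MIN_GQ, min_dp=MIN_DP):
    """Return (chrom, pos, genotype) for one single-sample VCF data line.

    The genotype is the GT string if QUAL, GQ and DP all meet their
    thresholds; otherwise it is set to missing ("./.").
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, qual = fields[0], int(fields[1]), fields[5]

    # Pair FORMAT keys (column 9) with the sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    values = (as_number(qual), as_number(sample.get("GQ")), as_number(sample.get("DP")))
    limits = (min_qual, min_gq, min_dp)
    passed = all(v is not None and v >= lim for v, lim in zip(values, limits))

    return chrom, pos, sample.get("GT", "./.") if passed else "./."


# Two made-up records: one passing, one failing on QUAL, GQ and DP.
good = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:GQ:DP\t0/1:45:22"
low = "chr1\t12400\t.\tC\tT\t12\tPASS\t.\tGT:GQ:DP\t1/1:8:4"
print(classify_record(good))
print(classify_record(low))
```

With one VCF per sample, I would then merge these per-sample results by position, which is where the REF/REF versus MISSING question comes in for sites that are absent from a given file.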
Thanks for your suggestions in advance,
Sourena