Hi,
I have variant (SNP) calls made with GATK (VCF format). I have tumor data only (no normal data set). I have filtered out the SNPs that are present in public databases such as dbSNP, as I am interested in the novel SNPs in the tumor data. I am also excluding anything that is present in COSMIC.
To gain more confidence in the novel SNP calls, I plan to filter them on quality metrics such as read depth, mapping quality, Fisher p-value, Phred score, or variant confidence/quality by depth. I am wary of selecting arbitrary thresholds for these variables. So my questions are:
1) What is the best approach to SNP QC filtering?
2) Which variables (among the few I mentioned) are the most relevant for determining SNP quality, and how should I set their thresholds?
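To make the question concrete, here is a minimal sketch of the kind of hard filter I have in mind. It assumes the VCF INFO column carries the GATK annotations DP (read depth), MQ (mapping quality), QD (quality by depth) and FS (Phred-scaled Fisher strand p-value). The function names and the `THRESHOLDS` table are hypothetical, and the cutoff values are placeholders for illustration, which is exactly the part I am unsure about.

```python
# Hypothetical thresholds: placeholders for illustration, not recommendations.
# "min" means the value must be at least the limit; "max" means at most.
THRESHOLDS = {
    "DP": ("min", 10.0),   # read depth
    "MQ": ("min", 40.0),   # RMS mapping quality
    "QD": ("min", 2.0),    # variant confidence normalised by depth
    "FS": ("max", 60.0),   # Phred-scaled Fisher strand bias p-value
}


def parse_info(info_field):
    """Turn 'DP=35;MQ=59.2;DB' into {'DP': '35', 'MQ': '59.2', 'DB': True}."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags such as DB have no value
    return info


def failed_filters(vcf_line, thresholds=THRESHOLDS):
    """Return the annotations that fail their threshold for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th tab-separated column
    failed = []
    for key, (kind, limit) in thresholds.items():
        if key not in info:
            continue  # a missing annotation is not treated as a failure here
        value = float(info[key])
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            failed.append(key)
    return failed


# Made-up record, for illustration only.
example = "chr1\t12345\t.\tA\tG\t250.5\t.\tDP=8;MQ=59.0;QD=1.5;FS=3.2"
print(failed_filters(example))  # ['DP', 'QD']
```

A call would be kept only if `failed_filters` returns an empty list. Whether fixed cutoffs like these are the right approach at all is part of what I am asking.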
I would really appreciate your insights and feedback on this. Thank you!
Regards,
BhariD