Given a VCF file, is there a feature in any SNP analysis package to filter out known SNPs (SNPs with an associated rs#)? If so, how is it done? Is it simply concordance between the variant call in the VCF file and the known SNP in the dbSNP 137 build by chromosome, position, reference allele, and alternate allele?
I want to filter out any known and common SNPs (SNPs with an associated rs#) present in dbSNP or Thousand Genomes from my VCF file to get to the novel SNPs. Is it simply based on concordance between the variant call in the VCF file and the known SNP in the dbSNP 137 build by chromosome, position, reference allele, and alternate allele? Or is something else involved?
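To make concrete what I mean by concordance, here is a minimal Python sketch of the comparison, not something taken from any existing package. The function names and the toy records (including the rs numbers) are made up for illustration. It assumes both files are plain-text VCF on the same genome build, that "chr1" and "1" name the same chromosome, and that a call counts as novel if any of its ALT alleles is missing from the known set.

```python
import io

def normalize_chrom(chrom):
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom

def variant_keys(vcf_line):
    """Yield one (chrom, pos, ref, alt) key per ALT allele of a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    for alt in alts.split(","):
        yield (normalize_chrom(chrom), int(pos), ref.upper(), alt.upper())

def load_known(handle):
    """Collect the keys of every record in a known-variant VCF (e.g. dbSNP)."""
    known = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        known.update(variant_keys(line))
    return known

def novel_lines(handle, known):
    """Return data lines with at least one ALT allele absent from `known`."""
    novel = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        if any(key not in known for key in variant_keys(line)):
            novel.append(line.rstrip("\n"))
    return novel

# Toy data only: these records and rs numbers are invented for the example.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
DBSNP_EXAMPLE = HEADER + "1\t1000\trs111\tA\tG,T\t.\t.\t.\n2\t2000\trs222\tC\tT\t.\t.\t.\n"
CALLS_EXAMPLE = HEADER + "chr1\t1000\t.\tA\tT\t50\tPASS\t.\nchr2\t2000\t.\tC\tG\t50\tPASS\t.\n"

known = load_known(io.StringIO(DBSNP_EXAMPLE))
novel = novel_lines(io.StringIO(CALLS_EXAMPLE), known)
for line in novel:
    print(line)
```

With real files the `io.StringIO` objects would be replaced by open file handles. The sketch does exact matching only, so an indel written with different padding or alignment in the two files, or a record present in one dbSNP build but not another, would not match.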
If I do it this way, I notice discrepancies between the tools that report dbSNP IDs and the SNPs listed in the dbSNP 137 build file itself (downloaded from the NCBI FTP site). For example, for a given variant I find no SNP in the dbSNP 137 file, yet I come across an rs# for the same variant in a different tool, such as PolyPhen-2 or SeattleSeq, while I am looking up something else like annotation or protein prediction.
How can I be sure that a variant I have identified doesn't have an rs# and is not a known SNP? Is there one definitive way to check, or do I have to go through every tool exhaustively to validate it?
Any help, comment, or feedback will be greatly appreciated!
Thanks,
BD