Dear all,
I have millions of SNPs detected by sequencing.
I filtered them using several criteria and now have a good set of SNPs. I need to submit all the filtered SNPs to the NCBI database (dbSNP), but they asked me to check some problematic regions, for example regions with a high density of SNPs (more than 10 SNPs within 50 bases).
I have ~2,000 of these high-density regions. I have checked some of them manually: some look OK, but others contain low-quality indels, repeated regions, homopolymers, etc.
Is it a good idea to remove these high-density SNP regions? It is impossible to check them manually one by one...
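To make the criterion concrete, flagging these regions automatically could look roughly like the sketch below. It is only an illustration: it assumes the SNPs are available as (chromosome, position) pairs, and the function name find_dense_regions and its parameters are hypothetical, not part of any dbSNP or NCBI tool.

```python
from collections import defaultdict


def find_dense_regions(snps, window=50, max_snps=10):
    """Flag regions where more than `max_snps` SNPs fall within `window` bases.

    `snps` is an iterable of (chrom, pos) pairs (assumed input format).
    Returns a list of (chrom, start, end) tuples; overlapping dense
    windows on the same chromosome are merged into one region.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    regions = []
    for chrom, positions in by_chrom.items():
        positions = sorted(set(positions))
        left = 0
        current = None  # [start, end] of the dense region being built
        for right, pos in enumerate(positions):
            # Shrink the window until it spans fewer than `window` bases.
            while pos - positions[left] >= window:
                left += 1
            if right - left + 1 > max_snps:
                start = positions[left]
                if current is not None and start <= current[1]:
                    current[1] = pos  # extend the open region
                else:
                    if current is not None:
                        regions.append((chrom, current[0], current[1]))
                    current = [start, pos]
        if current is not None:
            regions.append((chrom, current[0], current[1]))
    return regions


if __name__ == "__main__":
    # Hypothetical toy data: 11 adjacent SNPs plus one isolated SNP.
    example = [("chr1", p) for p in range(100, 111)] + [("chr1", 500)]
    for region in find_dense_regions(example):
        print(region)
```

The flagged regions could then be compared against repeat, homopolymer, or indel annotations instead of being inspected one at a time.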
Some authors say that "Clustering of SNPs can be a result of the mis-alignment of reads because of the presence of the indels (insertions or deletions) at the beginning or end of reads" (http://www.biomedcentral.com/1471-2164/15/307)
But maybe these regions are too important to be removed...
thanks a lot
Clarissa