Hi there,
We have designed a panel covering 190 exonic alterations, and now we want to test how well we can call these variants starting from a DNA sample. I am posting to ask for suggestions on how to do this and to find out whether my approach is flawed.
Basically, my plan is to purchase NA12878 DNA from the Coriell Institute, sequence it, and run a variant calling pipeline (without going into details: mapping against GRCh37 and calling variants with mpileup). Once I have the variants, I would compare them against the high-confidence call set, focusing on my 190 exonic alterations; if the alleles match at these positions, that would mean we are calling these variants correctly.
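To make the comparison concrete, here is a minimal sketch of the check I have in mind. It is only an illustration under several assumptions: the function names and the toy records are hypothetical, the panel is given as (chrom, pos, ref, alt) tuples, both VCFs are already normalized the same way, every panel site lies inside the high-confidence regions (so a site absent from the truth VCF is treated as reference), and genotypes (het vs. hom) are ignored.

```python
from collections import Counter

def load_variants(vcf_lines):
    """Collect every (chrom, pos, ref, alt) allele found in VCF text lines."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic records list ALTs comma-separated
            if alt != ".":
                variants.add((chrom, int(pos), ref, alt))
    return variants

def compare_at_panel(panel, truth, calls):
    """Classify each panel site as TP, FN, FP or TN against the truth set."""
    counts = Counter()
    for site in panel:
        in_truth, in_calls = site in truth, site in calls
        if in_truth and in_calls:
            counts["TP"] += 1   # variant present in the sample and called
        elif in_truth:
            counts["FN"] += 1   # variant present in the sample but missed
        elif in_calls:
            counts["FP"] += 1   # variant called but not in the truth set
        else:
            counts["TN"] += 1   # sample is reference here and nothing was called
    return counts

# Toy data (made up for illustration, not real NA12878 calls).
truth_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tC\tT,G\t.\tPASS\t.",
]
calls_vcf = [
    "1\t100\t.\tA\tG\t50\t.\t.",
    "1\t300\t.\tG\tA\t20\t.\t.",
]
panel = [("1", 100, "A", "G"), ("1", 200, "C", "T"),
         ("1", 300, "G", "A"), ("1", 400, "T", "C")]

result = compare_at_panel(panel, load_variants(truth_vcf), load_variants(calls_vcf))
print(dict(result))
```

One thing this makes visible: NA12878 will only carry some of the 190 alterations, so most panel sites would land in the TN bucket and say little about sensitivity for those specific variants.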
Is this a valid approach?
Previously, to evaluate just the bioinformatic pipeline and not the laboratory part, I ran the same analysis on the GIAB NA12878 dataset (HiSeq x300) and checked the output variants against this call set, available at the GIAB FTP site: ftp://ftp.ncbi.nlm.nih.gov/giab/ftp/...ransfer.vcf.gz. Before comparing, I normalized the variant representation of both call sets using *vcflib vcfallelicprimitives*.
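The reason for normalizing first is that the same change can be written as one multi-nucleotide record in one call set and as separate SNPs in the other, so a naive comparison would report mismatches. A simplified sketch of that idea, covering only the equal-length (MNP) case; this is not vcflib's implementation, and the function name is hypothetical:

```python
def split_mnp(chrom, pos, ref, alt):
    """Split an equal-length REF/ALT substitution into single-base SNPs.

    Records with REF and ALT of different lengths (indels, complex alleles)
    are returned unchanged, since decomposing them needs an alignment step.
    """
    if len(ref) != len(alt):
        return [(chrom, pos, ref, alt)]
    return [
        (chrom, pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a  # bases that do not differ carry no variant
    ]

mnp = split_mnp("1", 100, "AC", "GT")      # two adjacent SNPs
padded = split_mnp("1", 100, "ACG", "ATG")  # only the middle base differs
indel = split_mnp("1", 100, "AC", "A")      # left as-is
print(mnp, padded, indel)
```

After this kind of decomposition, both call sets describe the change with the same keys (chrom, pos, ref, alt), which is what the position-by-position comparison relies on.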
One last question: for my validation purposes, is there any difference between purchasing the NIST reference DNA and purchasing the Coriell Institute DNA sample? I think they are the same, but look at the difference in price...
Any suggestions are welcome.