Predicting true SNPs from .vcf file

  • Predicting true SNPs from .vcf file

    The .vcf file contains many different measurements and a couple of different quality scores. Does anyone have a good empirical idea of which values are the best guides for picking out real SNPs from noise and artifacts? I've run about a hundred Sanger sequencing reactions on predicted SNPs across a range of quality levels, but the picture still isn't very clear.

    For instance, these SNPs looked real in the Sanger data:

    GT:PL:GQ 0/1:20,11,149:13 gcacacacacacacacacacacacacacacacacacacacac gcacacacacacacacacacacacacacacacacacacac 211 DP=95 DP4=17,0,10,2 MQ=48 FQ=214

    GT:PL:GQ 1/1:30,3,0:39 GAA GA 147 DP=10 DP4=0,0,1,6 MQ=48 FQ=-37.3


    These were not confirmed by Sanger sequencing:

    GT:PL:GQ 0/1:24,0,114:27 A C 110 DP=130 DP4=14,35,2,22 MQ=45 FQ=113

    GT:PL:GQ 0/1:40,0,47:42 T G 98.3 DP=71 DP4=17,1,14,0 MQ=44 FQ=101

    Those don't look notably worse than the ones above them, so I'm not sure what I should have looked at to predict that the bottom two were false positives.

    (My a priori assumption was that these variants were all real, because I made a multi-sample .vcf with mpileup from this sample and many sibling animals, and these variants were common to all the animals.)
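
One way to compare the four calls above side by side is to break DP4 into its parts. In samtools mpileup output, DP4 lists the high-quality base counts as ref-forward, ref-reverse, alt-forward, alt-reverse. The sketch below is only an illustration: the function names `parse_info` and `dp4_summary` are hypothetical, the INFO fields are assumed to be space-separated as pasted in the post (a real .vcf uses semicolons, which are also accepted), and nothing here claims that these numbers predict Sanger confirmation.

```python
import re

def parse_info(info):
    """Parse KEY=VALUE pairs separated by spaces or semicolons into a dict."""
    fields = {}
    for token in re.split(r"[;\s]+", info.strip()):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields

def dp4_summary(info):
    """Summarise raw depth, high-quality depth, and alt-allele support from DP/DP4."""
    fields = parse_info(info)
    # DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    hq_depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt = alt_fwd + alt_rev
    return {
        "raw_depth": int(fields["DP"]),
        "hq_depth": hq_depth,
        "alt_fraction": alt / hq_depth if hq_depth else None,
        "alt_fwd": alt_fwd,
        "alt_rev": alt_rev,
    }

# The four calls quoted in the post, labelled by their Sanger result.
calls = {
    "confirmed 1": "DP=95 DP4=17,0,10,2 MQ=48 FQ=214",
    "confirmed 2": "DP=10 DP4=0,0,1,6 MQ=48 FQ=-37.3",
    "unconfirmed 1": "DP=130 DP4=14,35,2,22 MQ=45 FQ=113",
    "unconfirmed 2": "DP=71 DP4=17,1,14,0 MQ=44 FQ=101",
}

for label, info in calls.items():
    print(label, dp4_summary(info))
```

Printing the summaries makes it easier to see how much of the raw depth survives as high-quality bases and how the alt-supporting reads split between strands, which are the quantities usually inspected alongside QUAL and MQ.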

  • #2
    Maybe the depth of the bottom two is too high. Have you tried the -D option?
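
The idea behind a depth cap can be sketched as follows, assuming the suggestion refers to a maximum-read-depth cutoff. The threshold `MAX_DEPTH = 100` is a hypothetical value chosen for illustration, not a recommendation, and `flag_high_depth` is a made-up helper rather than any tool's implementation. The DP values are the ones quoted in the original post.

```python
# Hypothetical cutoff for illustration only; a real threshold should be
# chosen relative to the sample's typical coverage.
MAX_DEPTH = 100

def flag_high_depth(depths, max_depth):
    """Return the labels of calls whose raw depth (DP) exceeds max_depth."""
    return [label for label, dp in depths.items() if dp > max_depth]

# DP values from the four calls in the original post.
depths = {
    "confirmed 1": 95,
    "confirmed 2": 10,
    "unconfirmed 1": 130,
    "unconfirmed 2": 71,
}

flagged = flag_high_depth(depths, MAX_DEPTH)
print(flagged)
```

With this particular cutoff only the DP=130 call is flagged. The other unconfirmed call (DP=71) has lower depth than one of the confirmed calls (DP=95), so a depth cap alone would not separate these four examples.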

