Hi everybody!
I am trying to analyse the SNP calls produced by samtools and bcftools for my Illumina paired-end data.
My first problem was identifying the order of the genotypes in the list of Phred-scaled genotype likelihoods (PL). In an old post, someone wrote:
"(...) A T,G (...) DP=35;AF1=1;CI95=0.5,1;DP4=0,0,1,8;MQ=60 PL:GT:GQ 96,47,70,35,0,70:1/1:72
ref = A, alt = T,G right?
So, genotype may be AA,AT,AG,TT,TG,GG"
http://seqanswers.com/forums/showthread.php?t=9345&highlight=genotype+vcf
But when I look at my results in a genome browser, I find the order AA,AT,TT,AG,TG,GG for this example instead of AA,AT,AG,TT,TG,GG, unless I am very much mistaken...
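For reference, the VCF specification (v4.1 onward) defines the likelihood order as F(j/k) = k(k+1)/2 + j for allele indices j ≤ k, which gives AA,AT,TT,AG,TG,GG for REF = A, ALT = T,G. Here is a minimal Python sketch of that rule, assuming bcftools writes PL in the spec order. `genotype_order` and `pl_index` are illustrative names, not functions from samtools or bcftools.

```python
def genotype_order(alleles):
    """Genotypes in VCF PL order: genotype j/k (j <= k) sits at index k*(k+1)/2 + j."""
    order = []
    for k in range(len(alleles)):      # outer loop over the second allele index
        for j in range(k + 1):         # inner loop over the first allele index, j <= k
            order.append(alleles[j] + alleles[k])
    return order


def pl_index(j, k):
    """Position of genotype j/k in the PL list (allele order within the genotype does not matter)."""
    j, k = min(j, k), max(j, k)
    return k * (k + 1) // 2 + j


if __name__ == "__main__":
    print(genotype_order(["A", "T", "G"]))  # REF A, ALT T,G from the quoted post
    print(genotype_order(["T", "G", "C"]))  # REF T, ALT G,C from chrX 3506519
```

With REF = A and ALT = T,G this yields AA, AT, TT, AG, TG, GG, i.e. the order seen in the genome browser rather than the one in the old post.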
So, if the browser's ordering is right, I don't understand how samtools/bcftools assign the individual genotype when there are two alternate alleles: the GT and PL fields conflict when a genotype is assigned. Here is an example:
Code:
chrX 3506519 T G,C DP=114;VDB=0.0324;AF1=0.6132;G3=0.4281,3.194e-06,0.5719;HWE=0.00569;AC1=12;DP4=9,15,43,17;MQ=60;FQ=999;PV4=0.0058,1,1,0.39 1/1:173,18,0,173,18,173:6:17 0/1:81,81,81,9,9,0:3:3 0/0:0,24,182,24,182,182:9:19 0/1:0,3,26,3,26,26:1:4 0/1:124,124,124,18,18,0:6:3
So the possible genotypes, in order, are TT, TG, GG, TC, GC, CC.
sample 1 = 1/1:173,18,0,173,18,173:6:17 => this sample is GG.
sample 2 = 0/1:81,81,81,9,9,0:3:3 => this sample appears to be CC from the PL, and TG or TC from the GT... and it is GC in my viewer!!
sample 3 = 0/0:0,24,182,24,182,182:9:19 => this sample is TT.
sample 4 = 0/1:0,3,26,3,26,26:1:4 => this sample appears TT from the PL and in the genome browser, i.e. homozygous for the reference allele, but the GT indicates a heterozygous genotype!
sample 5 = 0/1:124,124,124,18,18,0:6:3 => this sample is CC.
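To make the comparison reproducible, here is a minimal Python sketch that decodes the five samples from the chrX 3506519 line. It assumes the first two sample subfields are GT and PL (the FORMAT column is not shown in the posted line) and that PL follows the VCF order F(j/k) = k(k+1)/2 + j. The helper names are hypothetical. It only automates the by-hand check above; it does not explain why bcftools emitted these GT values.

```python
ALLELES = ["T", "G", "C"]  # REF T, ALT G,C at chrX 3506519

SAMPLES = [
    "1/1:173,18,0,173,18,173:6:17",
    "0/1:81,81,81,9,9,0:3:3",
    "0/0:0,24,182,24,182,182:9:19",
    "0/1:0,3,26,3,26,26:1:4",
    "0/1:124,124,124,18,18,0:6:3",
]


def genotype_order(alleles):
    """Genotypes in VCF PL order: genotype j/k (j <= k) sits at index k*(k+1)/2 + j."""
    return [alleles[j] + alleles[k] for k in range(len(alleles)) for j in range(k + 1)]


def decode(sample, alleles):
    """Return (genotype from GT, most likely genotype from PL) for one sample field.

    Assumption: the sample field starts with GT then PL.
    """
    gt, pl = sample.split(":")[:2]
    a, b = sorted(int(x) for x in gt.split("/"))
    gt_call = alleles[a] + alleles[b]
    pls = [int(x) for x in pl.split(",")]
    best = pls.index(min(pls))  # PL is Phred-scaled and normalised, so the lowest value (0) is the most likely
    return gt_call, genotype_order(alleles)[best]


if __name__ == "__main__":
    for i, s in enumerate(SAMPLES, 1):
        gt_call, pl_call = decode(s, ALLELES)
        flag = "" if gt_call == pl_call else "  <-- GT and PL disagree"
        print(f"sample {i}: GT={gt_call} PL-best={pl_call}{flag}")
```

Running it flags samples 2, 4 and 5: for each of them the genotype with PL = 0 (CC, TT and CC respectively) differs from the TG implied by GT = 0/1.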
I don't understand... If somebody can help me, it would be fabulous!
Thanks,
Rachel