I have a VCF file that has dosage r^2 (DR2) in the INFO field. The problem is that while the r^2 value should be in the 0 to 1 range, my file contains both negative values and values above 1.
Is there a fundamental problem with my data? I might add that this is whole-exome data where the off-target regions have been imputed using Beagle.
I have pasted some data extracted with VCFtools, just to give an example. As you can see, there are huge numbers (positive and negative) and a lot of zeros.
Dosage r^2 example:
CHROM POS REF ALT DR2
1 10177 A AC 0
1 10235 T TA 0
1 10352 T TA 0
1 10616 CCGCCGTTGCAAAGGCGCGCCG C 0.01
1 10642 G A 0
1 11008 C G 0.01
1 11012 C G 0.01
1 11063 T G 0
More dosage r^2 examples:
One allele with dr2=0, one with a huge number:
1 66381 TATATA AATATA,T 0,5.10663e+28
One with high correlation, another with a huge (negative) number:
1 769829 C A,G 0.82,-7.97911e+26
There are also really tiny numbers, which are plausible but suspicious:
1 15274 A G,T 0,3.66383e-14
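For reference, here is a minimal Python sketch of how the out-of-range values could be flagged. It assumes the five-column, whitespace-separated layout shown above (CHROM POS REF ALT DR2) with comma-separated DR2 values for multi-allelic sites; the function name `flag_dr2` and the tolerance `tol` are just placeholders I made up, not part of VCFtools or Beagle.

```python
import math

def flag_dr2(line, tol=1e-6):
    """Return the DR2 values in one record that fall outside [0, 1].

    Assumes five whitespace-separated columns: CHROM POS REF ALT DR2,
    with one comma-separated DR2 value per ALT allele.
    """
    chrom, pos, ref, alt, dr2 = line.split()
    values = [float(v) for v in dr2.split(",")]
    # Non-finite values (nan, inf) are also reported as suspicious.
    return [v for v in values
            if not math.isfinite(v) or v < -tol or v > 1 + tol]

records = [
    "1 10616 CCGCCGTTGCAAAGGCGCGCCG C 0.01",
    "1 66381 TATATA AATATA,T 0,5.10663e+28",
    "1 769829 C A,G 0.82,-7.97911e+26",
    "1 15274 A G,T 0,3.66383e-14",
]

for rec in records:
    bad = flag_dr2(rec)
    if bad:
        # Print the position and the offending DR2 value(s).
        print(rec.split()[1], bad)
```

With this check, the tiny value at position 15274 is not flagged, since it lies inside the 0 to 1 range; only the two huge values are reported.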