On reflection, Beagle imputation works on haplotypes, so that may explain the discordance between the genotype probabilities and the genotype called at an individual SNP.
I think the simplest solution might be just to impute missing genotypes with the most common allele for each SNP in a given population...
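For what it's worth, a minimal Python sketch of that fallback follows. It assumes each SNP's calls have already been reduced to one allele per individual ("0" = REF, "1" = ALT), with None where an individual was not callable. The function name is hypothetical, and unlike Beagle it ignores linkage between neighbouring SNPs entirely.

```python
from collections import Counter
from typing import List, Optional


def impute_major_allele(calls: List[Optional[str]]) -> List[str]:
    """Fill missing haploid calls at one SNP with the most common observed allele.

    `calls` holds one allele per individual ("0" = REF, "1" = ALT) and None
    where that individual had no call at this site.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        # Nobody was called here, so there is nothing to impute from.
        raise ValueError("no called genotypes at this site")
    # most_common(1) breaks ties by first-seen order, so a 50/50 site is arbitrary.
    major = Counter(observed).most_common(1)[0][0]
    return [major if c is None else c for c in calls]
```

For example, `impute_major_allele(["0", None, "1", "1", None])` fills both gaps with "1". At sites near 50% frequency this will be wrong for roughly half of the imputed individuals, so it is probably worth recording which calls were imputed.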
Hi
Sorry for not being clear. All of the SNPs fed into Beagle were filtered to include only homozygous calls, so any heterozygosity in the output was introduced during imputation by Beagle. There's probably a perfectly reasonable statistical explanation for this. For instance, at the following SNP the genotype probabilities are 0.112, 0.444, 0.444, yet Beagle assigns the genotype 0|1 to some individuals and 1|1 to others even though the probabilities are identical.
Code:GroupUn1430 532 . G A . PASS AR2=0;DR2=0.03;AF=0.677 GT:DS:GP 1|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|0:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 1|0:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|0:1.333:0.112,0.444,0.444 1|0:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 1|0:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|1:1.333:0.112,0.444,0.444 0|1:1.333:0.112,0.444,0.444 1|1:2:0,0,1
I guess I'm looking for some reliable instruction on merging VCF files from haploid individuals, imputing missing genotypes where possible.
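To make the tie at that site explicit, here is a small Python sketch that reads one GT:DS:GP sample field and checks it. It assumes GP is ordered 0/0, 0/1, 1/1 and that DS is the expected ALT dose, which is consistent with the numbers above. The function names are hypothetical. Since GP gives 0/1 and 1/1 exactly equal probability, GP alone cannot choose between them, which fits the GT being taken from the haplotype model rather than from GP.

```python
from typing import List, Tuple


def parse_sample(field: str) -> Tuple[str, float, List[float]]:
    """Split a 'GT:DS:GP' sample field such as '0|1:1.333:0.112,0.444,0.444'."""
    gt, ds, gp = field.split(":")
    return gt, float(ds), [float(p) for p in gp.split(",")]


def expected_dosage(gp: List[float]) -> float:
    """Expected ALT allele count from diploid probabilities ordered (0/0, 0/1, 1/1)."""
    return gp[1] + 2 * gp[2]


def most_likely_genotypes(gp: List[float], tol: float = 1e-9) -> List[str]:
    """Every unphased genotype whose probability equals the maximum (ties included)."""
    labels = ["0/0", "0/1", "1/1"]
    best = max(gp)
    return [label for label, p in zip(labels, gp) if abs(p - best) <= tol]
```

On 0.112, 0.444, 0.444, `expected_dosage` gives about 1.332, matching DS=1.333 up to rounding, while `most_likely_genotypes` returns both "0/1" and "1/1". If I read the Beagle 4 INFO definitions correctly, DR2 is the estimated dosage r², so DR2=0.03 means Beagle itself rates this site as imputed with almost no accuracy.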
What exactly is your question?
Perhaps "Why are my genotypes showing to have some heterozygosity and why are replicates not identical?"
The answer would simply be artifacts in your data, either from lab protocols, contamination, or sequencing. 25% seems high, but if you are 110% sure that these genomes are haploid, then what is stopping you from throwing out the minor allele (i.e. the error) in the heterozygous individuals?
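A rough Python sketch of "throw out the minor allele" for a diploid-coded call is below. It assumes per-sample allele depths (the VCF AD field) are available, which depends on your caller. The function name and the 0.8 majority cut-off are hypothetical choices: homozygous calls keep their allele, heterozygous calls keep the better-supported allele only when it clearly dominates, and everything else becomes missing.

```python
from typing import Optional, Sequence


def to_haploid(gt: str, allele_depths: Optional[Sequence[int]] = None,
               min_fraction: float = 0.8) -> Optional[str]:
    """Collapse a diploid-coded call ('0/1', '1|1', ...) to a single allele.

    Returns the kept allele as a string, or None for a missing call.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # already missing
    if alleles[0] == alleles[1]:
        return alleles[0]  # homozygous: nothing to resolve
    if allele_depths is None:
        return None  # no read counts to break the tie
    a, b = int(alleles[0]), int(alleles[1])
    da, db = allele_depths[a], allele_depths[b]
    total = da + db
    if total == 0:
        return None
    winner, winner_depth = (a, da) if da >= db else (b, db)
    # Keep the majority allele only if it carries most of the reads.
    return str(winner) if winner_depth / total >= min_fraction else None
```

With AD of 18,2 a 0/1 call becomes "0"; with 5,5 it becomes missing. Setting balanced heterozygotes to missing rather than guessing seems the safer default when the cause (error versus collapsed duplicate) is unknown.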
Another possibility is that you have duplicated regions aligning to the same part of your reference. Reference genomes are notorious for missing copy number variation or closely related paralogues, and the only way (as far as I'm aware) to detect them is through various programs that apply depth/SNP-based algorithms to determine whether your genotypes are likely to contain errors or CNV. So, in a nutshell, you may have particular genes that have been duplicated along the same chromosome. Something to think about.
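As a crude first pass before reaching for a dedicated tool, here is a Python sketch of the depth/SNP idea. It assumes you have per-sample read depths and GT strings for each site. The function name, the 2x depth factor and the 10% heterozygosity ceiling are hypothetical thresholds, not values from any published method. In haploid data, a site with unusually high depth or with many "heterozygous" calls is a candidate collapsed duplicate.

```python
from statistics import median
from typing import Dict, List, Tuple

# (per-sample read depths, per-sample GT strings) for one site
Site = Tuple[List[int], List[str]]


def flag_suspect_sites(sites: Dict[str, Site], depth_factor: float = 2.0,
                       max_het_fraction: float = 0.1) -> List[str]:
    """Flag sites that look like collapsed duplicates in haploid data."""
    site_medians = {sid: median(depths) for sid, (depths, _) in sites.items()}
    typical = median(site_medians.values())  # genome-wide typical depth
    flagged = []
    for sid, (_, gts) in sites.items():
        called = [g for g in gts if "." not in g]
        hets = sum(1 for g in called if len(set(g.replace("|", "/").split("/"))) > 1)
        het_fraction = hets / len(called) if called else 0.0
        if site_medians[sid] > depth_factor * typical or het_fraction > max_het_fraction:
            flagged.append(sid)
    return flagged
```

The idea is that two diverged gene copies mapped to one locus roughly double the depth and produce apparent heterozygotes in every individual carrying both copies. Real CNV callers model this far more carefully.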
Haploid inspiration
Hi all
I'm looking for some inspiration/guidance on processing NGS data from haploid individuals. In a nutshell, we've sequenced at ~ 6X coverage a number of individuals (~30) from a single population. Following mapping, processing, and SNP calling, each individual has ~ 600K SNPs.
At this depth of coverage, ~70% of the genome is callable for each individual, and it stands to reason that individuals will not have identical genome coverage. So, to generate a single VCF for the population without sacrificing an enormous number of SNPs, it would make sense to impute missing genotypes. To this end I have experimented with Beagle (v4) on a single chromosome, and it appeared to do the job, although on closer examination the results showed ~25% heterozygosity in each haploid individual. In addition, all samples were on average 85% identical by state, including 2 individuals that had been sequenced twice and should provide a reliable control for the methodology.
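For anyone wanting to reproduce these two checks, here is a minimal Python sketch. It assumes genotypes are plain GT strings with None for missing. Identity is measured as genotype concordance at sites called in both samples, a simple stand-in for IBS that may differ from whatever tool produced the 85% figure. Function names are hypothetical. For a truly haploid sample the heterozygosity rate should be near zero, and two runs of the same individual should be close to 1.0.

```python
from typing import List, Optional, Tuple


def _alleles(gt: str) -> Tuple[str, ...]:
    """Unphased allele tuple: '1|0' and '0/1' -> ('0', '1'); haploid '1' -> ('1',)."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def het_rate(gts: List[Optional[str]]) -> float:
    """Fraction of called genotypes (None = missing) carrying two different alleles."""
    called = [g for g in gts if g is not None]
    if not called:
        return 0.0
    return sum(len(set(_alleles(g))) > 1 for g in called) / len(called)


def concordance(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Fraction of sites called in both samples whose unphased genotypes match."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan")  # no co-called sites to compare
    return sum(_alleles(x) == _alleles(y) for x, y in shared) / len(shared)
```

Computing `concordance` on the replicate pairs both before and after imputation would show whether the disagreement between replicates is already present in the raw calls or is introduced by Beagle.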
Is anyone with experience in processing NGS data for haploids able to offer any insights/suggestions?
D