  • gibberwocky
    replied
    On reflection, Beagle imputes on haplotypes, which could explain the discordance between the genotype probabilities and the genotype called at an individual SNP.

    I think the simplest solution might be just to impute missing genotypes with the most common allele for each SNP in a given population...
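
    A minimal sketch of that idea, assuming each SNP is held as a list of haploid allele calls (one per individual) with "." marking a missing call. The function name and the input layout are illustrative, not taken from any particular tool.

```python
from collections import Counter

def impute_major_allele(calls, missing="."):
    """Fill missing haploid calls at one SNP with the most common observed allele."""
    observed = [a for a in calls if a != missing]
    if not observed:
        # No call in any individual, so there is nothing to impute from.
        return list(calls)
    # On a tie, Counter.most_common returns the allele seen first.
    major = Counter(observed).most_common(1)[0][0]
    return [major if a == missing else a for a in calls]

# One SNP across six individuals, two of them uncalled.
site = ["1", "1", ".", "0", "1", "."]
print(impute_major_allele(site))
```

    Note that filling every gap with the major allele pushes allele frequency estimates towards that allele, so it is worth recording which calls were imputed.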

  • gibberwocky
    replied
    Hi

    Sorry for not being clear. All of the SNPs fed into Beagle were filtered so that they only include homozygous calls, so the resulting heterozygosity has been introduced by Beagle during imputation. There's probably a perfectly reasonable statistical explanation for this. For instance, the genotype probabilities at the following SNP are 0.112,0.444,0.444 for most individuals, yet Beagle assigns the genotype 0|1 (or 1|0) to some and 1|1 to others even though the probabilities are the same.

    Code:
    GroupUn1430	532	.	G	A	.	PASS	AR2=0;DR2=0.03;AF=0.677	GT:DS:GP	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:2:0,0,1
    My thread is more of a general request for a discussion on processing VCF files from haploid individuals for downstream population genetics analyses. One thought was to use just the portion of the genome that is callable in all individuals; however, this reduces the SNPs across all samples from ~4 M to < 200 K. And as demonstrated above, imputation may not be a reliable solution.

    I guess I'm looking for some reliable instruction on merging VCF files from haploid individuals, imputing missing genotypes where possible.
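
    A rough sketch of how a record like the one above could be screened before trusting its genotypes. It reads the DR2 value from the INFO column (Beagle's dosage R-squared, as far as I understand its output) and flags samples whose top genotype probability is tied, in which case the called GT cannot be recovered from GP alone. The 0.3 cut-off and the function names are assumptions for illustration, and the record is shortened to three of the samples shown above.

```python
def parse_info(info):
    """Turn 'AR2=0;DR2=0.03;AF=0.677' into a dict of strings."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
    return out

def summarise_record(line, min_dr2=0.3):
    """Report whether a site passes a DR2 cut-off and which samples have tied GPs.

    min_dr2 is an arbitrary illustrative threshold, not a recommended value.
    """
    fields = line.rstrip("\n").split("\t")
    dr2 = float(parse_info(fields[7])["DR2"])
    keys = fields[8].split(":")
    samples = []
    for entry in fields[9:]:
        values = dict(zip(keys, entry.split(":")))
        gp = [float(x) for x in values["GP"].split(",")]
        # A tie means two genotypes share the highest probability.
        samples.append({"GT": values["GT"], "tied": gp.count(max(gp)) > 1})
    return {"DR2": dr2, "keep": dr2 >= min_dr2, "samples": samples}

record = "\t".join([
    "GroupUn1430", "532", ".", "G", "A", ".", "PASS",
    "AR2=0;DR2=0.03;AF=0.677", "GT:DS:GP",
    "1|1:1.333:0.112,0.444,0.444",
    "0|1:1.333:0.112,0.444,0.444",
    "1|1:2:0,0,1",
])
print(summarise_record(record))
```

    With DR2=0.03 this site would be dropped under almost any threshold, which fits the picture of near-identical, tied probabilities in every imputed sample.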

  • JackieBadger
    replied
    What exactly is your question?


    Perhaps "Why do my genotypes show some heterozygosity, and why are replicates not identical?"

    The answer would simply be artifacts in your data, either from lab protocols, contamination, or sequencing. 25% seems high, but if you are 110% sure that these genomes are haploid, then what is stopping you from throwing out the minor allele (i.e. error) in the heterozygous individuals?
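
    One way this could look in practice, as a sketch only: collapse each diploid-style GT to a single haploid allele, keeping the better-supported allele at heterozygous calls when per-allele read depths are available and setting the call to missing otherwise. The assumption here is that the caller wrote an AD-style list of depths (REF first, then ALT); imputed genotypes carry no read support, so they would fall through to missing.

```python
import re

def haploidise(gt, ad=None, missing="."):
    """Collapse a diploid-style GT string to one haploid allele.

    gt: e.g. '0/0', '1|1', '0|1', './.'
    ad: optional per-allele read depths (REF, ALT, ...), assumed to be
        available from the variant caller; used only at heterozygous calls.
    """
    alleles = re.split(r"[/|]", gt)
    if missing in alleles:
        return missing
    if len(set(alleles)) == 1:
        return alleles[0]
    if ad is None:
        # Heterozygous with no read support to choose from.
        return missing
    depths = {a: ad[int(a)] for a in set(alleles)}
    best = max(depths.values())
    winners = [a for a, d in depths.items() if d == best]
    # Equal support for both alleles is treated as uninformative.
    return winners[0] if len(winners) == 1 else missing

print(haploidise("1|1"))              # homozygous call
print(haploidise("0/1", ad=[1, 5]))   # heterozygous, ALT better supported
print(haploidise("0|1"))              # heterozygous, no depths
```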

    Another possibility is that you have duplicated regions aligning to the same part of your reference. Reference genomes are notorious for missing copy number variation or closely related paralogues, and the only way (as far as I'm aware) to detect them is through various programs that apply depth/SNP-based algorithms to determine whether your genotypes are likely to contain errors or CNV. So in a nutshell, you may have particular genes that have been duplicated along the same chromosome. Something to think about.

  • gibberwocky
    started a topic Haploid inspiration

    Haploid inspiration

    Hi all

    I'm looking for some inspiration/guidance on processing NGS data from haploid individuals. In a nutshell, we've sequenced a number of individuals (~30) from a single population at ~6X coverage. Following mapping, processing, and SNP calling, each individual has ~600K SNPs.

    At this depth of coverage, ~70% of the genome is callable for each individual, and it stands to reason that individuals will not have identical genome coverage. So in order to generate a single VCF for the population without sacrificing an enormous number of SNPs, it would make sense to impute missing genotypes. To this end I have experimented with Beagle (v4) on a single chromosome, and it appeared to do the job, although upon closer examination the results indicated ~25% heterozygosity in each haploid individual. In addition, all samples were on average 85% identical by state, including 2 individuals which had been sequenced twice and should present a reliable control for the methodology.
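
    For reference, a sketch of how the two summary figures above can be computed from per-sample GT strings. It assumes biallelic sites coded as diploid genotypes (as in Beagle output) and skips sites with a missing call; the function names are illustrative and this is not necessarily how the numbers quoted above were produced.

```python
def alt_dosage(gt):
    """Number of non-reference alleles in a GT string, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)

def het_fraction(gts):
    """Fraction of non-missing calls whose two alleles differ."""
    called = [g.replace("|", "/").split("/") for g in gts if "." not in g]
    if not called:
        return float("nan")
    return sum(len(set(a)) > 1 for a in called) / len(called)

def ibs(gts_a, gts_b):
    """Mean proportion of alleles shared by state over sites called in both samples."""
    shared = []
    for a, b in zip(gts_a, gts_b):
        da, db = alt_dosage(a), alt_dosage(b)
        if da is None or db is None:
            continue
        # Diploid coding: a dosage difference of 0, 1, 2 shares 2, 1, 0 alleles.
        shared.append(1 - abs(da - db) / 2)
    return sum(shared) / len(shared) if shared else float("nan")

# Two made-up replicate runs of the same individual over four sites.
rep1 = ["1|1", "0|1", "0|0", "./."]
rep2 = ["1|1", "1|1", "0|0", "0|0"]
print(het_fraction(rep1), ibs(rep1, rep2))
```

    Replicates of the same haploid individual would be expected to score close to 1.0 here, so a value near the population-wide average points at the genotypes rather than the biology.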

    Is anyone with experience in processing NGS data for haploids able to offer any insights/suggestions?

    D
