  • Haploid inspiration

    Hi all

    I'm looking for some inspiration/guidance on processing NGS data from haploid individuals. In a nutshell, we've sequenced at ~ 6X coverage a number of individuals (~30) from a single population. Following mapping, processing, and SNP calling, each individual has ~ 600K SNPs.

    At this depth of coverage, ~ 70% of the genome is callable for each individual, and it stands to reason that the callable regions will not be identical across individuals. So in order to generate a single VCF for the population without sacrificing an enormous number of SNPs, it would make sense to impute missing genotypes. To this end I have experimented with Beagle (v4) on a single chromosome, and it appeared to do the job, although upon closer examination the results indicated ~ 25% heterozygosity in each haploid individual. In addition, all samples were on average 85% identical by state, including 2 individuals which had been sequenced twice and should provide a reliable control for the methodology.
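
    The two sanity checks described above (heterozygous calls in haploids, and identity by state between samples) can be sketched roughly as follows. This is a minimal illustration and not the procedure actually used in this thread: the function names, the toy sample names and the toy genotypes are all made up, and genotypes are assumed to have already been pulled out of the VCF as plain strings.

```python
from itertools import combinations


def het_fraction(genotypes):
    """Fraction of called diploid-style genotypes (e.g. '0|1') that are heterozygous.

    Genotypes containing '.' are treated as missing and skipped.
    """
    called = [g.replace("/", "|").split("|") for g in genotypes if "." not in g]
    if not called:
        return 0.0
    return sum(a != b for a, b in called) / len(called)


def ibs(hap_a, hap_b):
    """Proportion of sites carrying the same haploid allele in both samples.

    Sites missing ('.') in either sample are ignored.
    """
    shared = [(a, b) for a, b in zip(hap_a, hap_b) if a != "." and b != "."]
    if not shared:
        return float("nan")
    return sum(a == b for a, b in shared) / len(shared)


# Hypothetical toy data: one allele per site per haploid sample.
calls = {
    "ind1":     ["0", "1", "1", ".", "0"],
    "ind1_rep": ["0", "1", "1", "1", "0"],  # re-sequenced replicate of ind1
    "ind2":     ["1", "1", "0", "0", "."],
}

for a, b in combinations(calls, 2):
    print(a, b, round(ibs(calls[a], calls[b]), 3))

# Heterozygosity among imputed diploid-style calls for one sample.
print(round(het_fraction(["0|1", "1|1", "1|0", "./."]), 3))
```

    For true replicates the IBS value would be expected to sit close to 1, so a replicate pair at the population average is a sign that the calls themselves, rather than the biology, are driving the number.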

    Is anyone with experience in processing NGS data for haploids able to offer any insights/suggestions?

    D

  • #2
    What exactly is your question?


    Perhaps "Why are my genotypes showing to have some heterozygosity and why are replicates not identical?"

    The answer would simply be artifacts in your data, either from lab protocols, contamination, or sequencing. 25% seems high, but if you are 110% sure that these genomes are haploid, then what is stopping you from throwing out the minor allele (i.e. the error) in the heterozygous individuals?
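
    One way to "throw out the minor allele" is to collapse each heterozygous call to the allele with the most read support. The sketch below assumes per-allele read depths are available for each sample (for example from an AD field, which may or may not be present in a given VCF); the function name and the `min_ratio` threshold are hypothetical choices for illustration only.

```python
def collapse_to_haploid(gt, allele_depths, min_ratio=0.75):
    """Collapse a diploid-style genotype to a single haploid allele index.

    gt            -- genotype string such as '0/1' or '1|1'
    allele_depths -- read depth per allele, in allele order (REF first)
    min_ratio     -- assumed cut-off: the best-supported allele must carry at
                     least this share of reads, otherwise the call is set to
                     missing ('.')
    """
    alleles = gt.replace("/", "|").split("|")
    if "." in alleles:
        return "."
    if len(set(alleles)) == 1:
        # Already homozygous: keep that allele.
        return alleles[0]
    total = sum(allele_depths)
    if total == 0:
        return "."
    best = max(range(len(allele_depths)), key=lambda i: allele_depths[i])
    if allele_depths[best] / total < min_ratio:
        # Reads are split too evenly to pick a winner.
        return "."
    return str(best)


print(collapse_to_haploid("0/1", [1, 5]))  # ALT clearly dominates
print(collapse_to_haploid("0/1", [3, 3]))  # evenly split reads
```

    At ~ 6X coverage many sites will have too few reads for this ratio to mean much, so evenly split sites are set to missing here rather than forced to one allele.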

    Another possibility is that you have duplicated regions aligning to the same part of your reference. Reference genomes are notorious for missing copy number variation or closely related paralogues, and the only way (as far as I'm aware) to detect them is through various programs that apply depth/SNP-based algorithms to determine whether your genotypes are likely to contain errors or CNV. So in a nutshell, you may have particular genes that have been duplicated along the same chromosome. Something to think about.


    • #3
      Hi

      Sorry for not being clear. All of the SNPs fed into Beagle have been filtered to ensure that they only include homozygous SNPs. So the resulting heterozygosity has been introduced during imputation by Beagle, and there's probably a perfectly reasonable statistical explanation for this - for instance the genotype probabilities in the following SNP are 0.112,0.444,0.444 and Beagle assigns the genotype as 0|1 to some individuals but 1|1 to others even though the probabilities are the same.

      Code:
      GroupUn1430	532	.	G	A	.	PASS	AR2=0;DR2=0.03;AF=0.677	GT:DS:GP	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:2:0,0,1
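
      As a quick consistency check on the record above, the DS value can be recomputed from the GP triple as P(0/1) + 2 x P(1/1). The sketch below does that for one sample field and also flags the tie between the two most probable genotypes. The helper names are made up for illustration, and the FORMAT string is taken from the record itself. Separately, the INFO column reports AR2=0 and DR2=0.03, which, as far as I understand the Beagle output, are estimated r-squared measures of imputation accuracy, so values this low would suggest the site is essentially uninformative.

```python
def parse_sample(field, fmt="GT:DS:GP"):
    """Split one VCF sample field into a dict keyed by the FORMAT tags."""
    rec = dict(zip(fmt.split(":"), field.split(":")))
    rec["DS"] = float(rec["DS"])
    rec["GP"] = [float(p) for p in rec["GP"].split(",")]
    return rec


def expected_dosage(gp):
    """Expected ALT allele count from genotype probabilities (P00, P01, P11)."""
    return gp[1] + 2 * gp[2]


def is_ambiguous(gp, tol=1e-6):
    """True when the two most probable genotypes are tied within tol."""
    top = sorted(gp, reverse=True)
    return abs(top[0] - top[1]) <= tol


sample = parse_sample("0|1:1.333:0.112,0.444,0.444")
print(round(expected_dosage(sample["GP"]), 3), sample["DS"])
print(is_ambiguous(sample["GP"]))
```

      With the het and hom-alt probabilities tied, any hard call at this site is effectively arbitrary, which fits the mix of 0|1, 1|0 and 1|1 genotypes assigned to samples with identical GP values.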
      My thread is more of a general request for a discussion on processing VCF files from haploid individuals for downstream population genetics analyses. One thought was to use just the portion of the genome that is callable in all individuals; however, this reduces the SNPs across all samples from ~ 4 M to < 200 K. And as demonstrated above, imputation may not be a reliable solution.

      I guess I'm looking for some reliable instruction on merging VCF files from haploid individuals, imputing missing genotypes where possible.


      • #4
        On reflection, Beagle imputation works on haplotypes, which could explain the discordance between the genotype probabilities and the genotype called at an individual SNP.

        I think the simplest solution might be just to impute missing genotypes with the most common allele for each SNP in a given population...
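
        A minimal sketch of that idea, assuming the haploid calls for one SNP have already been collected into a list with '.' marking missing samples (the function name and the toy data are hypothetical):

```python
from collections import Counter


def impute_major_allele(site_calls, missing="."):
    """Fill missing haploid calls at one SNP with the most common observed allele.

    If no sample has a call at the site, the list is returned unchanged.
    Ties are resolved in favour of the allele seen first (Counter ordering).
    """
    observed = [a for a in site_calls if a != missing]
    if not observed:
        return list(site_calls)
    major, _count = Counter(observed).most_common(1)[0]
    return [major if a == missing else a for a in site_calls]


# Toy site: five haploid samples, two without a call.
print(impute_major_allele(["1", ".", "1", "0", "."]))
```

        Note that filling gaps this way pushes allele frequencies towards the major allele, so diversity-based statistics computed afterwards are likely to be biased downwards, more so at sites with many missing samples.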
