  • gibberwocky
    Junior Member
    • Mar 2013
    • 8

    Haploid inspiration

    Hi all

    I'm looking for some inspiration/guidance on processing NGS data from haploid individuals. In a nutshell, we've sequenced ~30 individuals from a single population at ~6X coverage. Following mapping, processing, and SNP calling, each individual has ~600K SNPs.

    At this depth of coverage, ~70% of the genome is callable for each individual, and it stands to reason that the callable regions will not be identical across individuals. So in order to generate a single VCF for the population without sacrificing an enormous number of SNPs, it would make sense to impute missing genotypes. To this end I have experimented with Beagle (v4) on a single chromosome, and it appeared to do the job, although on closer examination the results indicated ~25% heterozygosity in each haploid individual. In addition, all samples were on average 85% identical by state, including 2 individuals which had been sequenced twice and should provide a reliable control for the methodology.
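For anyone wanting to reproduce the identity-by-state check on the replicates, here is a minimal sketch. It assumes haploid genotypes have already been pulled out of the VCF as one allele per site, with "." for uncallable sites; the sample names and genotypes are made-up toy data, and `ibs` is just an illustrative helper, not part of any tool mentioned in this thread.

```python
from itertools import combinations

def ibs(a, b, missing="."):
    """Fraction of sites where two haploid samples carry the same allele,
    counting only sites that are called in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not shared:
        return float("nan")
    return sum(x == y for x, y in shared) / len(shared)

# Hypothetical toy genotypes: one allele per site, "." = not callable.
samples = {
    "ind1_rep1": ["0", "1", "1", ".", "0", "1"],
    "ind1_rep2": ["0", "1", "1", "0", "0", "1"],
    "ind2":      ["1", "1", "0", "0", ".", "1"],
}

for s1, s2 in combinations(samples, 2):
    print(s1, s2, round(ibs(samples[s1], samples[s2]), 3))
```

Running this on the pre-imputation calls as well as the imputed ones would show whether the replicate discordance is already present in the raw data or only appears after imputation.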

    Is anyone with experience in processing NGS data for haploids able to offer any insights/suggestions?

    D
  • JackieBadger
    Senior Member
    • Mar 2009
    • 385

    #2
    What exactly is your question?


    Perhaps "Why are my genotypes showing some heterozygosity, and why are replicates not identical?"

    The answer would simply be artifacts in your data, either from lab protocols, contamination, or sequencing. 25% seems high, but if you are 110% sure that these genomes are haploid, then what is stopping you from throwing out the minor allele (i.e. the error) in the heterozygous individuals?
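One way to "throw out the minor allele" could look like the sketch below. It assumes per-sample allele depths (ref, alt) are available, for example from an AD field, and it adds a `min_fraction` cutoff that is my own assumption rather than anything suggested above: heterozygous calls where neither allele clearly dominates are set to missing instead of being forced.

```python
def collapse_to_haploid(gt, ad, min_fraction=0.8):
    """Collapse a diploid-style call to a haploid allele.

    gt: genotype string such as '0/1' or '1|1'
    ad: (ref_depth, alt_depth) read counts for the sample (assumed available)
    Returns '0', '1', or '.' when the call is missing or ambiguous.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "."
    if len(set(alleles)) == 1:
        return alleles[0]            # already homozygous
    total = sum(ad)
    if total == 0:
        return "."
    major = 0 if ad[0] >= ad[1] else 1
    if ad[major] / total < min_fraction:
        return "."                   # no clear major allele, leave as missing
    return str(major)

print(collapse_to_haploid("0/1", (1, 5)))
print(collapse_to_haploid("0/1", (3, 3)))
```

With the default cutoff, a 1:5 split collapses to the alternate allele while a 3:3 split is left missing.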

    Another possibility is that you have duplicated regions aligning to the same part of your reference. Reference genomes are notorious for missing copy number variation or closely related paralogues, and the only way (as far as I'm aware) to detect them is through various programs that apply depth/SNP-based algorithms to determine whether it's likely your genotypes contain errors or CNV. So in a nutshell, you may have particular genes that have been duplicated along the same chromosome. Something to think about.

    • gibberwocky
      Junior Member
      • Mar 2013
      • 8

      #3
      Hi

      Sorry for not being clear. All of the SNPs fed into Beagle were filtered so that they include only homozygous calls, so the resulting heterozygosity was introduced by Beagle during imputation. There's probably a perfectly reasonable statistical explanation for this. For instance, the genotype probabilities at the following SNP are 0.112,0.444,0.444, and Beagle assigns the genotype 0|1 to some individuals but 1|1 to others even though the probabilities are the same.

      Code:
      GroupUn1430	532	.	G	A	.	PASS	AR2=0;DR2=0.03;AF=0.677	GT:DS:GP	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	1|0:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:1.333:0.112,0.444,0.444	0|1:1.333:0.112,0.444,0.444	1|1:2:0,0,1
      My thread is more of a general request for a discussion on processing VCF files from haploid individuals, for downstream population genetics analyses. One thought was to use just the callable portion of the genome in all individuals, however this reduces the SNPs across all samples from ~ 4 M to < 200 K. And as demonstrated above, imputation may not be a reliable solution.
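The trade-off between keeping only sites callable in every individual and tolerating some missingness can be explored with a call-rate filter. A minimal sketch, assuming the genotypes are held as per-site lists of haploid alleles with "." for missing; the data and the `min_call_rate` parameter are hypothetical:

```python
def sites_passing(genotypes, min_call_rate=1.0, missing="."):
    """Return the indices of sites whose call rate is >= min_call_rate.

    genotypes: list of sites, each a list with one haploid allele per sample.
    """
    keep = []
    for i, site in enumerate(genotypes):
        called = sum(g != missing for g in site)
        if called / len(site) >= min_call_rate:
            keep.append(i)
    return keep

# Hypothetical toy data: 4 sites x 4 samples.
sites = [
    ["0", "1", "1", "0"],
    ["1", ".", "1", "1"],
    [".", ".", "0", "1"],
    ["1", "1", "1", "."],
]
print(sites_passing(sites))         # sites called in every sample
print(sites_passing(sites, 0.75))   # allow one missing sample in four
```

Relaxing the call rate a little may recover many sites while leaving only a small fraction of genotypes to impute, which is a middle ground between the two extremes described above.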

      I guess I'm looking for some reliable instruction on merging VCF files from haploid individuals, imputing missing genotypes where possible.

      • gibberwocky
        Junior Member
        • Mar 2013
        • 8

        #4
        On reflection, Beagle's imputation works on haplotypes, which could explain the discordance between the genotype probabilities and the genotype called at an individual SNP.
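A quick sanity check on the record from #3, as a sketch. It parses one GT:DS:GP sample field, recomputes the dosage as P(0/1) + 2 * P(1/1), and compares the GP values with Hardy-Weinberg proportions. The helper names are made up for illustration, and the Hardy-Weinberg comparison is my own reading of the numbers, not something Beagle reports.

```python
def parse_sample(field):
    """Split a GT:DS:GP sample field into genotype, dosage, probabilities."""
    gt, ds, gp = field.split(":")
    return gt, float(ds), [float(p) for p in gp.split(",")]

def expected_dosage(gp):
    """Expected ALT allele count from genotype probabilities."""
    return gp[1] + 2 * gp[2]

def hwe(q):
    """Hardy-Weinberg genotype proportions for ALT allele frequency q."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

gt, ds, gp = parse_sample("0|1:1.333:0.112,0.444,0.444")
print(gt, ds, round(expected_dosage(gp), 3))
print([round(p, 3) for p in hwe(2 / 3)])
```

The recomputed dosage agrees with the reported DS up to rounding, and 0/1 and 1/1 are tied at 0.444, so the GP values alone cannot decide the hard call. The probabilities are also very close to Hardy-Weinberg proportions for an ALT frequency of about 2/3, which, together with AR2=0 and DR2=0.03 in the INFO column, suggests the imputation has essentially no information at this site.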

        I think the simplest solution might be just to impute missing genotypes with the most common allele for each SNP in a given population...
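A minimal sketch of that idea, assuming haploid genotypes per site with "." for missing. The tie-breaking rule (lowest allele label wins) is an arbitrary choice of mine:

```python
from collections import Counter

def impute_major_allele(site, missing="."):
    """Fill missing haploid genotypes at one site with the most common
    called allele. Sites with no calls at all are returned unchanged."""
    called = [g for g in site if g != missing]
    if not called:
        return list(site)
    counts = Counter(called)
    # Iterate over sorted labels so that ties resolve to the lowest label.
    major = max(sorted(counts), key=counts.get)
    return [major if g == missing else g for g in site]

print(impute_major_allele(["1", ".", "1", "0", "."]))
```

Bear in mind that filling with the major allele pulls every imputed genotype towards the common allele, so it will tend to understate diversity and rare-allele frequencies in downstream population genetics statistics.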
