Greetings,
I am looking for the simplest possible way to call variants computationally, just using position coverage and the ratio of reference/alternative allele, for divergent haploid genomes. I.e. no need for prior probability or a prediction based on an allele frequency spectrum. There is good coverage (average depth of coverage is 316X), and the sequences are 100bp reads of high-quality.
I am mapping with BWA and using bcftools to call variants using Bayesian inference in "mpileup" files. It seems bcftools is just too fancy for this to work among these divergent haploid genomes, and I am getting only half the variants I should be, even setting the quality scores as low as 5 for filtering.
Looking at my BAMs, the variants are there but are getting removed somehow, probably because they don't fit with some expected allele-frequency spectrum due to the level of divergence (~10-20 variants/kb)
Is there a simple way, using bcftools, a script that goes into the pileup, or another program to get better calls? FreeBayes is recommended here for haploid genomes, and I'll look into it, but I think it may end-up with the same problem.
Thanks!
I am looking for the simplest possible way to call variants computationally, just using position coverage and the ratio of reference/alternative allele, for divergent haploid genomes. I.e. no need for prior probability or a prediction based on an allele frequency spectrum. There is good coverage (average depth of coverage is 316X), and the sequences are 100bp reads of high-quality.
I am mapping with BWA and using bcftools to call variants using Bayesian inference in "mpileup" files. It seems bcftools is just too fancy for this to work among these divergent haploid genomes, and I am getting only half the variants I should be, even setting the quality scores as low as 5 for filtering.
Looking at my BAMs, the variants are there but are getting removed somehow, probably because they don't fit with some expected allele-frequency spectrum due to the level of divergence (~10-20 variants/kb)
Is there a simple way, using bcftools, a script that goes into the pileup, or another program to get better calls? FreeBayes is recommended here for haploid genomes, and I'll look into it, but I think it may end-up with the same problem.
Thanks!
Comment