I also need to call SNPs on haploid genomes. It looks like methods such as samtools mpileup / bcftools won't work, because the Bayesian SNP-calling formula uses the allele frequency spectrum (AFS) as the prior, and the AFS is estimated assuming diploidy.
Can anyone suggest a workaround?
-
Thank you, garwuf, for the explanation. Do you mean that samtools 0.1.15 is better than GATK for haploid SNP calling?
Both the samtools and GATK VCF results contain heterozygous SNPs for a haploid genome. Are those SNPs reliable or not? FreeBayes's results, by contrast, contain only homozygous SNPs.
I wonder why heterozygous SNPs occur in a haploid genome.
-
Originally posted by wanguan2000:
What about samtools vs. GATK SNP-calling efficiency for haploid genomes?
Last edited by garwuf; 11-29-2011, 07:28 AM.
-
I gave Freebayes quite an extensive try recently, and wouldn't recommend it in its current state. I tried it on several bacterial datasets (4 - 6 Mb in size) that had previously been evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.
Then, after reading the above post by d17, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads for a 128 kb template with 10 variant sites of differing complexity. One set provided 50x coverage, the other 400x, and the alignment was performed with bwa. On these alignments, Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the efficiency was quite low: for the 50x dataset it never reported more than 3 variants out of 10, and for the 400x dataset it reported 4-5 depending on settings. For comparison, Samtools 1.18 detected all 10 variants even on the 50x dataset.
To my mind, Freebayes may have some problem with handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it's still being developed. Maybe these bugs will eventually be fixed.
-
I am confused by FreeBayes's command-line options:
-H --diploid-reference
If using the reference sequence as a sample (default),
treat it as diploid. default: false (reference is haploid)
###########
My understanding is this:
human (diploid) -H false
bacteria (haploid) -H true
but I found a lot of heterozygous SNPs (50%) for bacteria in my resulting VCF file.
My other understanding is:
human (23 chromosomes) -H false
human (23*2 chromosomes) -H true
Which is correct?
######################
-p --ploidy N Sets the default ploidy for the analysis to N. default: 2
###
For haploid genomes, do I just set -p 1, with no need to set -H?
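One way to put a number on the "50% heterozygous" observation is to count genotype calls directly from the VCF. The following is only a minimal sketch, assuming a single-sample VCF with a standard `GT` entry in the FORMAT column; the function name `heterozygous_fraction` is made up for illustration and is not part of FreeBayes or any other tool.

```python
import re


def heterozygous_fraction(vcf_lines):
    """Fraction of called genotypes (first sample only) that are heterozygous.

    Assumes tab-separated VCF records with FORMAT in column 9 and the
    first sample in column 10. Haploid genotypes such as "1" have a single
    allele and therefore never count as heterozygous.
    """
    called = 0
    het = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no sample column present
        keys = cols[8].split(":")
        if "GT" not in keys:
            continue
        gt = cols[9].split(":")[keys.index("GT")]
        alleles = re.split(r"[/|]", gt)
        if "." in alleles:
            continue  # uncalled genotype
        called += 1
        if len(set(alleles)) > 1:
            het += 1
    return het / called if called else 0.0
```

It could be used as `heterozygous_fraction(open("calls.vcf"))`, where `calls.vcf` is a placeholder file name. A value near zero would be expected for a haploid sample.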
-
I am confused by FreeBayes's command-line options:
-H --diploid-reference
If using the reference sequence as a sample (default),
treat it as diploid. default: false (reference is haploid)
My understanding is this:
human (diploid) -H false
bacteria (haploid) -H true
but I found a lot of heterozygous SNPs (50%) for bacteria in my resulting VCF file.
What is wrong?
-
I have used FreeBayes on haploid sequences with good results; it is recommended.
-
Originally posted by flipwell:
Did you try FreeBayes? I'm facing this problem now and wondering what to use. I've tried GATK and it does appear to work (very superficial examination) but am concerned there might be issues I'm not seeing.
What I ended up doing was using GATK's UnifiedGenotyper, manually extracting the likelihoods for both of the homozygote genotypes, and calling a SNP if the likelihood of the alternative allele was above a certain amount higher than the likelihood of the reference allele (I believe I required the likelihood of the alt allele to be at least 3X greater than the ref allele, although I haven't tested extensively to find the best threshold).
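The likelihood comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's actual script: it assumes the likelihoods are read from the Phred-scaled `PL` entry of a biallelic diploid VCF genotype field (ordered hom-ref, het, hom-alt), and the helper name `haploid_snp_call` is hypothetical. The 3X threshold is the one mentioned in the post.

```python
import math


def haploid_snp_call(format_keys, sample_values, min_ratio=3.0):
    """Return True if the hom-alt genotype is at least `min_ratio` times
    more likely than the hom-ref genotype; the het likelihood is ignored.

    format_keys   -- VCF FORMAT column, e.g. "GT:AD:DP:GQ:PL"
    sample_values -- matching sample column, e.g. "1/1:0,30:30:90:1000,90,0"
    """
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    pl = fields.get("PL")
    if pl is None or pl == ".":
        return False  # no likelihoods available
    pl_values = [int(x) for x in pl.split(",")]
    if len(pl_values) != 3:
        return False  # sketch handles biallelic diploid records only
    pl_hom_ref, _pl_het, pl_hom_alt = pl_values
    # PL = -10 * log10(likelihood), so the log10 likelihood ratio
    # alt/ref is (PL_ref - PL_alt) / 10. Comparing in log space avoids
    # overflow for very large PL differences.
    log10_ratio = (pl_hom_ref - pl_hom_alt) / 10.0
    return log10_ratio >= math.log10(min_ratio)
```

For example, `haploid_snp_call("GT:AD:DP:GQ:PL", "0/1:15,15:30:99:300,0,300")` rejects a site that a diploid caller would report as heterozygous, because the two homozygous genotypes are equally likely there.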
-
Did you try FreeBayes? I'm facing this problem now and wondering what to use. I've tried GATK and it does appear to work (very superficial examination) but am concerned there might be issues I'm not seeing.
-
calling SNPs in haploid genomes
Does anyone have any thoughts on calling SNPs from short read data (e.g. Illumina) in haploid genomes? It seems that many SNP calling programs are set up to deal only with diploid genomes (e.g. GATK's UnifiedGenotyper).
I found the program FreeBayes from the Marth Lab which allows you to specify the ploidy. This looks like a good candidate and I will definitely try it. It appears to be unpublished.
Does anyone have any experience with calling SNPs in haploid genomes using FreeBayes or another program?
Thanks!