  • jflowers
    replied
    I also need to call SNPs on haploid genomes. It looks like methods like samtools mpileup / bcftools won't work because the Bayes snp-calling formula uses the allele frequency spectrum as the prior (but the AFS is estimated assuming diploidy).

    Can anyone suggest a workaround?
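One possible direction, sketched here only as an illustration and not as what samtools or bcftools do internally, is to compute genotype posteriors directly over the four haploid genotypes with a flat prior, so no diploid allele frequency spectrum is involved. The function name, the simple uniform-error model (a miscalled base is equally likely to be any of the other three), and the flat prior are all assumptions made for this sketch.

```python
import math

def haploid_posteriors(bases, quals, prior=None):
    """Posterior probability of each haploid genotype (A, C, G, T) at one site.

    bases: observed read bases at the site, e.g. "AAAG"
    quals: Phred base qualities, one per base
    prior: optional dict allele -> positive prior probability (flat if omitted)
    """
    alleles = "ACGT"
    if prior is None:
        prior = {a: 0.25 for a in alleles}  # assumption: flat prior
    log_post = {}
    for a in alleles:
        lp = math.log(prior[a])
        for b, q in zip(bases, quals):
            # Phred quality -> error probability, capped so log(1 - err) is defined
            err = min(10 ** (-q / 10), 0.75)
            lp += math.log(1 - err) if b == a else math.log(err / 3)
        log_post[a] = lp
    # Normalize in a numerically stable way
    m = max(log_post.values())
    unnorm = {a: math.exp(lp - m) for a, lp in log_post.items()}
    total = sum(unnorm.values())
    return {a: v / total for a, v in unnorm.items()}

post = haploid_posteriors("AAAG", [30, 30, 30, 30])
best = max(post, key=post.get)
```

A site would then be reported as a SNP when the most probable allele differs from the reference base and its posterior exceeds a chosen cutoff.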


  • wanguan2000
    replied
    Thank you, garwuf, for the explanation. So you mean that samtools 0.1.15 is better than GATK for SNP calling in haploid genomes?
    The VCF results from both samtools and GATK contain heterozygous SNPs for a haploid genome. Are those SNPs reliable or not? The freebayes results, by contrast, contain only homozygous SNPs.
    I wonder why heterozygous SNPs occur in a haploid genome at all.
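To see how many such calls a VCF contains, the genotype field can be inspected directly. The following is a minimal sketch that assumes a standard tab-separated VCF with a FORMAT column containing GT; the function names are made up for illustration.

```python
def is_heterozygous(gt):
    """True if a VCF GT string such as '0/1' or '1|2' has differing alleles."""
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]  # ignore missing alleles
    return len(set(called)) > 1

def heterozygous_sites(vcf_lines, sample_index=0):
    """Yield (chrom, pos, gt) for records whose genotype is heterozygous."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt_keys.index("GT")]
        if is_heterozygous(gt):
            yield fields[0], int(fields[1]), gt

example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:40",
    "chr1\t200\t.\tC\tT\t60\t.\t.\tGT:DP\t1/1:35",
    "chr1\t300\t.\tG\tA\t20\t.\t.\tGT:DP\t1:30",
]
hets = list(heterozygous_sites(example))
```

In a truly haploid sample, sites flagged this way are worth checking by eye: they may reflect mismapped reads, repeats, or a mixed culture rather than real heterozygosity.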


  • garwuf
    replied
    Originally posted by wanguan2000
    what about samtools vs GATK snp efficiency for ploidy?
    I do not quite get what you meant by "efficiency for ploidy". GATK is optimized for diploid genomes. Still, it can be used on haploid ones. The genotype part of the VCF output may be screwed up, but it will detect SNPs anyway. When searching for SNPs/indels in haploid genomes, samtools is clearly superior to GATK, but that is rather because of the difference in search algorithms. At best, GATK reports ~60% of the variants detected by samtools. GATK's UnifiedGenotyper is still not good with indels, although they have made some progress during the last year. Gigabayes was almost as good as samtools until version 0.1.15, although it can operate only on Mosaik alignments. The most recent samtools versions (0.1.17-0.1.18) perform noticeably better than it with regard to the "correct variant/false positive" ratio. I still run Gigabayes alongside samtools, just because it sometimes detects a variant overlooked by samtools. But this is a rare event, something like 1-2 variants per 4 Mb genome.
    Last edited by garwuf; 11-29-2011, 07:28 AM.


  • wanguan2000
    replied
    Originally posted by garwuf
    I gave Freebayes quite an extensive try recently, and wouldn't recommend it in its current state. I tried it on several bacterial datasets (4-6 Mb in size) that had previously been evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

    Then, after reading d17's post above, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads for a 128 kb template with 10 variant sites of different complexity. One set provided 50x coverage, the other 400x, and the alignment was performed with bwa. On these alignments, Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the efficiency was quite low: for the 50x dataset, it never reported more than 3 variants out of 10, and for the 400x dataset it reported 4-5 depending on settings. For comparison, Samtools 0.1.18 detected all 10 variants even on the 50x dataset.

    To my mind, Freebayes may have some problem with handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it's still being developed. Maybe these bugs will eventually be fixed.
    What about samtools vs. GATK SNP efficiency for ploidy?


  • garwuf
    replied
    I gave Freebayes quite an extensive try recently, and wouldn't recommend it in its current state. I tried it on several bacterial datasets (4-6 Mb in size) that had previously been evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

    Then, after reading d17's post above, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads for a 128 kb template with 10 variant sites of different complexity. One set provided 50x coverage, the other 400x, and the alignment was performed with bwa. On these alignments, Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the efficiency was quite low: for the 50x dataset, it never reported more than 3 variants out of 10, and for the 400x dataset it reported 4-5 depending on settings. For comparison, Samtools 0.1.18 detected all 10 variants even on the 50x dataset.

    To my mind, Freebayes may have some problem with handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it's still being developed. Maybe these bugs will eventually be fixed.
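A comparison of this kind, where a caller's output is scored against a known set of planted variants, can be sketched as follows. This is only an illustration: the function names are invented, and matching on exact (chromosome, position, REF, ALT) is an assumption that can undercount indels, since different callers may represent the same indel differently.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF records, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys

def evaluate(truth, calls):
    """Score a call set against a truth set of variant keys."""
    true_pos = truth & calls
    return {
        "true_positives": len(true_pos),
        "false_negatives": len(truth - calls),
        "false_positives": len(calls - truth),
        "recall": len(true_pos) / len(truth) if truth else 0.0,
    }

# Toy data: 10 planted SNPs, a caller that finds 3 of them plus one spurious site
truth = {("ref", 1000 * i, "A", "G") for i in range(1, 11)}
calls = variant_keys([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t1000\t.\tA\tG\t50\t.\t.",
    "ref\t2000\t.\tA\tG\t50\t.\t.",
    "ref\t3000\t.\tA\tG\t50\t.\t.",
    "ref\t4500\t.\tC\tT\t12\t.\t.",
])
report = evaluate(truth, calls)
```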


  • wanguan2000
    replied
    I am confused by the freebayes command-line options:
    -H --diploid-reference
    If using the reference sequence as a sample (default),
    treat it as diploid. default: false (reference is haploid)
    ###########
    My understanding is this:
    human (diploid): -H false
    bacteria (haploid): -H true
    but I found a lot of heterozygous SNPs (50%) in the VCF file for my bacterial sample.


    Another possible understanding is:
    human (23 chromosomes): -H false
    human (23*2 chromosomes): -H true

    Which is correct?
    ######################
    -p --ploidy N Sets the default ploidy for the analysis to N. default: 2
    ###
    For a haploid genome, do I just set -p 1, with no need to set -H?


  • gaffa
    replied
    Originally posted by wanguan2000
    I am confused by the freebayes command-line options:
    -H --diploid-reference
    If using the reference sequence as a sample (default),
    treat it as diploid. default: false (reference is haploid)
    My understanding is this:
    human (diploid): -H false
    bacteria (haploid): -H true
    but I found a lot of heterozygous SNPs (50%) in the VCF file for my bacterial sample.
    What is wrong?
    You also need to set the ploidy of the sample using the -p flag (i.e. -p 1; the default is 2).
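As a rough illustration of that advice, the sketch below assembles a FreeBayes command line with the ploidy set explicitly. The -p flag is the one quoted in this thread; the -f flag for the reference FASTA, the positional BAM argument, and the file names are assumptions that should be checked against the --help output of the installed version. The helper name is made up.

```python
import shlex

def freebayes_haploid_command(reference_fasta, bam_path, ploidy=1):
    """Build a FreeBayes argument list that sets the sample ploidy explicitly."""
    return [
        "freebayes",
        "-f", reference_fasta,   # assumed flag for the reference FASTA
        "-p", str(ploidy),       # ploidy flag quoted in the thread (default 2)
        bam_path,                # assumed positional BAM input
    ]

cmd = freebayes_haploid_command("reference.fa", "sample.bam")
print(shlex.join(cmd))  # command line as it would be typed in a shell
```

The list can be handed to `subprocess.run(cmd, stdout=open("calls.vcf", "w"), check=True)` to run the caller, since FreeBayes writes its VCF to standard output.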


  • wanguan2000
    replied
    I am confused by the freebayes command-line options:
    -H --diploid-reference
    If using the reference sequence as a sample (default),
    treat it as diploid. default: false (reference is haploid)
    My understanding is this:
    human (diploid): -H false
    bacteria (haploid): -H true
    but I found a lot of heterozygous SNPs (50%) in the VCF file for my bacterial sample.
    What is wrong?


  • d17
    replied
    Originally posted by gaffa
    I have used FreeBayes on haploid sequences with good results; it is recommended.
    Could you be more specific about what you mean by good results? Did you compare FreeBayes to any other programs?


  • gaffa
    replied
    I have used FreeBayes on haploid sequences with good results; it is recommended.


  • d17
    replied
    Originally posted by flipwell
    Did you try FreeBayes? I'm facing this problem now and wondering what to use. I've tried GATK and it does appear to work (very superficial examination) but am concerned there might be issues I'm not seeing
    I did try FreeBayes, and I was able to get it to work over a small region for a single sample, but when I expanded to call SNPs over whole chromosomes for several samples at once it no longer worked (seemed to hang/freeze and didn't provide any error messages).

    What I ended up doing was using GATK's UnifiedGenotyper, manually extracting the likelihoods for the two homozygous genotypes, and calling a SNP if the likelihood of the homozygous alternative genotype exceeded that of the homozygous reference genotype by a set factor (I believe I required the alt likelihood to be at least 3X the ref likelihood, although I haven't tested extensively to find the best threshold).
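The filtering step described above could look roughly like the sketch below. It is not d17's actual script. It assumes a biallelic site whose sample column carries a PL field (Phred-scaled genotype likelihoods in the order 0/0, 0/1, 1/1); the function name and the example values are invented. The comparison is done in log10 space so that very large PL differences cannot overflow.

```python
import math

def call_haploid_snp(format_keys, sample_field, min_ratio=3.0):
    """Compare the two homozygous genotype likelihoods at a biallelic site.

    format_keys: VCF FORMAT column, e.g. "GT:AD:DP:GQ:PL"
    sample_field: matching sample column, e.g. "1/1:0,30:30:90:1200,90,0"
    Returns (is_snp, log10_ratio) with log10_ratio = log10(L_alt / L_ref).
    """
    keys = format_keys.split(":")
    values = sample_field.split(":")
    pl = [int(x) for x in values[keys.index("PL")].split(",")]
    if len(pl) != 3:
        raise ValueError("expected three PL values for a biallelic diploid call")
    pl_ref, _pl_het, pl_alt = pl  # the heterozygous likelihood is ignored
    # PL = -10 * log10(likelihood), so the likelihood ratio alt/ref is
    # 10 ** ((pl_ref - pl_alt) / 10)
    log10_ratio = (pl_ref - pl_alt) / 10
    return log10_ratio >= math.log10(min_ratio), log10_ratio

is_snp, log10_ratio = call_haploid_snp("GT:AD:DP:GQ:PL", "1/1:0,30:30:90:1200,90,0")
```

With the default threshold of 3, a PL difference of 5 or more between the reference and alternative homozygotes is enough to call the site, which is quite permissive; a stricter `min_ratio` may be preferable for real data.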


  • flipwell
    replied
    Did you try FreeBayes? I'm facing this problem now and wondering what to use. I've tried GATK and it does appear to work (very superficial examination) but am concerned there might be issues I'm not seeing


  • d17
    started a topic calling SNPs in haploid genomes

    calling SNPs in haploid genomes

    Does anyone have any thoughts on calling SNPs from short read data (e.g. Illumina) in haploid genomes? It seems that many SNP calling programs are set up to deal only with diploid genomes (e.g. GATK's UnifiedGenotyper).

    I found the program FreeBayes from the Marth Lab which allows you to specify the ploidy. This looks like a good candidate and I will definitely try it. It appears to be unpublished.

    Does anyone have any experience with calling SNPs in haploid genomes using FreeBayes or another program?

    Thanks!
