  • calling SNPs in haploid genomes

    Does anyone have any thoughts on calling SNPs from short read data (e.g. Illumina) in haploid genomes? It seems that many SNP calling programs are set up to deal only with diploid genomes (e.g. GATK's UnifiedGenotyper).

    I found the program FreeBayes from the Marth Lab which allows you to specify the ploidy. This looks like a good candidate and I will definitely try it. It appears to be unpublished.

    Does anyone have any experience with calling SNPs in haploid genomes using FreeBayes or another program?

    Thanks!

  • #2
    Did you try FreeBayes? I'm facing this problem now and wondering what to use. I've tried GATK and it does appear to work (on very superficial examination), but I am concerned there might be issues I'm not seeing.



    • #3
      Originally posted by flipwell
      Did you try FreeBayes? I'm facing this problem now and wondering what to use. I've tried GATK and it does appear to work (on very superficial examination), but I am concerned there might be issues I'm not seeing.
      I did try FreeBayes, and I was able to get it to work over a small region for a single sample, but when I expanded to call SNPs over whole chromosomes for several samples at once it no longer worked (seemed to hang/freeze and didn't provide any error messages).

      What I ended up doing was running GATK's UnifiedGenotyper, manually extracting the likelihoods for the two homozygous genotypes, and calling a SNP if the likelihood of the alternative allele exceeded the likelihood of the reference allele by a set factor. I believe I required the alt-allele likelihood to be at least 3X the ref-allele likelihood, although I haven't tested extensively to find the best threshold.
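The thresholding described above could look roughly like the sketch below. It assumes the likelihoods are read from the standard VCF `PL` field (Phred-scaled, ordered 0/0, 0/1, 1/1 for a biallelic site); the post does not say which field was actually used, and the function names are made up for illustration.

```python
def alt_to_ref_likelihood_ratio(pl_field):
    """Return L(hom-alt) / L(hom-ref) from a biallelic diploid PL string.

    Assumes the VCF convention PL = -10 * log10(likelihood), with values
    ordered as 0/0, 0/1, 1/1.
    """
    pls = [int(x) for x in pl_field.split(",")]
    if len(pls) != 3:
        raise ValueError("expected three PL values (0/0, 0/1, 1/1)")
    pl_hom_ref, _pl_het, pl_hom_alt = pls
    # A difference in Phred-scaled values maps to a ratio of likelihoods.
    return 10 ** ((pl_hom_ref - pl_hom_alt) / 10.0)


def call_haploid_snp(pl_field, min_ratio=3.0):
    """Call a SNP when the hom-alt likelihood is at least min_ratio times hom-ref."""
    return alt_to_ref_likelihood_ratio(pl_field) >= min_ratio


# Hypothetical PL strings, not taken from any real dataset.
print(call_haploid_snp("30,10,0"))   # strong support for the alt allele
print(call_haploid_snp("0,10,30"))   # strong support for the ref allele
```

The heterozygous value is ignored on purpose, since a haploid sample can only carry one of the two alleles. The 3X default mirrors the threshold mentioned above and is not a tuned value.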



      • #4
        I have used FreeBayes on haploid sequences with good results; it is recommended.



        • #5
          Originally posted by gaffa
          I have used FreeBayes on haploid sequences with good results; it is recommended.
          Could you be more specific about what you mean by good results? Did you compare FreeBayes to any other programs?



          • #6
            I am confused by this freebayes command-line option:
            -H --diploid-reference
            If using the reference sequence as a sample (default),
            treat it as diploid. default: false (reference is haploid)
            My understanding is this:
            human (diploid): -H false
            bacteria (haploid): -H true
            But I found a lot of heterozygous SNPs (50%) in the VCF output for my bacterial sample.
            What is wrong?



            • #7
              Originally posted by wanguan2000
              I am confused by this freebayes command-line option:
              -H --diploid-reference
              If using the reference sequence as a sample (default),
              treat it as diploid. default: false (reference is haploid)
              My understanding is this:
              human (diploid): -H false
              bacteria (haploid): -H true
              But I found a lot of heterozygous SNPs (50%) in the VCF output for my bacterial sample.
              What is wrong?
              You also need to set the ploidy of the sample using the -p flag (i.e. -p 1; the default is 2).
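For reference, a haploid call with the ploidy flag from this reply could be assembled as in the sketch below. The file names `ref.fasta` and `sample.bam` are placeholders, and the helper name is made up; only `-f` (reference FASTA) and `-p` (ploidy) are used, and any other settings are left at their defaults.

```python
import shlex


def freebayes_haploid_command(reference_fasta, bam_path, ploidy=1):
    """Build a freebayes argument list that sets the sample ploidy explicitly."""
    return ["freebayes", "-f", reference_fasta, "-p", str(ploidy), bam_path]


# Placeholder inputs; substitute your own reference and alignment files.
cmd = freebayes_haploid_command("ref.fasta", "sample.bam")
print(shlex.join(cmd))

# To actually run it (requires freebayes on PATH), something like:
#   import subprocess
#   with open("calls.vcf", "w") as out:
#       subprocess.run(cmd, stdout=out, check=True)
```

Building the command as a list avoids shell-quoting problems when paths contain spaces.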



              • #8
                I am confused by this freebayes command-line option:
                -H --diploid-reference
                If using the reference sequence as a sample (default),
                treat it as diploid. default: false (reference is haploid)
                ###########
                My understanding is this:
                human (diploid): -H false
                bacteria (haploid): -H true
                But I found a lot of heterozygous SNPs (50%) in the VCF output for my bacterial sample.

                My other understanding is:
                human (23 chromosomes): -H false
                human (23*2 chromosomes): -H true

                Which is correct?
                ######################
                -p --ploidy N Sets the default ploidy for the analysis to N. default: 2
                ###
                So for a haploid genome, do I just set -p 1, with no need to set -H?



                • #9
                  I tried FreeBayes quite extensively recently, and wouldn't recommend it in its current state. I ran it on several bacterial datasets (4-6 Mb in size) that had previously been evaluated with Gigabayes, Samtools and GATK, and found that FreeBayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

                  Then, after reading d17's post above, I decided to try FreeBayes on a smaller reference. I generated two artificial read sets for a 128 kb template with 10 variant sites of varying complexity. One set provided 50x coverage, the other 400x, and the alignment was performed with bwa. On these alignments, FreeBayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the efficiency was quite low: for the 50x dataset it never reported more than 3 variants out of 10, and for the 400x dataset it reported 4-5 depending on settings. For comparison, Samtools 0.1.18 detected all 10 variants even on the 50x dataset.

                  To my mind, FreeBayes may have some problem with handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it's still being developed. Maybe these bugs will eventually be fixed.
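The comparison against the 10 known variant sites amounts to counting true positives, false positives and misses. A minimal sketch follows; the positions are invented for illustration and are not the sites from the simulated dataset.

```python
def evaluate_calls(true_sites, called_sites):
    """Compare called variant positions against a known truth set."""
    true_set, called_set = set(true_sites), set(called_sites)
    tp = len(true_set & called_set)   # known sites that were called
    fp = len(called_set - true_set)   # calls at sites with no known variant
    fn = len(true_set - called_set)   # known sites that were missed
    recall = tp / len(true_set) if true_set else 0.0
    precision = tp / len(called_set) if called_set else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "recall": recall, "precision": precision}


# Hypothetical positions: 10 planted variants, 3 of them recovered, no false calls.
truth = [1000, 5000, 12000, 20000, 33000, 47000, 60000, 75000, 90000, 110000]
calls = [1000, 33000, 90000]
result = evaluate_calls(truth, calls)
print(result)
```

Matching on position alone is a simplification; a stricter comparison would also check that the called alternate allele agrees with the planted one.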



                  • #10
                    Originally posted by garwuf
                    What about samtools vs. GATK SNP efficiency for ploidy?



                    • #11
                      Originally posted by wanguan2000
                      What about samtools vs. GATK SNP efficiency for ploidy?
                      I don't quite understand what you mean by "efficiency for ploidy". GATK is optimized for diploid genomes, but it can still be used on haploid ones. The genotype part of the VCF output may be wrong, but it will detect SNPs anyway. When searching for SNPs/indels in haploid genomes, samtools is clearly superior to GATK, but that is mostly because of the difference in search algorithms. At best, GATK reports ~60% of the variants detected by samtools. GATK's UnifiedGenotyper is still not good with indels, although it has made some progress over the last year. Gigabayes was almost as good as samtools until version 0.1.15, even though it can operate only on Mosaik alignments. The most recent samtools versions (0.1.17-0.1.18) perform noticeably better than it with regard to the "correct variant/false positive" ratio. I still run Gigabayes alongside samtools, just because it sometimes detects a variant overlooked by samtools. But this is a rare event: about 1-2 variants per 4 Mb genome.
                      Last edited by garwuf; 11-29-2011, 07:28 AM.



                      • #12
                        Thank you, garwuf, for the explanation. So you mean that samtools 0.1.15 is better than GATK for haploid SNP calling?
                        Both the samtools and GATK VCF results contain heterozygous SNPs for a haploid genome. Are those SNPs reliable or not? FreeBayes's results, by contrast, contain only homozygous SNPs.
                        I wonder why heterozygous SNPs occur in a haploid genome.



                        • #13
                          I also need to call SNPs on haploid genomes. It looks like methods such as samtools mpileup / bcftools won't work, because the Bayesian SNP-calling formula uses the allele frequency spectrum (AFS) as the prior, and the AFS is estimated assuming diploidy.

                          Can anyone suggest a workaround?
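To make the question concrete, here is a minimal sketch of a Bayesian call in which the genotype is a single allele, so no diploid allele frequency spectrum is involved. It is not the samtools/bcftools model: it assumes independent reads, one fixed per-base error rate, and a simple prior that favors the reference base. The `error_rate` and `prior_ref` defaults are illustrative guesses, not recommended values.

```python
import math

BASES = "ACGT"


def haploid_posteriors(read_bases, ref_base, error_rate=0.01, prior_ref=0.999):
    """Posterior probability of each base being the true haploid allele.

    Each read base matches the true allele with probability 1 - error_rate
    and is one of the three other bases with probability error_rate / 3.
    """
    log_post = {}
    for allele in BASES:
        prior = prior_ref if allele == ref_base else (1.0 - prior_ref) / 3.0
        log_p = math.log(prior)
        for base in read_bases:
            p = 1.0 - error_rate if base == allele else error_rate / 3.0
            log_p += math.log(p)
        log_post[allele] = log_p
    # Normalize in log space to avoid underflow at high coverage.
    max_log = max(log_post.values())
    total = sum(math.exp(v - max_log) for v in log_post.values())
    return {a: math.exp(v - max_log) / total for a, v in log_post.items()}


# Hypothetical pileup column: nine reads show G, one shows the reference A.
post = haploid_posteriors("GGGGGGGGGA", ref_base="A")
best = max(post, key=post.get)
print(best, round(post[best], 6))
```

A real caller would use per-base quality scores instead of a single error rate and would account for mapping quality; this only illustrates that the prior need not depend on a diploid assumption.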



                          • #14
                            I've been using Maq (http://maq.sourceforge.net/maq-man.shtml) for SNP detection in my haploid system. No complaints whatsoever.



                            • #15
                              Hi jgibbons1, I've been using Maq as well, but the SNP output is useless without annotation. Have you come across a good way to annotate the output that Maq produces?

                              Thanks!
