
  • ekg
    replied
    To answer the original post, simply running

    % freebayes -p 1 -f reference.fasta alignments.bam

    is sufficient to generate haploid SNP, indel, and complex allele calls using freebayes. The method is described in arXiv:1207.3907, "Haplotype-based variant detection from short-read sequencing."

    If anyone has issues with this method, please report them to me (via email) or to the freebayes mailing list.

    Happy variant detecting.


  • ekg
    replied
    Originally posted by garwuf
    I gave Freebayes quite an extensive try recently, and wouldn't recommend it in its current state. I tried it on several bacterial datasets (4-6 Mb in size) that had previously been evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

    Then, after reading d17's post above, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads against a 128 kb template with 10 variant sites of differing complexity. One set provided 50x coverage and the other 400x, and the alignment was performed with bwa. On these alignments, Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the efficiency was quite low: for the 50x dataset it never reported more than 3 variants out of 10, and for the 400x dataset it reported 4-5 depending on settings. For comparison, Samtools 1.18 detected all 10 variants even on the 50x dataset.

    To my mind, Freebayes may have some problem handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it's still being developed. Maybe these bugs will eventually be fixed.

    I'm the author of freebayes.

    Did you submit bug reports about these issues? We have been using freebayes for haploid detection without issue.

    When you say that freebayes was reporting many false SNPs, was this before or after you filtered the output on the QUAL field? It is our expectation that users filter the output data, and the output will include many SNPs with very low reported quality so as to allow filtering at any desired level.
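
As a minimal sketch of that kind of post-hoc filtering (this is not part of freebayes itself; the function name, the threshold of 20, and the example records are all made-up assumptions), a few lines of Python can drop low-QUAL records from VCF text:

```python
def filter_vcf_by_qual(lines, min_qual=20.0):
    """Yield VCF header lines unchanged, plus records with QUAL >= min_qual.

    `lines` is any iterable of VCF text lines. QUAL is the sixth
    tab-separated column; a missing value (".") fails the filter.
    """
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column-header lines pass through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        if qual != "." and float(qual) >= min_qual:
            yield line


# Toy input (hypothetical records, not real freebayes output).
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t3.2\t.\t.\n",
    "chr1\t250\t.\tC\tT\t87.5\t.\t.\n",
]
kept = list(filter_vcf_by_qual(example, min_qual=20.0))
```

The right cutoff depends on the data and on how many false positives are tolerable, so the default here is only a placeholder.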

    The test setup you are describing is very similar to one we use during development, but your results are dramatically different.

    Also, I am not aware of any existing issues with larger genomes, as we typically work with human samples, but again, I will be able to resolve anything with a bug report.

    It's likely that if other users reported the same issues they have been resolved in the time since you tested.


  • ragowthaman
    replied
    @Kasycas and @jgibbons1.
    It's highly possible you have already written or found a script to map your SNPs onto genes (or to identify synonymous and nonsynonymous mutations).
    I use the SnpEff program for that. All you need is your VCF file and gene annotations in GFF format.

    Shamefully agree, I wrote an (inferior) script to do it myself before finding this one.
    Gowthaman


  • zhiwei
    replied
    You may want to try the recent program SNVer.

    It includes the number of haploids as a parameter of its model, so it is applicable to haploid genomes too.

    Originally posted by d17
    Does anyone have any thoughts on calling SNPs from short read data (e.g. Illumina) in haploid genomes? It seems that many SNP calling programs are set up to deal only with diploid genomes (e.g. GATK's UnifiedGenotyper).

    I found the program FreeBayes from the Marth Lab which allows you to specify the ploidy. This looks like a good candidate and I will definitely try it. It appears to be unpublished.

    Does anyone have any experience with calling SNPs in haploid genomes using FreeBayes or another program?

    Thanks!


  • vv85
    replied
    As another poster mentioned, I prefer using samtools on haploid genomes. False positive variants are always possible, depending on the initial sequencing data you're using and the specific features of your genome.


  • Medo
    replied
    Hi vv85,
    Thanks a lot, that was the reason.
    But do you know whether samtools pileup and GATK are really applicable to haploid genomes, or will I get false positive variants?

    Thanks a lot


  • vv85
    replied
    Originally posted by Medo
    Hi,
    I want to ask about the samtools mpileup and GATK commands for a haploid bacterial genome.
    I have tried them many times, but they always hang.
    Note that I did my alignment with Bowtie 2, which allows gapped alignments.
    For instance, this is my mpileup command:

    samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup

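    I don't know what's wrong, but it freezes and produces nothing for hours.

    Thanks

    The "-" before the ">" might be the problem: samtools most likely treats a bare "-" as standard input and waits for data that never arrives.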


  • Medo
    replied
    mpileup and GATK commands for haploid genomes

    Hi,
    I want to ask about the samtools mpileup and GATK commands for a haploid bacterial genome.
    I have tried them many times, but they always hang.
    Note that I did my alignment with Bowtie 2, which allows gapped alignments.
    For instance, this is my mpileup command:

    samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup

    I don't know what's wrong, but it freezes and produces nothing for hours.

    Thanks


  • Medo
    replied
    Hi garwuf,
    I want to ask you about the samtools mpileup command for a haploid bacterial genome. I have tried it many times, but it always hangs.
    Note that I did my alignment with Bowtie 2, which allows gapped alignments.
    This is my command:

    samtools mpileup -uf NC_008596.1.fasta mt1sortfilter.bam ->snp/pileup/mt1.pileup

    I don't know what's wrong, but it freezes and produces nothing for hours.

    Thanks

    Originally posted by garwuf
    I gave Freebayes quite an extensive try recently, and wouldn't recommend it in its current state. I tried it on several bacterial datasets (4-6 Mb in size) that had previously been evaluated with Gigabayes, Samtools and GATK, and found that Freebayes reports nonexistent SNPs while missing well-defined ones. In fact, not a single SNP was correctly predicted, no matter which parameters were used.

    Then, after reading d17's post above, I decided to try Freebayes on a smaller reference. I generated two artificial sets of reads against a 128 kb template with 10 variant sites of differing complexity. One set provided 50x coverage and the other 400x, and the alignment was performed with bwa. On these alignments, Freebayes generated sane VCF output: no false positives, and several SNPs were detected correctly. Still, the efficiency was quite low: for the 50x dataset it never reported more than 3 variants out of 10, and for the 400x dataset it reported 4-5 depending on settings. For comparison, Samtools 1.18 detected all 10 variants even on the 50x dataset.

    To my mind, Freebayes may have some problem handling cached sequence data, which would explain why it works with kb-sized references but fails on Mb-sized ones. On the other hand, it's still being developed. Maybe these bugs will eventually be fixed.


  • Kasycas
    replied
    Yep, got that all right. Position alone isn't enough, because you then need to see the genes it's affecting. I guess it means writing a script.

    Thanks for your response anyway, it's always better than nothing!!

    Kas


  • jgibbons1
    replied
    Hmmm...ok. I haven't figured out how to tell whether a SNP is synonymous or nonsynonymous, but all of the other information is in the SNP output after you run the "cns2snp" command.

    Here's an example of the first five columns of the output:

    chromosome, position, reference base, consensus base, Phred-like consensus quality

    GENE; SITE; REF_BASE; SNP_BASE; QUALITY_SCORE
    lcl|AL123456.2_gene_1725 268 T C 255
    lcl|AL123456.2_gene_1731 219 C T 255
    lcl|AL123456.2_gene_1731 447 T C 255
    lcl|AL123456.2_gene_1731 485 C T 255
    lcl|AL123456.2_gene_1732 69 A G 255

    Do you get the same output? If you find software to characterize the SNP itself I would love to know about it too!


  • Kasycas
    replied
    @jgibbons1

    Both, actually. It would be nice to have interpretable output where you can see how relevant a particular SNP is. So I was trying to get information such as: which gene it's in, which position within the gene carries the SNP, the resulting amino acid change (if any), and whether it's synonymous or nonsynonymous.
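
For what it's worth, here is a rough sketch of that classification step. It assumes the gene's coding sequence is at hand, that it is in frame on the forward strand, and that the SNP position is 1-based within the gene (as with a per-gene SITE column). The function name and the toy sequence are hypothetical, and the standard genetic code is used:

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(cds, site, alt_base):
    """Return (ref_aa, alt_aa, label) for a SNP in a coding sequence.

    cds      -- coding sequence, in frame, forward strand (assumption)
    site     -- 1-based position of the SNP within the coding sequence
    alt_base -- the variant base observed at that position
    """
    index = site - 1
    codon_start = index - index % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("site falls in an incomplete trailing codon")
    offset = index - codon_start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, label


# Toy coding sequence: ATG GCT GAA encodes Met-Ala-Glu.
toy_cds = "ATGGCTGAA"
silent = classify_snp(toy_cds, 6, "C")    # GCT -> GCC, still Ala
missense = classify_snp(toy_cds, 7, "A")  # GAA -> AAA, Glu -> Lys
```

Genes on the reverse strand, or SNP positions given in genome coordinates, would need an extra coordinate conversion that this sketch leaves out.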

    I'm finding it hard to believe a tool for this purpose doesn't exist!

    Thanks for the reply.


  • jgibbons1
    replied
    Hi Kasycas,
    What exactly do you mean by annotation: whether a SNP is synonymous or nonsynonymous, or its location in the genome?
    John


  • Kasycas
    replied
    Hi jgibbons1, I've been using MAQ as well, but the SNP output is useless without annotation. Have you come across a good way to annotate the output that MAQ produces?

    Thanks!


  • jgibbons1
    replied
    I've been using Maq (http://maq.sourceforge.net/maq-man.shtml) for SNP detection in my haploid system. No complaints whatsoever.
