  • jschuur1
    Junior Member
    • Mar 2011
    • 3

    SNP analysis of microbial genomes from 454 data

    Hello,

    We've done some great 454 runs on some of our favorite microbes, de novo assembled each one, sorted the contigs into a database, and completed the annotation of all of them. However, I'm now sort of stuck on the next part: comparing them to look for variation, mostly point mutations and/or small insertions/deletions.

    I know the Roche GS Mapper can do this kind of analysis, but it refuses to read my annotation files (all in gff3) because it requires goldenpath type 128 files. I also can't seem to find anything else that would give me a nice output of SNPs in genes and the possible corresponding amino acid changes. I have the consensus reads in several formats, but the good thing about the Roche mapper is that it includes the sequence depth (from the sff files) at which the region containing a SNP was established, which helps to eliminate false positives.

    I've browsed these forums, yet I can't find anyone else with this specific problem. Can someone give me some advice on how I can complete my analysis?

    Thanks in advance
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    The Roche gsMapper probably does the best job of aligning 454 reads and predicting variants. However, I have recently discovered that its prediction of the effects of those variants is completely unreliable. My project was very similar to yours: mapping reads to a well-annotated bacterial genome (I created my own refGene.txt file, and yes, I'm sure it was correct). In some cases the amino acid calls by gsMapper were from the wrong frame, even though it reported the correct frame as read from the refGene.txt file. In other cases the errors were due to the odd way in which gsMapper reports some SNPs. gsMapper will sometimes report a SNP as a deletion followed by an insertion; this may throw off prediction of the SNP effect. I manually went through the HCDiffs file changing these to substitutions. You should too if you decide to use gsMapper for mapping the reads and calling variants, followed by another tool for predicting the effects of those variants.

    Have you checked out the Ensembl Bacteria Variant and SNP effect predictor?

    http://bacteria.ensembl.org/tools.html
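
The manual clean-up described in post #2 (turning a deletion followed by an insertion into a single substitution) could be scripted roughly as follows. This is only a sketch under stated assumptions: the `Variant` record, the `"-"` gap symbol, and the `max_offset` parameter are hypothetical and do not reflect the real HCDiffs layout, so the variant calls would first have to be parsed into this simplified form.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Simplified variant record (hypothetical, not the HCDiffs format)."""
    ref_name: str
    pos: int   # 1-based position on the reference
    ref: str   # reference base, "-" if none (insertion)
    alt: str   # variant base, "-" if none (deletion)


def merge_del_ins(variants: List[Variant], max_offset: int = 1) -> List[Variant]:
    """Collapse a single-base deletion that is immediately followed by a
    single-base insertion at (nearly) the same position into one substitution.

    Assumes the input is sorted by reference name and position.
    """
    merged: List[Variant] = []
    i = 0
    while i < len(variants):
        cur = variants[i]
        nxt = variants[i + 1] if i + 1 < len(variants) else None
        is_pair = (
            nxt is not None
            and cur.ref_name == nxt.ref_name
            and cur.alt == "-" and len(cur.ref) == 1
            and nxt.ref == "-" and len(nxt.alt) == 1
            and 0 <= nxt.pos - cur.pos <= max_offset
            and cur.ref != nxt.alt  # deleting and re-inserting the same base is not a SNP
        )
        if is_pair:
            merged.append(Variant(cur.ref_name, cur.pos, cur.ref, nxt.alt))
            i += 2  # both records are consumed by the substitution
        else:
            merged.append(cur)
            i += 1
    return merged


calls = [
    Variant("contig1", 100, "A", "-"),  # deletion ...
    Variant("contig1", 100, "-", "G"),  # ... followed by insertion
    Variant("contig1", 250, "C", "T"),  # ordinary substitution, left alone
]
for v in merge_del_ins(calls):
    print(v.ref_name, v.pos, v.ref, v.alt)
```

Whether the two halves of such a pair share the same position or sit one base apart is something to check against real output before trusting `max_offset`.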

    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      We put an emphasis on SNP calling from 454 data, not de novo assembly to start with. I have been using gsMapper results using SNP calls from the 454HCDiffs files and the AlignmentInfo.tsv as a comparison.

      As kmcarr says, the output format is suboptimal, with multiple lines per putative SNP position (ref to deletion on one line, deletion to SNP base on the next). Why they can't make it like a pileup to avoid confusion is beyond me.

      We assess the effects of SNPs on amino acids with SnpEff. It's the best software I've found so far for this purpose.

      • jschuur1
        Junior Member
        • Mar 2011
        • 3

        #4
        Thanks for the quick responses. I'll have a look at the Ensembl site.

        On the other hand, so far I haven't been able to actually use gsMapper, because it can't read my annotation file (in gff3) and asks for goldenpath instead. Does anyone know a way to convert gff3 into a file I can use as an annotation file in gsMapper?

        I can look for SNPs in the FASTA files coupled with the sff files, but then I lose my annotation.
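
One possible route for the gff3 conversion asked about in post #4 is to write the CDS features out as refGene-style rows. The sketch below assumes the 16-column UCSC refGene table layout and treats every bacterial CDS as a single-exon gene; the function name is hypothetical, the `bin` column is filled with a placeholder `0`, and whether gsMapper accepts the result has not been verified here.

```python
from urllib.parse import unquote


def gff3_cds_to_refgene(gff3_lines):
    """Yield tab-separated refGene-style rows for each CDS feature in GFF3 input."""
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, _source, _type, start, end, _score, strand, _phase, attrs = cols
        # GFF3 attributes are key=value pairs separated by semicolons
        attr = {
            key: unquote(value)
            for key, _, value in (f.partition("=") for f in attrs.split(";") if f)
        }
        name = attr.get("ID") or attr.get("Name") or f"{seqid}_{start}_{end}"
        name2 = attr.get("Name", name)
        tx_start = int(start) - 1  # GFF3 is 1-based inclusive,
        tx_end = int(end)          # UCSC tables are 0-based half-open
        yield "\t".join(map(str, [
            0,                      # bin (placeholder, assumption)
            name, seqid, strand,
            tx_start, tx_end,       # txStart, txEnd
            tx_start, tx_end,       # cdsStart, cdsEnd
            1,                      # exonCount: one exon per CDS
            f"{tx_start},", f"{tx_end},",  # exonStarts, exonEnds
            0, name2, "cmpl", "cmpl", "0,",  # score, name2, cds stats, exonFrames
        ]))


example = [
    "##gff-version 3\n",
    "contig1\tannot\tCDS\t11\t40\t.\t-\t0\tID=cds1;Name=dnaA\n",
]
for row in gff3_cds_to_refgene(example):
    print(row)
```

The coordinate shift is the part most worth double-checking: an off-by-one in `cdsStart` would put every downstream amino acid call in the wrong frame.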

        • colindaven
          Senior Member
          • Oct 2008
          • 417

          #5
          Shouldn't the gsMapper reference be in FASTA format?
          It might be tricky to assess the effects of variants on a whole lot of new contigs, all with new annotations.

          • jschuur1
            Junior Member
            • Mar 2011
            • 3

            #6
            It takes a FASTA file as a reference, yes, but in the third tab you can add an annotation file.

            If I just use the FASTA file (which I have), the output doesn't include the amino acid substitutions. But perhaps I can take the output and run it at the Ensembl site.

            • enrico
              Junior Member
              • Jul 2010
              • 5

              #7
              Originally posted by kmcarr
              The Roche gsMapper probably does the best job of aligning 454 reads and predicting variants. However, I have recently discovered that its prediction of the effects of those variants is completely unreliable. My project was very similar to yours: mapping reads to a well-annotated bacterial genome (I created my own refGene.txt file, and yes, I'm sure it was correct). In some cases the amino acid calls by gsMapper were from the wrong frame, even though it reported the correct frame as read from the refGene.txt file. In other cases the errors were due to the odd way in which gsMapper reports some SNPs. gsMapper will sometimes report a SNP as a deletion followed by an insertion; this may throw off prediction of the SNP effect. I manually went through the HCDiffs file changing these to substitutions. You should too if you decide to use gsMapper for mapping the reads and calling variants, followed by another tool for predicting the effects of those variants.

              Have you checked out the Ensembl Bacteria Variant and SNP effect predictor?

              http://bacteria.ensembl.org/tools.html

              I had the same problem with gsMapper: the wrong amino acid variant was reported because of a wrong translation frame (different from the one specified in the refGene.txt file).

              But it happens for only some variants; the majority of them are correct, so I have not been able to figure out what the problem is...
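
To find which amino acid calls are affected by the frame problem described in posts #2 and #7, each reported change can be recomputed independently from the reference sequence and the CDS coordinates. The sketch below is an assumption-laden cross-check, not how gsMapper works: the function name and arguments are hypothetical, coordinates are taken as 1-based inclusive on the forward strand, the CDS is assumed to be ungapped, and the standard codon assignments are used (alternative start codons are not treated specially).

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon assignments, built in TCAG order for each codon position
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def aa_change(ref_seq, cds_start, cds_end, strand, snp_pos, alt_base):
    """Return (ref_aa, alt_aa, codon_number) for a SNP inside a CDS.

    alt_base is given as it appears on the forward strand.
    """
    if not cds_start <= snp_pos <= cds_end:
        raise ValueError("SNP lies outside the CDS")
    cds = ref_seq[cds_start - 1:cds_end].upper()
    alt_base = alt_base.upper()
    offset = snp_pos - cds_start
    if strand == "-":
        # Work in coding orientation: reverse-complement the CDS and the allele
        cds = cds.translate(COMPLEMENT)[::-1]
        alt_base = alt_base.translate(COMPLEMENT)
        offset = cds_end - snp_pos
    codon_index, within = divmod(offset, 3)
    codon = cds[codon_index * 3:codon_index * 3 + 3]
    if len(codon) < 3:
        raise ValueError("CDS length is not a multiple of three at this position")
    alt_codon = codon[:within] + alt_base + codon[within + 1:]
    return CODON_TABLE[codon], CODON_TABLE[alt_codon], codon_index + 1


# Toy reference: CDS "ATGGCTAAA" at positions 3..11 on the forward strand
ref = "CC" + "ATGGCTAAA" + "GG"
print(aa_change(ref, 3, 11, "+", 7, "A"))
```

Comparing these recomputed changes against the reported ones, and grouping the mismatches by strand or by position modulo three, may show whether the wrong-frame calls follow a pattern.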
