  • nlsullivan
    Junior Member
    • Oct 2010
    • 2

    determining if a SNP is synonymous or not

    I work with a pretty uncharacterized bacterium (genome size 4Mb) and I have a collection of SNPs in my mutants. I would like to determine whether or not they are in coding regions and whether they impact the protein sequence.
    At the moment I am using Galaxy for my analysis and I have an annotated genome of my bug and a list of SNPs (basepair, parent base call, mutant base call). Is there a way to determine for each SNP whether it is in a coding region and what impact the SNP has on the protein sequence (silent, missense, nonsense) besides looking at each by hand?
    I'm pretty new to sequencing and bioinformatics, so appreciate all the help you can give me.
    Thank you!
  • JohnK
    Senior Member
    • Feb 2010
    • 106

    #2
    I like Annovar. They have some helpful conversion scripts and will annotate variations against UCSC or RefSeq gene models. You can annotate SNPs or indels. A popular thing to do is download a few gene models and merge them into their union. You can then annotate with the union of the gene models to get a more comprehensive feel for which transcripts your variation might be affecting. Just keep in mind that they're curated differently. Most importantly, Annovar is open-source.


    • nlsullivan
      Junior Member
      • Oct 2010
      • 2

      #3
      JohnK,
      Thanks for your reply. ANNOVAR sounds like it would be great, but it looks to me that it requires that the genome be present on the UCSC genome browser. My genome is not there (and neither are any related or even bacterial genomes). Any suggestions for getting around this?


      • JohnK
        Senior Member
        • Feb 2010
        • 106

        #4
        You can build your own FASTA with Annovar: from the homepage, follow the gene-based annotation sub-link and read this:

        "I also provide programs to build FASTA sequences for any other genomes for which I do not provide pre-built files.

        In summary, in the ANNOVAR package, the FASTA files must be downloaded from the ANNOVAR website directly. The ANNOVAR software handles all these issues for users, via the -downdb argument, so users do not need to know how or where to download files. Additionally, I provide a script within the ANNOVAR package called retrieve_seq_from_fasta.pl, such that users can build FASTA sequence files based on a given genome sequence (such as the 3 billion base pairs in human genome). This is helpful, when users need to deal with a genome for which I do not provide the corresponding FASTA files. "

        retrieve_seq_from_fasta.pl

        Just make sure you format your FASTA file or genome to their specs: remove newlines (or not) as necessary.
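
If the newlines do need to go, a minimal Python sketch along these lines would join each wrapped sequence onto a single line. The function name `unwrap_fasta` and the toy records are made up for illustration; whether ANNOVAR actually needs unwrapped input for your genome is something to check against its documentation.

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy input (hypothetical records, not real sequence data).
wrapped = """>chr1 example contig
ATGGCT
AAATAA
>plasmid1
GGCC
""".splitlines()

records = list(unwrap_fasta(wrapped))
for name, seq in records:
    print(f">{name}\n{seq}")
```

For a real genome you would pass an open file handle instead of the toy list and write the output to a new file.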


        • sbberes
          Member
          • Jan 2009
          • 22

          #5
          John,
          I do this kind of work all the time. To facilitate this process, our lab has put together a Perl script that takes as input a reference genome sequence in GenBank format (which can easily be generated using Artemis or from an automated annotator like RAST) and a list of biallelic SNPs called relative to the reference genome. It outputs whether each SNP is coding or intergenic; if coding, whether it is synonymous or nonsynonymous; and if nonsynonymous, what the amino acid change is or whether it is a nonsense mutation. The Perl code requires the BioPerl modules. Contact me if you think this script would be of use to you.
          Steve Beres
          [email protected]
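
For readers who want to see the logic, here is a minimal Python sketch of that kind of classification. It is not the Perl script described above. It assumes a single uppercase A/C/G/T chromosome, unspliced CDS features given as 1-based inclusive coordinates with a strand, and the standard codon assignments (which NCBI translation table 11 for bacteria shares). The gene names, coordinates and toy genome are invented for illustration, and a SNP that falls in overlapping genes is reported only for the first match.

```python
from itertools import product

# Standard codon table, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_snp(genome, genes, pos, ref, alt):
    """Return (gene, effect, change) for a SNP at 1-based position pos.

    genes is a list of (name, start, end, strand) tuples, 1-based inclusive.
    """
    if genome[pos - 1] != ref:
        raise ValueError(f"reference base at {pos} is {genome[pos - 1]}, not {ref}")
    for name, start, end, strand in genes:
        if not start <= pos <= end:
            continue
        cds = genome[start - 1:end]
        offset = pos - start
        mutant = cds[:offset] + alt + cds[offset + 1:]
        if strand == "-":
            # Work in coding orientation for minus-strand genes.
            cds, mutant = revcomp(cds), revcomp(mutant)
            offset = end - pos
        codon_start = offset - offset % 3
        old = CODON_TABLE[cds[codon_start:codon_start + 3]]
        new = CODON_TABLE[mutant[codon_start:codon_start + 3]]
        if old == new:
            effect = "synonymous"
        elif new == "*":
            effect = "nonsense"
        elif old == "*":
            effect = "stop_lost"
        else:
            effect = "missense"
        return name, effect, f"{old}{codon_start // 3 + 1}{new}"
    return None, "intergenic", None


# Toy genome: geneA on the plus strand (3..14), geneB on the minus strand (17..25).
genome = "CCATGGCTAAATAAGGTTACCACATC"
genes = [("geneA", 3, 14, "+"), ("geneB", 17, 25, "-")]
snps = [(1, "C", "T"), (6, "G", "A"), (8, "T", "C"), (9, "A", "T"), (20, "C", "T")]

results = [classify_snp(genome, genes, *snp) for snp in snps]
for snp, result in zip(snps, results):
    print(snp, result)
```

With a real annotation, the `genes` list would come from the CDS features of the GenBank file, and the SNP tuples from the (basepair, parent base call, mutant base call) list described in the first post.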


          • Giulietta
            Junior Member
            • Nov 2010
            • 8

            #6
            If your bacterium is in Ensembl Genomes, you can use the Variant Effect Predictor to calculate the effect of sequence variations on an Ensembl transcript (including any amino acid change). Start with the genomic location and possible alleles. Use the online interface or the Perl API script, both available from Ensembl Bacteria (a genome-centric portal for bacterial species of scientific interest). The same portal has an example of input and output and lists the species currently supported.


            • Sacrolfur
              Member
              • Oct 2013
              • 13

              #7
              You can use Procannot (http://platform.genexplain.com/procannot). It's simple and has a user-friendly interface!
