  • #16
    SVA only supports hg18, so there is no point using it for the moment if you aligned to hg19.



    • #17
      Ensembl works on both NCBI36 and GRCh37 (hg18/hg19).



      • #18
        I'm an intern working on exome analysis and I am facing the same problem. I want to implement this annotation in my own Java pipeline, and although the solution may seem simple, I am having trouble finding the right approach.

        I have tried using the RefSeq annotation file of SeqCap EZ Exome v2 (matching the UCSC Genome Browser on hg19), which holds cdsStart, cdsEnd, and the exon start and end positions. This file also holds an Ensembl gene reference for each RefSeq gene, which should make it easy to link to the Ensembl cDNA FASTA file and get exactly what I want...

        ... a few problems though:
        1) RefSeq and Ensembl genes overlap, and the same Ensembl reference may occur multiple times in the Ensembl FASTA file, making the entries hard to tell apart. This is most likely due to different isoforms.
        2) Looking at a few cases, I noticed that some RefSeq genes have cdsStart and cdsEnd positions that cannot be traced back to Ensembl. In other words: when I read the Ensembl reference from the RefSeq file and look it up in the Ensembl file, I find multiple isoforms, but none with the same cdsStart and/or cdsEnd. I already take into account that RefSeq and Ensembl differ by 1 nucleotide in cdsStart. Both files are based on hg19, so that can't be the problem either.
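The 1-nucleotide cdsStart difference in point 2 comes from coordinate conventions rather than from the annotations themselves: UCSC tables such as refGene store starts 0-based with half-open intervals, while Ensembl reports 1-based, fully closed coordinates, so only the start shifts and the end stays the same. A minimal Java sketch of the conversion and of matching one RefSeq CDS span against the Ensembl isoforms of the linked gene, assuming you have already extracted a CDS start/end per isoform (the `Isoform` record and the method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class CdsMatcher {

    /** Hypothetical container for one Ensembl isoform's CDS span (1-based, closed). */
    record Isoform(String transcriptId, long cdsStart, long cdsEnd) {}

    /** UCSC start (0-based, half-open) -> 1-based closed start. Ends need no change. */
    static long ucscStartToOneBased(long ucscStart) {
        return ucscStart + 1;
    }

    /**
     * Returns the isoforms whose CDS span equals a refGene-style cdsStart/cdsEnd
     * pair once the start has been converted to 1-based coordinates.
     */
    static List<Isoform> matchingIsoforms(long ucscCdsStart, long ucscCdsEnd, List<Isoform> isoforms) {
        long start = ucscStartToOneBased(ucscCdsStart);
        List<Isoform> hits = new ArrayList<>();
        for (Isoform iso : isoforms) {
            if (iso.cdsStart() == start && iso.cdsEnd() == ucscCdsEnd) {
                hits.add(iso);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Isoform> isoforms = List.of(
                new Isoform("ENST_A", 1001, 2000),
                new Isoform("ENST_B", 1001, 1800));
        // refGene cdsStart=1000, cdsEnd=2000 is 1001..2000 in 1-based closed coordinates,
        // so only the first (hypothetical) isoform matches.
        System.out.println(matchingIsoforms(1000, 2000, isoforms));
    }
}
```

A gene with no matching isoform after this shift genuinely differs between the two annotation sets, which is expected for a minority of models.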

        What would be the best approach on solving this puzzle? Should I just walk through the entire genome and annotate all the information to my SNPs as I go along? Any thoughts are welcome.
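Walking the whole genome is usually unnecessary: a common alternative is to index the coding intervals per chromosome once and then look up each SNP position. A minimal sketch using a `TreeMap` keyed by interval start, assuming the intervals on a chromosome do not overlap (overlapping genes would need a proper interval tree); the `CdsInterval` record, the class, and the gene names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class CdsIndex {

    /** Hypothetical coding interval, 1-based and closed on both ends. */
    record CdsInterval(String gene, long start, long end) {}

    // chromosome -> (interval start -> interval); assumes no overlaps within a chromosome
    private final Map<String, TreeMap<Long, CdsInterval>> byChrom = new HashMap<>();

    void add(String chrom, CdsInterval iv) {
        byChrom.computeIfAbsent(chrom, c -> new TreeMap<>()).put(iv.start(), iv);
    }

    /** Returns the interval containing pos, or null if pos lies outside every indexed interval. */
    CdsInterval find(String chrom, long pos) {
        TreeMap<Long, CdsInterval> tree = byChrom.get(chrom);
        if (tree == null) {
            return null;
        }
        // The only candidate is the interval with the greatest start <= pos.
        Map.Entry<Long, CdsInterval> e = tree.floorEntry(pos);
        if (e == null || e.getValue().end() < pos) {
            return null;
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        CdsIndex index = new CdsIndex();
        index.add("chr1", new CdsInterval("GENE_X", 1001, 2000));
        index.add("chr1", new CdsInterval("GENE_Y", 5001, 5600));
        System.out.println(index.find("chr1", 5100)); // falls inside GENE_Y
        System.out.println(index.find("chr1", 3000)); // between the two genes, so null
    }
}
```

Each lookup costs O(log n) per chromosome. Indexing individual coding exons rather than one cdsStart..cdsEnd span per transcript would additionally separate coding from intronic SNPs.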

        Thanks a bunch!



        • #19
          If you have hg19/GRCh37 positions for all your SNPs, I would suggest using a tool like the Ensembl Variant Effect Predictor to get the consequences of your SNPs, and then tracing them to RefSeq IDs using the Ensembl xref system, rather than doing it the other way around.

          RefSeq models and Ensembl models should be mostly the same for the CDS coordinates (though not for all models), but to get the models that are identical across both sets, it is best to look at the CCDS models: http://www.ncbi.nlm.nih.gov/projects...CcdsBrowse.cgi

          Do remember that UTR coordinates may differ between the two sets.
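Because UTR boundaries may differ, comparing two transcript models fairly means comparing only their coding blocks. A minimal sketch, assuming both models are already in the same UCSC-style convention (0-based half-open exonStarts/exonEnds sorted in ascending order, plus cdsStart/cdsEnd, as in refGene); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

final class CdsBlocks {

    /**
     * Clips each exon [exonStarts[i], exonEnds[i]) to [cdsStart, cdsEnd) and returns the
     * non-empty coding blocks as {start, end} pairs (0-based, half-open).
     * Exons lying entirely in a UTR are dropped.
     */
    static List<long[]> codingBlocks(long[] exonStarts, long[] exonEnds, long cdsStart, long cdsEnd) {
        List<long[]> blocks = new ArrayList<>();
        for (int i = 0; i < exonStarts.length; i++) {
            long s = Math.max(exonStarts[i], cdsStart);
            long e = Math.min(exonEnds[i], cdsEnd);
            if (s < e) {
                blocks.add(new long[] {s, e});
            }
        }
        return blocks;
    }

    /** True when two models have identical coding blocks, whatever their UTRs look like. */
    static boolean sameCds(List<long[]> a, List<long[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (a.get(i)[0] != b.get(i)[0] || a.get(i)[1] != b.get(i)[1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same coding exons, different 5' and 3' UTR extents (made-up coordinates)
        List<long[]> refseq = codingBlocks(new long[] {100, 500}, new long[] {300, 900}, 150, 700);
        List<long[]> ensembl = codingBlocks(new long[] {80, 500}, new long[] {300, 1000}, 150, 700);
        System.out.println(sameCds(refseq, ensembl));
    }
}
```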



          • #20
            We're looking at using Ensembl's Variant Effect Predictor right now ... but with a non-model organism (or at least a model organism with some regions replaced by a "better" assembly). It seems like we'll need to set up a local database in order to provide the reference and gene models. Does anyone have experience setting up a db like this?



            • #21
              You could also try:



              The requirement is that the input SNPs be in VCF format.

              -Abhi
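If your SNPs are currently in some other tabular format, the conversion to VCF can be minimal. A sketch, assuming you have the chromosome, 1-based position, reference base, and alternate base for each SNP; it writes the mandatory `##fileformat` line, the header line, and the eight fixed VCF columns, with `.` for the missing ID, QUAL, FILTER, and INFO values. The `Snp` record is hypothetical, and a particular tool may expect additional header lines (for example `##contig`):

```java
import java.util.List;

final class MinimalVcf {

    /** Hypothetical SNP record: 1-based position, single-base REF and ALT. */
    record Snp(String chrom, long pos, String ref, String alt) {}

    static String toVcf(List<Snp> snps) {
        StringBuilder sb = new StringBuilder();
        sb.append("##fileformat=VCFv4.1\n");
        // The eight fixed columns are tab-separated; no sample columns are written.
        sb.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");
        for (Snp s : snps) {
            sb.append(String.join("\t",
                    s.chrom(), Long.toString(s.pos()), ".", s.ref(), s.alt(), ".", ".", "."))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toVcf(List.of(new Snp("1", 12345, "A", "G"))));
    }
}
```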



              • #22
                Thanks, but it seems to only do SNPs ... we need to annotate indels, too.



                • #23
                  For help with setting up a custom Ensembl database, I would suggest emailing [email protected].



                  • #24
                    Originally posted by jnfass:
                    Thanks, but it seems to only do SNPs ... we need to annotate indels, too.
                    Try Annovar. http://www.openbioinformatics.org/annovar/
                    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                    Projects: U87MG whole genome sequence [Website] [Paper]
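For the indel case, VCF and ANNOVAR's own five-column input (chromosome, start, end, reference allele, observed allele) represent indels differently: VCF keeps a shared padding base, while ANNOVAR drops it and writes the empty allele as `-`. A minimal sketch of that conversion for SNVs and for simple indels with exactly one shared leading base; the output convention here is an assumption modelled on what ANNOVAR's `convert2annovar.pl` produces, so for real data check the ANNOVAR documentation or use that script directly. The class name is hypothetical:

```java
final class VcfToAnnovar {

    /**
     * Converts one biallelic VCF record to an ANNOVAR-style line (chr, start, end, ref, obs).
     * Handles SNVs and simple indels sharing one leading padding base; the empty allele
     * is written as "-" (assumed convention). Returns null for anything more complex.
     */
    static String convert(String chrom, long pos, String ref, String alt) {
        if (ref.length() == 1 && alt.length() == 1) {
            // SNV: position and alleles are unchanged
            return String.join("\t", chrom, Long.toString(pos), Long.toString(pos), ref, alt);
        }
        if (ref.length() == 1 && alt.length() > 1 && alt.charAt(0) == ref.charAt(0)) {
            // Insertion after pos: drop the padding base, empty reference allele
            return String.join("\t", chrom, Long.toString(pos), Long.toString(pos), "-", alt.substring(1));
        }
        if (alt.length() == 1 && ref.length() > 1 && ref.charAt(0) == alt.charAt(0)) {
            // Deletion: the deleted bases start one position after the padding base
            long start = pos + 1;
            long end = pos + ref.length() - 1;
            return String.join("\t", chrom, Long.toString(start), Long.toString(end), ref.substring(1), "-");
        }
        return null; // complex substitution or multi-base padding: not handled in this sketch
    }

    public static void main(String[] args) {
        System.out.println(convert("1", 100, "A", "AT"));  // insertion of T after position 100
        System.out.println(convert("1", 100, "ATC", "A")); // deletion of TC at 101-102
    }
}
```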



                    • #25
                      Check out the Polyphen-2 website. If you run your SNPs through the batch page:



                      it will output a SNP file that maps all your SNPs. By varying the advanced options on the input page, you should be able to annotate the SNPs in many ways. Hope this helps!



                      • #26
                        @Michael.James.Clark ... Annovar seems to require annotation databases from the UCSC Genome Browser. That doesn't exist for at least part of the genome I'm working with.

                        @nexgengirl ... Polyphen-2 seems to be restricted to human.

                        @laura ... thanks for the suggestion; I might try contacting Ensembl.

                        But in the meantime someone responded to my Biostar post and suggested snpEff (snpeff.sourceforge.net) and it seems to fit the bill. If anyone has had good or bad experience with it, I'd appreciate hearing about it.



                        • #27
                          Originally posted by jnfass:
                          @Michael.James.Clark ... Annovar seems to require annotation databases from the UCSC Genome Browser. That doesn't exist for at least part of the genome I'm working with.

                          For your and others' information, Annovar doesn't require annotation databases from UCSC. You can make your own annotations and feed them to the program. Read the documentation.
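For a genome without UCSC tables, making your own annotation mainly means writing your gene models in UCSC's refGene table layout (genePred with a leading `bin` column), which, as an assumption to verify against the ANNOVAR documentation for your version, is the layout ANNOVAR's gene-based annotation reads. A minimal sketch that converts one transcript from 1-based closed coordinates (as in a GFF) to that layout, assuming exons are sorted in ascending order; the class, method, and the placeholder values for `bin`, `score`, `cdsStartStat`/`cdsEndStat`, and `exonFrames` are hypothetical:

```java
import java.util.StringJoiner;

final class GenePredWriter {

    /**
     * Builds one refGene-style line from 1-based, closed coordinates.
     * UCSC tables are 0-based half-open, so every start is shifted by -1 and every end
     * is kept as-is. bin=0, score=0, "cmpl"/"cmpl" and exonFrames=-1 are placeholders
     * (assumption: the consuming tool does not rely on them).
     */
    static String line(String txName, String geneName, String chrom, char strand,
                       long cdsStart1, long cdsEnd1, long[] exonStarts1, long[] exonEnds1) {
        StringBuilder starts = new StringBuilder();
        StringBuilder ends = new StringBuilder();
        StringBuilder frames = new StringBuilder();
        for (int i = 0; i < exonStarts1.length; i++) {
            starts.append(exonStarts1[i] - 1).append(','); // UCSC lists end with a trailing comma
            ends.append(exonEnds1[i]).append(',');
            frames.append("-1,");
        }
        long txStart = exonStarts1[0] - 1;
        long txEnd = exonEnds1[exonEnds1.length - 1];
        StringJoiner j = new StringJoiner("\t");
        j.add("0").add(txName).add(chrom).add(String.valueOf(strand))
         .add(Long.toString(txStart)).add(Long.toString(txEnd))
         .add(Long.toString(cdsStart1 - 1)).add(Long.toString(cdsEnd1))
         .add(Integer.toString(exonStarts1.length))
         .add(starts.toString()).add(ends.toString())
         .add("0").add(geneName).add("cmpl").add("cmpl").add(frames.toString());
        return j.toString();
    }

    public static void main(String[] args) {
        // Hypothetical two-exon transcript on the plus strand, 1-based coordinates
        System.out.println(line("TX1", "GENE1", "scaffold_7", '+',
                150, 700, new long[] {101, 501}, new long[] {300, 900}));
    }
}
```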



                          • #28
                            Ok ... thanks.

                            You're probably referring to this.



                            • #29
                              Yeah, I'm not saying it is necessarily the easiest solution for you, but it is incorrect to state that it requires annotation databases from UCSC.



                              • #30
                                Thanks for the snpEff suggestion. It's working very well for me and even supports some bacterial genomes out of the box. Its own input format is easy to generate, and it also accepts VCF or pileup.
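Downstream, the annotations snpEff writes into the VCF INFO column need to be split apart. A sketch, assuming a recent snpEff version that writes the `ANN` field (older versions, contemporary with this thread, used a different `EFF` field), where each comma-separated annotation is a `|`-delimited list whose first sub-fields are Allele, Annotation, Annotation_Impact, Gene_Name, Gene_ID, Feature_Type, and Feature_ID. The `Ann` record and class are hypothetical, and the INFO string in `main` is made up:

```java
import java.util.ArrayList;
import java.util.List;

final class AnnParser {

    /** A few of the leading ANN sub-fields (assumed order; check the ##INFO header in your file). */
    record Ann(String allele, String effect, String impact, String gene, String featureId) {}

    /** Extracts every annotation from a VCF INFO string; empty if there is no ANN field. */
    static List<Ann> parse(String info) {
        List<Ann> out = new ArrayList<>();
        for (String kv : info.split(";")) {
            if (!kv.startsWith("ANN=")) {
                continue;
            }
            // One entry per allele/feature combination, separated by commas
            for (String entry : kv.substring(4).split(",")) {
                // Limit -1 keeps trailing empty sub-fields
                String[] f = entry.split("\\|", -1);
                if (f.length < 7) {
                    continue; // malformed entry
                }
                out.add(new Ann(f[0], f[1], f[2], f[3], f[6]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String info = "DP=20;ANN=T|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1"
                + "|protein_coding|2/5|c.101A>T|p.Lys34Met|||||";
        for (Ann a : parse(info)) {
            System.out.println(a);
        }
    }
}
```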
