  • zaclown
    Junior Member
    • Sep 2010
    • 3

    Entrez ID conversion

    Hello,

    Does anyone know how to convert Entrez IDs to either RefSeq IDs or gene symbols?
    I have found resources on RefSeq-to-gene-symbol conversion, but I can't find anything on Entrez IDs.
    The genome I work with is C. elegans.
    Thanks in advance for any suggestions.
  • gaffa
    Member
    • Oct 2010
    • 82

    #2
    Try UniProt's online conversion service: http://www.uniprot.org -> "ID Mapping" tab


    • Richard Finney
      Senior Member
      • Feb 2009
      • 701

      #3
      NCBI maintains flat files of gene annotations that contain the information you're after:
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
      [ There are other interesting files in that directory ]


      The tax_id (taxonomy ID) for C. elegans is 6239 [ from the Taxonomy browser: http://www.ncbi.nlm.nih.gov/taxonomy ]

      You can type "wget -nc ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz" from the command line, or download the file via a browser. Decompress the files with gunzip before running the commands below.

      Example using this data:
      bash-3.00$ awk '$1==6239' gene2refseq | head
      6239 171590 REVIEWED NM_058260.3 193203640 NP_490660.1 17510631 NC_003279.6 193203938 4123 10231 - -
      6239 171591 REVIEWED NM_058259.3 193203639 NP_490661.1 17510629 NC_003279.6 193203938 11498 16830 + -
      6239 171592 REVIEWED NM_058261.3 133902001 NP_490662.1 17510633 NC_003279.6 193203938 17496 26780 - -
      6239 171592 REVIEWED NM_058262.3 86561628 NP_490663.1 17510635 NC_003279.6 193203938 17496 26780 - -
      6239 171593 REVIEWED NM_058263.3 115533565 NP_490664.2 115533566 NC_003279.6 193203938 27594 32481 - -
      6239 171594 REVIEWED NM_058265.3 71995026 NP_490666.2 25143331 NC_003279.6 193203938 49918 54359 + -
      6239 171595 REVIEWED NM_058267.4 115533567 NP_490668.4 115533568 NC_003279.6 193203938 55315 64020 - -
      6239 171597 REVIEWED NM_058269.2 71995034 NP_490670.1 17510145 NC_003279.6 193203938 85044 86283 - -
      6239 171599 REVIEWED NM_058271.6 212645149 NP_490672.2 25143337 NC_003279.6 193203938 93030 94880 + -
      6239 171600 REVIEWED NM_058272.4 212645150 NP_490673.1 17510147 NC_003279.6 193203938 96478 100612 - -
      bash-3.00$ awk '$1==6239 && $2==171590' gene_info
      6239 171590 Y74C9A.3 Y74C9A.3 - WormBase:WBGene00022277 I - hypothetical protein protein-coding - - - - 20101017
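The two lookups above can be combined into a single Entrez ID table. Below is a minimal sketch in Python, assuming the files are tab-separated with the column order visible in the output above (tax_id, GeneID, status, RNA accession, ... in gene2refseq; tax_id, GeneID, Symbol, ... in gene_info). The function names and the inline sample rows (truncated copies of the rows shown above) are illustrative only; for real use you would pass handles opened on the decompressed downloads instead.

```python
import io
from collections import defaultdict

C_ELEGANS_TAX_ID = "6239"  # tax_id quoted in the post above


def read_rows(lines, tax_id):
    """Yield tab-split rows whose first column equals tax_id."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == tax_id:
            yield fields


def entrez_to_symbol_and_refseq(gene_info, gene2refseq, tax_id=C_ELEGANS_TAX_ID):
    """Return {GeneID: (symbol or None, [RefSeq mRNA accessions])} for one taxon."""
    # gene_info: column 2 is the GeneID, column 3 the symbol.
    symbols = {row[1]: row[2] for row in read_rows(gene_info, tax_id) if len(row) > 2}

    accessions = defaultdict(list)
    for row in read_rows(gene2refseq, tax_id):
        # gene2refseq: column 4 is the RNA accession; "-" marks rows without one.
        if len(row) > 3 and row[3] != "-" and row[3] not in accessions[row[1]]:
            accessions[row[1]].append(row[3])

    return {gid: (symbols.get(gid), accs) for gid, accs in accessions.items()}


# Sample rows copied from the output above, truncated to the leading columns.
GENE2REFSEQ_SAMPLE = (
    "6239\t171590\tREVIEWED\tNM_058260.3\t193203640\tNP_490660.1\n"
    "6239\t171592\tREVIEWED\tNM_058261.3\t133902001\tNP_490662.1\n"
    "6239\t171592\tREVIEWED\tNM_058262.3\t86561628\tNP_490663.1\n"
)
GENE_INFO_SAMPLE = "6239\t171590\tY74C9A.3\tY74C9A.3\t-\tWormBase:WBGene00022277\n"

table = entrez_to_symbol_and_refseq(
    io.StringIO(GENE_INFO_SAMPLE), io.StringIO(GENE2REFSEQ_SAMPLE)
)
for gene_id, (symbol, refseq_ids) in sorted(table.items()):
    print(gene_id, symbol, ",".join(refseq_ids))
```

Genes that have a symbol but no RefSeq mRNA row are left out of the result in this sketch; a gene with several transcripts (171592 above) keeps all of its accessions.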


      • Fuad
        Junior Member
        • Jun 2009
        • 2

        #4
        DAVID has a Gene ID Conversion tool:



        Fuad


        • rdu
          Member
          • Aug 2010
          • 29

          #5
          The Bioconductor package "biomaRt" can also do it.


          • peachgil
            Junior Member
            • Feb 2011
            • 2

            #6
            In Bioconductor, just use the following code (shown here for human; org.Ce.eg.db is the C. elegans equivalent of the annotation package):

            > library(org.Hs.eg.db)
            > library(annotate)
            > lookUp('3815', 'org.Hs.eg', 'SYMBOL')
            $`3815`
            [1] "KIT"

            > lookUp('3815', 'org.Hs.eg', 'REFSEQ')
            $`3815`
            [1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"


            • MDonlin
              Member
              • May 2010
              • 14

              #7
              You can also do ID conversion using BioMart at EBI.


              • jmw86069
                Member
                • Jun 2009
                • 31

                #8
                I'm always a fan of the Linux one-liner; here is an example for the human ACTB gene using hg18:

                mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e "select k2ll.value as entrezGeneId, kx.refseq as refseqMrna, kx.geneSymbol as entrezGeneSymbol, kx.description as entrezGeneDesc from kgXref kx, knownToLocusLink k2ll where k2ll.name=kx.kgID and kx.geneSymbol='ACTB';"
                UCSC's C. elegans tables don't include the knownGene and kg% tables, but some poking around (using "show tables like '%locus%';") led me to this MySQL query, which takes a locusLinkId (the Entrez Gene ID) as input and prints the gene symbol, RefSeq mRNA, description, etc.

                mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D ce6 -e "select rl.locusLinkId, rl.name as geneName, rl.product as geneDescription, rl.mrnaAcc as refseqMrna, rl.protAcc as refseqProt from refLink rl where rl.locusLinkId=174288;"
                The bummer is that you have to tell it to use "ce6": it isn't generic enough to sniff out the organism and assembly version a priori. But you'll know which one to use, right? :-) You can of course change the "=174288" to "IN (174288, 174289, 174290)" for more of a bulk-input experience, depending on what you need. If you end up batch-scripting geneID conversions, I'd definitely use the "IN" clause instead of querying the IDs one by one; it is markedly faster.
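For batch scripting, the "IN" variant of the query can be assembled from a list of IDs. This is a rough sketch in Python that only builds the query string and the mysql argument list, reusing the host, user, database, and column names from the one-liner above; the helper names `build_reflink_query` and `build_mysql_command` are made up for illustration, and nothing is sent to the server here.

```python
import shlex


def build_reflink_query(entrez_ids):
    """Build one refLink query covering all IDs via an IN clause."""
    # int() rejects anything that is not a plain number, so the IDs are safe
    # to paste into the SQL text; the set removes duplicates.
    ids = sorted({int(i) for i in entrez_ids})
    if not ids:
        raise ValueError("need at least one Entrez Gene ID")
    id_list = ", ".join(str(i) for i in ids)
    return (
        "select rl.locusLinkId, rl.name as geneName, "
        "rl.product as geneDescription, rl.mrnaAcc as refseqMrna, "
        "rl.protAcc as refseqProt "
        f"from refLink rl where rl.locusLinkId IN ({id_list});"
    )


def build_mysql_command(entrez_ids, database="ce6"):
    """Return the argument list for the mysql client (not executed here)."""
    return [
        "mysql", "-h", "genome-mysql.cse.ucsc.edu", "-A",
        "-u", "genome", "-D", database,
        "-e", build_reflink_query(entrez_ids),
    ]


command = build_mysql_command([174288, 174289, 174290])
print(shlex.join(command))  # paste into a shell, or hand to subprocess.run
```

The assembly name stays an explicit parameter, since the query cannot work out the organism by itself.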

                DAVID is in theory a great resource, but it would be more useful with higher API limits or direct data downloads.


                • zaclown
                  Junior Member
                  • Sep 2010
                  • 3

                  #9
                  Thank you all, guys.


                  • moushengxu@gmail.com
                    Junior Member
                    • Oct 2016
                    • 1

                    #10
                    How to do the opposite?

                    Originally posted by peachgil
                    In Bioconductor, just use the following codes:

                    > library(org.Hs.eg.db)
                    > library(annotate)
                    > lookUp('3815', 'org.Hs.eg', 'SYMBOL')
                    $`3815`
                    [1] "KIT"

                    > lookUp('3815', 'org.Hs.eg', 'REFSEQ')
                    $`3815`
                    [1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"
                    I have a set of HGNC gene symbols, and I want to convert them to Entrez Gene IDs.

                    Thanks much!
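One possible way to go in the reverse direction is to scan the gene_info file from post #3 with the symbol as the lookup key. The sketch below is in Python and assumes the same tab-separated layout (tax_id, GeneID, Symbol, ...) and the human tax_id 9606; the function name and the inline sample rows are illustrative, with the KIT row built from the 3815/KIT pair quoted above. It matches the official Symbol column only, so aliases and older names would not be found.

```python
import io

HUMAN_TAX_ID = "9606"  # NCBI taxonomy ID for Homo sapiens


def symbols_to_entrez(gene_info, wanted_symbols, tax_id=HUMAN_TAX_ID):
    """Return {symbol: [GeneIDs]} for the requested symbols in one taxon."""
    wanted = set(wanted_symbols)
    found = {}
    for line in gene_info:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is tax_id, column 2 the GeneID, column 3 the symbol.
        if len(fields) > 2 and fields[0] == tax_id and fields[2] in wanted:
            found.setdefault(fields[2], []).append(fields[1])
    # Symbols without a match get an empty list, so gaps are easy to spot.
    return {symbol: found.get(symbol, []) for symbol in wanted_symbols}


# Illustrative rows, truncated to the first three columns.
GENE_INFO_SAMPLE = (
    "9606\t3815\tKIT\n"
    "6239\t171590\tY74C9A.3\n"
)

result = symbols_to_entrez(io.StringIO(GENE_INFO_SAMPLE), ["KIT", "NOT_A_GENE"])
print(result)
```

Returning a list per symbol keeps the sketch honest about the rare case where one symbol maps to more than one GeneID.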
