  • carp
    Junior Member
    • Jul 2015
    • 5

    mapping pathways to refseq IDs

    Hi,

    I have several annotated bacterial genomes and would like to map pathway information to the coding sequences in each genome. In the past I've used blast2go to query KEGG, but I no longer have access to it. So I've been looking at free command-line programs (mostly R-based: ReactomePA and KEGGREST; the MetaCyc tools look good, but I don't think they have a command-line option?). However, KEGGREST and ReactomePA require specific accessions as input (usually an Entrez Gene ID), and the only accessions present in my PGAAP-annotated file are RefSeq IDs (and a few SwissProt IDs). I've used several programs (e.g., MyGene.Info in Bioconductor) to convert the RefSeq IDs to Gene IDs, and have found that most of the RefSeq IDs do not map to any Gene ID. So, how can I get pathway information for these sequences?

    Thanks!
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    There is an interesting file at NCBI that provides cross-mapping of various IDs: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz

    Take a look to see if it has mappings you can use. BTW: what exactly do you mean by Gene IDs?

    • carp
      Junior Member
      • Jul 2015
      • 5

      #3
      Hi,

      Thanks for the reply. I had actually looked at the gene2refseq file from the same FTP site earlier. I searched it for a couple of the RefSeq IDs of interest and did not find them in the file. I just tried the gene2accession file and got the same result. Entrez Gene IDs are the IDs used by the NCBI Gene database. I'm not exactly sure how the database is created, but it's highly curated in some way, so I think the issue here is that there simply is not a match for every RefSeq ID in this database. I also thought about blasting my CDSs against the Gene database to get a Gene ID where possible, but it seems like I should somehow be able to get pathway information from my NCBI-annotated file?

      Thanks,
      Cary

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        The gene2accession file is regenerated each day and should cover all the data in GenBank. Can you share the accession numbers you are looking at?

        • sadiexiaoyu
          Member
          • Apr 2013
          • 57

          #5
          Dear All,

          The information here sounds interesting. I would also like to investigate pathways in different species. I have some candidate genes and their corresponding gene sequences in different species, and I have heard that people use blast2go to get the GO information for candidate genes and then build the pathway using KEGG, which has already collected the available information about which gene is involved in which pathway. Does that mean a pathway comparison among the candidate genes can be done in blast2go with the KEGG database? Or is there other software that can do this?

          Also, is KEGG the best database for this? Are there other databases that include broad information such as Gene Ontology terms, functional experiment results, etc.?

          Thanks in advance!

          Best,

          Sadiexiaoyu
          Last edited by sadiexiaoyu; 07-07-2015, 01:10 AM.

          • carp
            Junior Member
            • Jul 2015
            • 5

            #6
            Hi,

            Here are 2 of the accession numbers I looked for using grep:
            WP_014091756.1
            WP_010958694.1
            It would be great if you could double-check me; maybe I am missing something here.

            Carp
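
A minimal sketch of the lookup described in this post, done as an exact match on the protein accession column rather than a plain grep. It assumes the tab-delimited layout given in the gene2accession header line, with GeneID in column 2 and protein accession.version in column 6; check this against the header of your own copy. The function name `find_gene_ids` and the file names in the usage line are hypothetical.

```shell
# find_gene_ids ACCESSION_LIST GENE2ACCESSION_GZ
# Prints "protein_accession<TAB>GeneID" for every listed accession present in the table.
# Assumption: column 2 = GeneID, column 6 = protein accession.version.
find_gene_ids() {
    local ids="$1" table="$2"
    gzip -dc "$table" | awk -F'\t' '
        # First input (the accession list): remember each non-empty accession.
        NR == FNR { if ($1 != "") want[$1] = 1; next }
        # Second input (the decompressed table on stdin): report exact matches.
        ($6 in want) { print $6 "\t" $2 }
    ' "$ids" -
}

# Usage (hypothetical file names):
#   find_gene_ids wp_ids.txt gene2accession.gz > wp_to_geneid.tsv
```

An empty result for the two WP_ accessions above would be consistent with what the grep search showed. The accession list must contain at least one line for the two-input awk idiom to work as intended.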

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Those accession numbers refer to "RefSeq non-redundant proteins", a new record type introduced in 2013 (http://www.ncbi.nlm.nih.gov/refseq/a...ndantproteins/). These records don't point to a specific gene; the closest you are going to get is the protein clusters record.

              The only way I see to pull information for those WP_* IDs is with the blastdbcmd utility and the nr BLAST database (adjust -outfmt as appropriate).

              Code:
              $ blastdbcmd -entry WP_014091756.1 -db /path_to/nr -outfmt '%a,%t'

              There are a couple of other free options for KEGG in this thread: http://seqanswers.com/forums/showthread.php?p=158023
              Last edited by GenoMax; 07-07-2015, 09:24 AM.

              • carp
                Junior Member
                • Jul 2015
                • 5

                #8
                Hi GenoMax,

                Thanks very much for the info. Helpful. I haven't used blastdbcmd before, and was just reading about it in the user manual. Could you explain a little bit about the output you might expect from this search?

                Thanks,
                Carp

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Code:
                  $ blastdbcmd -entry WP_014091756.1 -db /real_path_to/nr -outfmt '%a,%t'
                  Gives you:

                  WP_014091756.1,hypothetical protein[Listeria ivanovii]
                  CBW84678.1,Putative transcription repressor of class III stress genes (CtsR)[Listeria ivanovii subsp. ivanovii PAM 55]
                  AHI54813.1,CtsR family transcriptional regulator[Listeria ivanovii WSLC3009]
                  AIS64276.1,CtsR family transcriptional regulator[Listeria ivanovii subsp. ivanovii]

                  Not something you can use directly (at least that is my guess), but it at least tells you that this is the CtsR gene.
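
For a whole list of WP_* accessions, the same lookup can be sketched as a batch query. This assumes blastdbcmd's `-entry_batch` option (a file with one identifier per line) and the `%a,%t` output format shown above. The function name `wp_titles` and the file names in the usage line are hypothetical. Because titles can contain commas, each line is split only at the first comma, giving a two-column, tab-separated table.

```shell
# wp_titles ACCESSION_LIST BLAST_DB
# Prints "accession<TAB>title" for every record blastdbcmd returns for the listed IDs.
# Assumption: blastdbcmd is on PATH and BLAST_DB points at a local protein database such as nr.
wp_titles() {
    local ids="$1" db="$2"
    blastdbcmd -entry_batch "$ids" -db "$db" -outfmt '%a,%t' |
        awk '{
            i = index($0, ",")            # position of the first comma only
            if (i > 0)                    # skip lines that do not match the format
                print substr($0, 1, i - 1) "\t" substr($0, i + 1)
        }'
}

# Usage (hypothetical file names):
#   wp_titles wp_ids.txt /real_path_to/nr > wp_titles.tsv
```

As in the single-entry example, one WP_* accession can produce several lines, one for each protein record grouped under that non-redundant entry, so the more informative titles may appear on the later lines rather than the first.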

                  • carp
                    Junior Member
                    • Jul 2015
                    • 5

                    #10
                    Thank you so much, that helps!
