  • sphil
    Senior Member
    • Apr 2010
    • 192

    Gene names to GO terms

    Hey guys,

    I'm looking for a script that can crawl an online database for gene names and fetch their GO terms and function (if known).

    Is there any tool that can map gene names to GO terms?


    Best,

    Phil
  • dariober
    Senior Member
    • May 2010
    • 311

    #2
    Hi, one option is Ensembl BioMart (http://www.ensembl.org/biomart/martview/), which can be queried from R with the Bioconductor package biomaRt. If you are happy with Ensembl and R:

    Code:
    library(biomaRt)
    ensembl<- useMart("ensembl",dataset="hsapiens_gene_ensembl")
    getBM(attributes=c('hgnc_symbol', 'go_id', 'name_1006'), filters = 'hgnc_symbol', values= c('ACTB', 'TNF'), mart= ensembl)
    which produces a data frame like:

    Code:
        hgnc_symbol      go_id                                                            name_1006
    1           TNF GO:0007275                                 multicellular organismal development
    2           TNF GO:0006915                                                    apoptotic process
    3           TNF GO:0000122 negative regulation of transcription from RNA polymerase II promoter
    4           TNF GO:0008285                            negative regulation of cell proliferation
    ...
    Hope this helps!

    Dario


    • sphil
      Senior Member
      • Apr 2010
      • 192

      #3
      perfect, thanks!


      • carolW
        Senior Member
        • Apr 2013
        • 103

        #4
        I am new to GO. I have noticed that many GO terms can be retrieved for a given gene. Is it possible to find out which category a GO term belongs to (for example biological process, molecular function or cellular component) and display it? Are there any other categories of interest for which GO terms could be retrieved?

        Do people usually extract all GO terms for a given gene? If not, how do they filter them? All advice and information is welcome.

        I look forward to your reply,


        • blancha
          Senior Member
          • May 2013
          • 367

          #5
          The Gene Ontology Project covers 3 domains: cellular component, molecular function and biological process.
          (Wikipedia)

          "cellular component, the parts of a cell or its extracellular environment;
          molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis;
          biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms."
          (Wikipedia)

          With Ensembl's BioMart, or any other gene annotation tool, you can specify the domain of interest.
          With Ensembl, you actually just ask for the domain for each term, then apply your own filtering.
          The domain you pick really depends on your subject of interest.

          I just find it preferable not to mix the terms from the 3 domains together. Just present the terms from the one domain you are interested in, or present separately the terms for each domain.


          • carolW
            Senior Member
            • Apr 2013
            • 103

            #6
            How can the domain be specified with the biomaRt Bioconductor package? Specifically, which parameters of getBM?

            I saw that biomaRt offers more than 1000 attributes and 287 filters. How do I choose among them based on the subject of interest?

            In parallel, I had a look at BioMart on the Ensembl web site and could find the domain in a field, but I didn't find any field for uploading my gene list. On the other hand, finding attributes and filters seems to be quicker. I don't know which is preferable, the biomaRt package or the web page?
            Last edited by carolW; 12-12-2014, 12:09 PM.


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Have a look at http://geneontology.org for additional resources/tools.


              • blancha
                Senior Member
                • May 2013
                • 367

                #8
                I've posted the R code below.
                The attribute for the domain is namespace_1003, while the attribute for the term is name_1006.
                You can easily list the attributes and their descriptions with the function listAttributes().

                The website may be easier for a neophyte.
                You give your list of genes (max. 500) in the Filter section.

                It's rarely interesting to recover the Gene Ontology terms for individual genes anyway. More often, you'll want the enriched GO terms for a given list of genes (e.g. differentially expressed) vs the background. You can use DAVID (or GOrilla) for that. DAVID is extremely easy to use, quick and reliable.

                Code:
                library("biomaRt")
                
                # Connect to biomaRt ensembl
                mart <- useMart("ensembl")
                mart <- useData("hsapiens_ensembl", mart=mart)
                
                attributes = listAttributes(mart)
                write.table(attributes, "attributes.txt", sep="\t", row.names=FALSE, quote=FALSE)
                
                # Recover gene terms and domain
                gene.terms <- getBM(filter="ensembl_gene_id", value="ENSG00000143632", attribute=c("ensembl_gene_id", "external_gene_name", "name_1006", "namespace_1003"), mart=mart)
                gene.terms <- subset(gene.terms, name_1006 != "")
                write.table(gene.terms, "gene_terms.txt", sep="\t", row.names=FALSE, quote=FALSE)
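
                To keep the three domains apart after the query, the namespace_1003 column can be used as a grouping key. Below is a minimal sketch in base R, run on a small made-up data frame shaped like the getBM() result above. The term rows are illustrative rather than real annotations, and the domain spellings are assumed to match what Ensembl returns, so check them against your own output.

                ```r
                # Toy data frame shaped like the getBM() result (illustrative rows only)
                gene.terms <- data.frame(
                  ensembl_gene_id = rep("ENSG00000143632", 4),
                  name_1006 = c("cytoplasm", "ATP binding", "muscle contraction", "cytoskeleton"),
                  namespace_1003 = c("cellular_component", "molecular_function",
                                     "biological_process", "cellular_component"),
                  stringsAsFactors = FALSE
                )

                # Keep the terms of a single domain
                bp.terms <- subset(gene.terms, namespace_1003 == "biological_process")

                # Or split into one data frame per domain, to present each separately
                terms.by.domain <- split(gene.terms, gene.terms$namespace_1003)
                print(names(terms.by.domain))
                ```

                Each element of terms.by.domain can then be written to its own file with write.table(), in the same way as above.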


                • carolW
                  Senior Member
                  • Apr 2013
                  • 103

                  #9
                  Regarding DAVID, should all clusters be considered, or only those meeting some criterion such as the enrichment score, and how?

                  In general, apart from GO enrichment, are there other ways to narrow down the list of GO terms?
                  Last edited by carolW; 12-12-2014, 01:18 PM.
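
                  To make "enriched vs the background" concrete, here is a rough sketch of the idea behind such a score for a single GO term, using a one-sided Fisher's exact test in base R. All counts are hypothetical. DAVID reports its own statistic (a modified Fisher exact test, the EASE score), so this shows the principle and does not reproduce DAVID's numbers.

                  ```r
                  # Hypothetical counts, for illustration only
                  de.with.term <- 40     # differentially expressed (DE) genes annotated with the term
                  de.total     <- 300    # all DE genes
                  bg.with.term <- 500    # background genes annotated with the term (DE genes included)
                  bg.total     <- 20000  # all background genes (DE genes included)

                  # 2x2 table: rows = annotated with the term or not, columns = DE genes or the rest
                  tab <- matrix(
                    c(de.with.term, de.total - de.with.term,
                      bg.with.term - de.with.term,
                      (bg.total - de.total) - (bg.with.term - de.with.term)),
                    nrow = 2,
                    dimnames = list(term = c("yes", "no"), gene.set = c("DE", "rest"))
                  )

                  # One-sided test: is the term over-represented among the DE genes?
                  res <- fisher.test(tab, alternative = "greater")
                  print(tab)
                  print(res$p.value)
                  ```

                  With one such test per term, the p-values would normally be corrected for multiple testing (for example with p.adjust(..., method = "BH")) before picking a cutoff.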


                  • carolW
                    Senior Member
                    • Apr 2013
                    • 103

                    #10
                    1- I get an error message when using useData:

                    mart <- useData("hsapiens_ensembl", mart=mart)
                    Error: could not find function "useData"

                    and if it should have been useDataset

                    mart <- useDataset("hsapiens_ensembl", mart=mart)
                    Error in useDataset("hsapiens_ensembl", mart = mart) :
                    No valid Mart object given, specify a Mart object with the attribute mart

                    What is the correct function to use?
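
                    For reference, biomaRt has no useData function. The dataset is selected with useDataset(), or directly in useMart() as in post #2, and the human dataset is named "hsapiens_gene_ensembl" there. The "No valid Mart object given" message suggests that mart did not hold a Mart object when useDataset() was called, for example because the useMart() line had not run successfully, but that is a guess. A hedged sketch of the corrected query follows. The wrapper name fetch_go_terms is made up for this example, and running it needs the biomaRt package and a connection to Ensembl.

                    ```r
                    # Sketch only: fetch_go_terms is a hypothetical helper, not part of biomaRt
                    fetch_go_terms <- function(gene.ids, dataset = "hsapiens_gene_ensembl") {
                      if (!requireNamespace("biomaRt", quietly = TRUE)) {
                        stop("Package 'biomaRt' is required (install it from Bioconductor).")
                      }
                      # useMart() can select the dataset directly, so useDataset() is not needed
                      mart <- biomaRt::useMart("ensembl", dataset = dataset)
                      terms <- biomaRt::getBM(
                        attributes = c("ensembl_gene_id", "external_gene_name",
                                       "go_id", "name_1006", "namespace_1003"),
                        filters = "ensembl_gene_id",
                        values = gene.ids,
                        mart = mart
                      )
                      # Drop rows that carry no GO annotation
                      terms[terms$name_1006 != "", , drop = FALSE]
                    }

                    # Example call (commented out because it contacts the Ensembl server):
                    # gene.terms <- fetch_go_terms("ENSG00000143632")
                    ```

                    listDatasets(mart) shows the dataset names that are valid for a given mart, which is a quick way to check the spelling.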

                    2- Moreover, I would like to make a list or plot of the most frequent GO terms. So I used DAVID and got a list of GO terms with their number of occurrences from the functional annotation chart file. However, some of the genes are associated only with a term that is not a GO term, and the frequencies of the others are so close to each other that a plot is useless:

                    phosphoprotein 232
                    acetylation 185
                    nucleus 134
                    cytoplasm 120
                    GO:0031974~membrane-enclosed lumen 99
                    GO:0043228~non-membrane-bounded organelle 98
                    GO:0043232~intracellular non-membrane-bounded organelle 98

                    How can I identify the GO terms with the highest frequency, and is it better to use the functional annotation clustering file? In that case, which GO terms and clusters should I use?

                    I look forward to your reply,
                    Last edited by carolW; 12-28-2014, 07:05 AM. Reason: ask other questions related to the reply
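
                    One way to separate the GO terms from the other annotation terms in a list like the one above is to keep only rows whose term starts with a GO identifier, then sort by count. A minimal sketch in base R using the seven rows quoted in the post; the column names Term and Count are assumptions about how the chart file is laid out, so adjust them to the actual header.

                    ```r
                    # The rows quoted above, typed in as a data frame (column names are assumed)
                    chart <- data.frame(
                      Term = c("phosphoprotein", "acetylation", "nucleus", "cytoplasm",
                               "GO:0031974~membrane-enclosed lumen",
                               "GO:0043228~non-membrane-bounded organelle",
                               "GO:0043232~intracellular non-membrane-bounded organelle"),
                      Count = c(232, 185, 134, 120, 99, 98, 98),
                      stringsAsFactors = FALSE
                    )

                    # Keep only rows whose term starts with a GO identifier such as "GO:0031974~"
                    go.chart <- chart[grepl("^GO:[0-9]{7}~", chart$Term), ]

                    # Split "GO:0031974~name" into identifier and name
                    go.chart$go_id   <- sub("~.*$", "", go.chart$Term)
                    go.chart$go_name <- sub("^[^~]*~", "", go.chart$Term)

                    # Most frequent terms first
                    go.chart <- go.chart[order(-go.chart$Count), ]
                    print(go.chart[, c("go_id", "go_name", "Count")])
                    ```

                    Raw counts favour broad terms, so ranking by an enrichment p-value instead of by count may separate the terms better than the near-identical frequencies do.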
