  • bigmw
    replied
    eg2id and id2eg are the pair of functions for ID mapping from and to Entrez Genes. For info:
    ?eg2id
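
    A minimal sketch of the Entrez-to-symbol direction for gene sets, assuming a lookup table is built once for all IDs and then reused for every set. The gene sets, pathway names and the hand-made eg2sym.map table below are toy placeholders; in a real session the table would presumably come from a single eg2id call.

```r
# Toy gene sets keyed by Entrez Gene ID (placeholders, not real KEGG sets).
kegg.gs <- list(
  "mmu00001 Toy pathway A" = c("11461", "14433"),
  "mmu00002 Toy pathway B" = c("14433", "21926", "99999")
)

# Collect every ID once so the conversion only has to run a single time.
all.eg <- unique(unlist(kegg.gs, use.names = FALSE))

# Assumption: in practice this table would be filled from one call such as
# eg2id(all.eg, category = "SYMBOL", org = "Mm"); here it is written by hand.
eg2sym.map <- c("11461" = "Actb", "14433" = "Gapdh", "21926" = "Tnf")

# Translate each set with a plain lookup and drop IDs that have no symbol.
kegg.gs.sym <- lapply(kegg.gs, function(eg) {
  sym <- unname(eg2sym.map[eg])
  sym[!is.na(sym)]
})

print(kegg.gs.sym)
```

    Building the table once and indexing into it avoids calling the conversion function separately for each gene set.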


    Originally posted by crazyhottommy
    Also, what if I do want to convert the gene set IDs from Entrez to symbol?
    How can I do it?

    Thank you.


  • bigmw
    replied
    GAGE and other methods (GSEA etc.) require all genes to be included. This way GAGE tests gene perturbations within pathways against the background of all genes. You selected a list of differentially expressed genes first, so it is expected that you don't see any pathways standing out against that background, right? Including all genes instead of a selected list gives you major advantages: you use all your data in the analysis, which is usually more powerful, and you don't need a more or less arbitrary q-/p-value cutoff.

    Otherwise, your code seems to work fine. You may want to check the DESeq section and the native workflow in the demo code.
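
    A minimal sketch of preparing fold changes for all genes rather than a pre-filtered list, assuming a result table with id and log2FoldChange columns as in the quoted post. The res data frame below is toy data standing in for the full nbinomTest output, and the +/-10 cap simply follows the quoted code.

```r
# Toy stand-in for the full DESeq result table (every gene, no padj filter).
res <- data.frame(
  id = c("Actb", "Gapdh", "Tnf", "Il6", "Myc"),
  log2FoldChange = c(0.1, -Inf, 12.5, NA, -0.4),
  stringsAsFactors = FALSE
)

# Named vector of fold changes for ALL genes.
deseq.fc <- res$log2FoldChange
names(deseq.fc) <- res$id

# Drop genes without an estimate, then cap extreme or infinite values.
deseq.fc <- deseq.fc[!is.na(deseq.fc)]
deseq.fc[deseq.fc > 10] <- 10
deseq.fc[deseq.fc < -10] <- -10  # keep the space: `x<-10` would assign 10 to x

exp.fc <- deseq.fc
print(exp.fc)

# exp.fc would then be passed on as in the quoted post, e.g.
# gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
```

    The only difference from the quoted code is that res is used directly instead of resSig, so the significance cutoff never enters the pathway test.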



    Originally posted by crazyhottommy
    Thank you, I followed it. After DESeq, 1724 differentially expressed genes were used for pathway analysis.

    res <- nbinomTest( cds, 'control', 'treat' )

    resSig <- res[ res$padj < 0.01 & (res$log2FoldChange >1| res$log2FoldChange < -1), ]

    resSig <- na.omit(resSig)

    require(gage)
    ...

    Am I doing it right?


  • crazyhottommy
    replied
    Originally posted by bigmw
    The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
    ?pathview::eg2id

    BTW, I would suggest converting your data IDs from symbol to Entrez Gene, rather than your gene set IDs from Entrez to symbol. The former should be much faster as it only needs to call the conversion function once.

    Also, what if I do want to convert the gene set IDs from Entrez to symbol?
    How can I do it?

    Thank you.


  • crazyhottommy
    replied
    Originally posted by bigmw
    BTW, gage has a separate tutorial on data preparation; you can check Section 5 -- gene or transcript ID conversion:
    http://www.bioconductor.org/packages...c/dataPrep.pdf

    Thank you, I followed it. After DESeq, 1724 differentially expressed genes were used for pathway analysis.

    res <- nbinomTest( cds, 'control', 'treat' )

    resSig <- res[ res$padj < 0.01 & (res$log2FoldChange >1| res$log2FoldChange < -1), ]

    resSig <- na.omit(resSig)

    require(gage)
    data(kegg.gs)
    deseq.fc <- resSig$log2FoldChange
    names(deseq.fc) <- resSig$id
    sum(is.infinite(deseq.fc)) # there are some infinite values; with DESeq2 there is no such problem
    # cap extreme fold changes at +/-10
    deseq.fc[deseq.fc > 10] <- 10
    deseq.fc[deseq.fc < -10] <- -10 # the space matters: `deseq.fc<-10` would assign 10 to deseq.fc
    exp.fc <- deseq.fc

    # kegg.gsets works with 3000 KEGG species
    data(korg)
    head(korg[,1:3], n=20)


    #let's get the annotation files for mouse and convert the gene set to gene symbol format
    kg.mouse<- kegg.gsets("mouse")
    kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
    lapply(kegg.gs[1:3], head)



    # to convert IDs among gene/transcript ID to Entrez GeneID or reverse, use eg2id and id2eg in the pathview package
    library(pathview)
    data(bods)
    bods

    gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')
    # convert the gene symbol to Entrez Gene ID
    head(gene.symbol.eg, n=100)
    head(gene.symbol.eg[,2], n=10)

    names(exp.fc)<- gene.symbol.eg[,2]

    fc.kegg.p<- gage(exp.fc, gsets= kegg.gs, ref=NULL, samp=NULL)
    sel<- fc.kegg.p$greater[,"q.val"] < 0.1 & !is.na(fc.kegg.p$greater[,"q.val"])
    table(sel)

    sel.l<- fc.kegg.p$less[,"q.val"] < 0.1 & !is.na(fc.kegg.p$less[,"q.val"])
    table(sel.l)



    > table(sel.l)
    sel.l
    FALSE
    202

    > table(sel)
    sel
    FALSE
    202

    Am I doing it right?


  • bigmw
    replied
    BTW, gage has a separate tutorial on data preparation; you can check Section 5 -- gene or transcript ID conversion:
    http://www.bioconductor.org/packages...c/dataPrep.pdf


  • bigmw
    replied
    The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
    ?pathview::eg2id

    BTW, I would suggest converting your data IDs from symbol to Entrez Gene, rather than your gene set IDs from Entrez to symbol. The former should be much faster as it only needs to call the conversion function once.
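
    A minimal sketch of relabelling the data (symbol to Entrez Gene) once the mapping is available, assuming the mapping is a two-column matrix with the input ID in column 1 and the Entrez Gene ID in column 2, and NA for unmapped IDs. The matrix below is written by hand as a stand-in for what id2eg(ids = names(exp.fc), category = 'SYMBOL', org = 'Mm') would give; its column names and ID values are illustrative placeholders.

```r
# Toy fold changes keyed by gene symbol.
exp.fc <- c(Actb = 0.1, Gapdh = -1.2, NotAGene = 2.0, Tnf = 3.3)

# Hand-made stand-in for the mapping matrix (assumed layout: input ID, Entrez ID).
gene.symbol.eg <- cbind(
  SYMBOL = names(exp.fc),
  ENTREZID = c("11461", "14433", NA, "21926")
)

# Keep only genes that mapped, then swap symbols for Entrez IDs.
keep <- !is.na(gene.symbol.eg[, 2])
exp.fc.eg <- exp.fc[keep]
names(exp.fc.eg) <- gene.symbol.eg[keep, 2]

# If several symbols map to one Entrez ID, keep the first occurrence.
exp.fc.eg <- exp.fc.eg[!duplicated(names(exp.fc.eg))]

print(exp.fc.eg)
```

    Dropping unmapped entries before renaming avoids NA names in the vector that is handed to the gene set test.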


  • crazyhottommy
    replied
    Hi there, thank you for making this awesome tool.

    I am working with mouse data and want to know how to convert the gene sets into gene symbol format.

    kg.mouse<- kegg.gsets("mouse")
    kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
    lapply(kegg.gs[1:3],head)


    The eg2sym function only works for human data, so I cannot do the following:

    data(egSymb)
    kegg.gs.sym<- lapply(kegg.gs, eg2sym)

    Thank you!
    Tommy


  • tigerxu
    replied
    I have followed the default workflows of gage and pathview on the example RNA-seq dataset. I also used the fold changes inferred by DESeq2, followed by gage and pathview. I found that the two pipelines output different results: the pipeline based on the DESeq2 fold changes generates far fewer significant pathways. For example:

    > gage.kegg.sig<-sigGeneSet(gage.kegg.p, outname="sig.kegg",pdf.size=c(7,8))
    [1] "there are 22 signficantly up-regulated gene sets"
    [1] "there are 17 signficantly down-regulated gene sets"

    > deseq2.kegg.sig<-sigGeneSet(deseq2.kegg.p, outname="deseq2.sig.kegg",pdf.size=c(7,8))
    [1] "gs.data needs to be a matrix-like object!"
    [1] "No heatmap produced for down-regulated gene sets, only 1 or none signficant."
    [1] "gs.data needs to be a matrix-like object!"
    [1] "there are 7 signficantly up-regulated gene sets"
    [1] "there are 0 signficantly down-regulated gene sets"

    I'm wondering which pipeline is more reliable for biological interpretation. Why does the pipeline based on DESeq2 return far fewer pathways? Can anyone give me some advice?

    Thanks!
    Last edited by tigerxu; 07-11-2014, 12:29 PM.


  • tigerxu
    replied
    Originally posted by bigmw
    Just checked the source code for sigGeneSet and the internal function gs.heatmap. There does seem to be a potential conflict in the margins argument. I will have the problem fixed; you can check the updated version 2.14.3 in the next couple of days here:
    http://www.bioconductor.org/packages...html/gage.html

    Okay, thanks! I will try version 2.14.3 later.


  • bigmw
    replied
    Just checked the source code for sigGeneSet and the internal function gs.heatmap. There does seem to be a potential conflict in the margins argument. I will have the problem fixed; you can check the updated version 2.14.3 in the next couple of days here:
    http://www.bioconductor.org/packages...html/gage.html
    GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In the gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These functions and data are also useful for gene set analysis using other methods.
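
    The error reported in this thread, 'formal argument "margins" matched by multiple actual arguments', is what R raises when a wrapper hard-codes an argument and also forwards the same name through `...`. The sketch below reproduces that pattern with two hypothetical functions (draw_heatmap and plot_sets are illustrative names, not gage internals) and shows one possible way to let the caller's value win. It is not the actual gage source, nor necessarily the fix planned for 2.14.3.

```r
# Hypothetical low-level plotting function; stands in for a heatmap routine.
draw_heatmap <- function(x, margins = c(5, 5), ...) {
  margins
}

# Wrapper that hard-codes margins AND forwards `...` to the same function.
plot_sets <- function(x, ...) {
  draw_heatmap(x, margins = c(5, 5), ...)
}

# Passing margins through `...` now supplies the formal argument twice.
msg <- tryCatch(
  plot_sets(1, margins = c(5, 10)),
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible fix: treat the hard-coded value as a default that the
# caller's arguments can override before the call is assembled.
plot_sets_fixed <- function(x, ...) {
  args <- utils::modifyList(list(margins = c(5, 5)), list(...))
  do.call(draw_heatmap, c(list(x), args))
}

print(plot_sets_fixed(1, margins = c(5, 10)))
print(plot_sets_fixed(1))
```

    Merging defaults with the caller's arguments before do.call means each argument name reaches the inner function only once.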


  • tigerxu
    replied
    Originally posted by bigmw
    This is the latest version. Do you still get the problem?

    The problem is still there, but I have modified the margins parameter in the sigGeneSet function within the gage package, and now it works!


  • bigmw
    replied
    This is the latest version. Do you still get the problem?


  • tigerxu
    replied
    Originally posted by bigmw
    You may want to check the version of the gage package you are running, which can be seen by:
    sessionInfo()
    other attached packages:
    [1] gage_2.14.2 GenomicAlignments_1.0.2
    [3] BSgenome_1.32.0 Rsamtools_1.16.1
    [5] Biostrings_2.32.0 XVector_0.4.0
    [7] DESeq2_1.4.5 RcppArmadillo_0.4.300.8.0
    [9] Rcpp_0.11.2 GenomicRanges_1.16.3
    [11] GenomeInfoDb_1.0.2 IRanges_1.22.9
    [13] BiocGenerics_0.10.0

    Is this version of gage not the right one?


  • bigmw
    replied
    You may want to check the version of the gage package you are running, which can be seen by:
    sessionInfo()


  • tigerxu
    replied
    Originally posted by bigmw
    Forgot that the sigGeneSet function has been updated to give users more control over the margins and font size. sigGeneSet calls an internal function, heatmap2, to generate the heatmaps. So check the arguments of this function:
    args(gage:::heatmap2)
    The two relevant arguments here are margins and cexRow, which control the margins for column/row names and the row name font size. You may do something like:
    kegg.sig<-sigGeneSet(cnts.kegg.p,outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12), margins = c(5,10))

    I have a question about the margins argument in the sigGeneSet function. When I run the following command, I get an error:
    > rcount.kegg.sig<-sigGeneSet(rcount.kegg.p, outname="sig.kegg",pdf.size=c(7,12),margins=c(5, 10))
    Error in heatmap2(gs.data, Colv = F, Rowv = F, dendrogram = "none", col = cols, :
    formal argument "margins" matched by multiple actual arguments

    Can anyone help me?

    Thanks!
