RNA-Seq Pathway and Gene-set Analysis Workflows in R/Bioconductor with GAGE/Pathview

bigmw replied

07-16-2014, 09:26 AM
eg2id and id2eg are a pair of functions for mapping IDs from and to Entrez Gene IDs. For more information:
?eg2id
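
To make the Entrez-to-symbol direction concrete, here is a minimal base-R sketch of converting a list of gene sets. The gene sets, IDs, and the lookup vector `eg2sym.map` are all made up for illustration; in practice the mapping would come from eg2id (check `?eg2id` for its actual arguments and return value).

```r
# Toy gene sets holding Entrez-style IDs (placeholder values, not real genes).
toy.gs <- list(
  pathwayA = c("101", "102"),
  pathwayB = c("102", "999")   # "999" has no entry in the toy map
)

# Hypothetical Entrez -> symbol lookup; stands in for the mapping eg2id provides.
eg2sym.map <- c("101" = "Gene1", "102" = "Gene2")

# Convert every gene set and drop IDs that do not map.
toy.gs.sym <- lapply(toy.gs, function(eg) {
  sym <- unname(eg2sym.map[eg])
  sym[!is.na(sym)]
})

toy.gs.sym
```

Note that this loops over every gene set, so with a real mapping function it means one conversion call per set.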

Originally posted by crazyhottommy

Also, if I do want to convert gene set ID from Entrez to symbol.
How can I do it?

Thank you.
bigmw replied

07-16-2014, 09:24 AM
GAGE and other methods (GSEA etc.) require that all genes be included. This way GAGE tests gene perturbations within pathways against the background of all genes. If you select a list of differentially expressed genes first, it is expected that you won't see any pathways standing out within that pre-selected list, right? Including all genes instead of a selected list gives you a major advantage: all of your data goes into the analysis, which is usually more powerful. In addition, you don't need a more or less arbitrary q-/p-value cutoff.

Otherwise, your code seems to work fine. You may want to check the DESeq section and the native workflow in the demo code:

http://www.bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/RNA-seqWorkflow.pdf
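
A minimal sketch of the difference, in base R only. The toy result table `res` and its values are made up; only the column names (id, log2FoldChange, padj) follow the code in the question. The idea is to build the fold-change vector from every gene with a usable value rather than from the filtered subset.

```r
# Toy DESeq-style result table (values are made up for illustration).
res <- data.frame(
  id             = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -0.3, Inf, NA, -1.8),
  padj           = c(0.001, 0.70, 0.004, NA, 0.03),
  stringsAsFactors = FALSE
)

# Keep every gene with a usable fold change; no p-value or fold-change cutoff.
keep <- !is.na(res$log2FoldChange)
fc.all <- res$log2FoldChange[keep]
names(fc.all) <- res$id[keep]

# Cap extreme values, since infinite fold changes can occur.
fc.all[fc.all > 10] <- 10
fc.all[fc.all < -10] <- -10

# For contrast: the pre-filtered selection from the question.
sig <- keep & !is.na(res$padj) & res$padj < 0.01 & abs(res$log2FoldChange) > 1
fc.sig <- res$log2FoldChange[sig]

length(fc.all)  # genes that go into the analysis
length(fc.sig)  # genes that would survive the cutoff
```

The full vector (here `fc.all`) is what would be passed on as the first argument of gage, in place of the filtered one.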

Originally posted by crazyhottommy

Thank you, I followed it, after DESeq. 1724 differentially expressed genes were used for pathway analysis.

res <- nbinomTest( cds, 'control', 'treat' )

resSig <- res[ res$padj < 0.01 & (res$log2FoldChange >1| res$log2FoldChange < -1), ]

resSig <- na.omit(resSig)

require(gage)
...

Am I doing it right?
crazyhottommy replied

07-15-2014, 06:48 PM
Originally posted by bigmw

The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
?pathview::eg2id

BTW, I would suggest converting your data IDs from symbol to Entrez Gene, rather than your gene set IDs from Entrez to symbol. The former should be much faster, as it only needs to call the conversion function once.

Also, if I do want to convert gene set ID from Entrez to symbol.
How can I do it?

Thank you.
crazyhottommy replied

07-15-2014, 06:46 PM
Originally posted by bigmw

BTW, gage has a separate tutorial on data preparation; you can check Section 5 -- gene or transcript ID conversion:
http://www.bioconductor.org/packages...c/dataPrep.pdf

Thank you, I followed it, after DESeq. 1724 differentially expressed genes were used for pathway analysis.

res <- nbinomTest( cds, 'control', 'treat' )

resSig <- res[ res$padj < 0.01 & (res$log2FoldChange >1| res$log2FoldChange < -1), ]

resSig <- na.omit(resSig)

require(gage)
data(kegg.gs)
deseq.fc <- resSig$log2FoldChange
names(deseq.fc) <- resSig$id
sum(is.infinite(deseq.fc)) # there are some infinite values; with DESeq2 there is no such problem
# cap extreme fold changes at +/-10 (note the space in "< -10")
deseq.fc[deseq.fc > 10] <- 10
deseq.fc[deseq.fc < -10] <- -10
exp.fc <- deseq.fc

# kegg.gsets works with 3000 KEGG species
data(korg)
head(korg[,1:3], n=20)

# let's get the annotation files for mouse and convert the gene set to gene symbol format
kg.mouse <- kegg.gsets("mouse")
kegg.gs <- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
lapply(kegg.gs[1:3], head)

# to convert IDs among gene/transcript ID to Entrez GeneID or reverse, use eg2id and id2eg in the pathview package
library(pathview)
data(bods)
bods

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')
# convert the gene symbol to Entrez Gene ID
head(gene.symbol.eg, n=100)
head(gene.symbol.eg[,2], n=10)

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p<- gage(exp.fc, gsets= kegg.gs, ref=NULL, samp=NULL)
sel<- fc.kegg.p$greater[,"q.val"] < 0.1 & !is.na(fc.kegg.p$greater[,"q.val"])
table(sel)

sel.l<- fc.kegg.p$less[,"q.val"] < 0.1 & !is.na(fc.kegg.p$less[,"q.val"])
table(sel.l)

> table(sel.l)
sel.l
FALSE
202

> table(sel)
sel
FALSE
202

Am I doing it right?
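
One detail worth spelling out when capping negative fold changes: the space in `x < -10` matters, because R reads `x<-10` as an assignment. A small base-R sketch with made-up numbers (the vector `fc` is hypothetical):

```r
# R parses `x<-10` as the assignment x <- 10, not as the comparison x < -10.
as.character(quote(x<-10)[[1]])    # the operator is "<-"
as.character(quote(x < -10)[[1]])  # the operator is "<"

fc <- c(a = -Inf, b = -2, c = 3, d = Inf)  # made-up fold changes

# Cap both tails; note the space in `fc.capped < -10`.
fc.capped <- fc
fc.capped[fc.capped > 10] <- 10
fc.capped[fc.capped < -10] <- -10
fc.capped

# The same capping in one step with pmin/pmax.
fc.capped2 <- pmax(pmin(fc, 10), -10)
fc.capped2
```

Using `<-` for assignment and always putting spaces around comparison operators makes this kind of slip easier to spot.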
bigmw replied

07-15-2014, 05:09 PM
BTW, gage has a separate tutorial on data preparation; you can check Section 5 -- gene or transcript ID conversion:

http://www.bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf
bigmw replied

07-15-2014, 12:49 PM
The pathview package provides two functions: eg2id and id2eg, for ID mapping/conversion for major research species. For details:
?pathview::eg2id

BTW, I would suggest converting your data IDs from symbol to Entrez Gene, rather than your gene set IDs from Entrez to symbol. The former should be much faster, as it only needs to call the conversion function once.
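
As a rough sketch of the "convert the data once" approach, here is a base-R example with a made-up lookup table. The table `sym2eg` and all gene names and IDs are hypothetical; it is only shaped like a two-column symbol/Entrez mapping, and the real mapping would come from id2eg (see `?pathview::id2eg` for the actual return value).

```r
# Made-up fold changes named by gene symbol.
exp.fc <- c(GeneA = 1.2, GeneB = -0.8, GeneC = 2.1, GeneX = 0.4)

# Hypothetical two-column symbol -> Entrez table; GeneX has no Entrez ID.
sym2eg <- cbind(
  SYMBOL   = c("GeneA", "GeneB", "GeneC", "GeneX"),
  ENTREZID = c("101", "102", "103", NA)
)

# One vectorized lookup covers the whole data set.
eg <- sym2eg[match(names(exp.fc), sym2eg[, "SYMBOL"]), "ENTREZID"]

# Drop unmapped genes before renaming, so that no name ends up as NA.
mapped <- !is.na(eg)
exp.fc.eg <- exp.fc[mapped]
names(exp.fc.eg) <- eg[mapped]

exp.fc.eg
```

Dropping unmapped genes before renaming is a choice made for this sketch; whether to drop or keep them depends on the analysis.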
crazyhottommy replied

07-14-2014, 12:56 PM
Hi there, thank you for making this awesome tool.

I am working with mouse data and want to know how to convert the gene sets into gene symbol format.

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
lapply(kegg.gs[1:3],head)

The eg2sym function only works for human data, so I cannot do the following:

data(egSymb)
kegg.gs.sym<- lapply(kegg.gs, eg2sym)

Thank you!
Tommy
tigerxu replied

07-11-2014, 12:26 PM
I have followed the default workflows of gage and pathview on the example RNA-seq dataset. I also used the fold changes inferred by DESeq2, followed by gage and pathview. I found that the two pipelines output different results: the pipeline based on the DESeq2 fold changes generates far fewer significant pathways. For example:

> gage.kegg.sig<-sigGeneSet(gage.kegg.p, outname="sig.kegg",pdf.size=c(7,8))
[1] "there are 22 signficantly up-regulated gene sets"
[1] "there are 17 signficantly down-regulated gene sets"

> deseq2.kegg.sig<-sigGeneSet(deseq2.kegg.p, outname="deseq2.sig.kegg",pdf.size=c(7,8))
[1] "gs.data needs to be a matrix-like object!"
[1] "No heatmap produced for down-regulated gene sets, only 1 or none signficant."
[1] "gs.data needs to be a matrix-like object!"
[1] "there are 7 signficantly up-regulated gene sets"
[1] "there are 0 signficantly down-regulated gene sets"

I'm wondering which pipeline is more reliable for biological interpretation. Why does the pipeline based on DESeq2 return far fewer pathways? Can anyone give me some advice?

Thanks!

Last edited by tigerxu; 07-11-2014, 12:29 PM.
tigerxu replied

07-10-2014, 06:26 AM
Originally posted by bigmw

Just checked the source code for sigGeneSet and the internal function gs.heatmap. There does seem to be a potential conflict in the margins argument. I will have the problem fixed. You can check for the updated version 2.14.3 in the next couple of days here:
http://www.bioconductor.org/packages...html/gage.html

Okay, thanks! I will try version 2.14.3 later.
bigmw replied

07-10-2014, 06:05 AM
Just checked the source code for sigGeneSet and the internal function gs.heatmap. There does seem to be a potential conflict in the margins argument. I will have the problem fixed. You can check for the updated version 2.14.3 in the next couple of days here:

gage

http://www.bioconductor.org/packages/release/bioc/html/gage.html

GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In the gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These functions and data are also useful for gene set analysis using other methods.
tigerxu replied

07-09-2014, 10:39 AM
Originally posted by bigmw

This is the latest version. Do you still get the problem?

The problem is still there. But after I modified the margins parameter inside the sigGeneSet function in the gage package, it works!
bigmw replied

07-09-2014, 09:45 AM
This is the latest version. Do you still get the problem?
tigerxu replied

07-09-2014, 03:06 AM
Originally posted by bigmw

You may want to check the version of the gage package you are running, which can be seen by:
sessionInfo()

other attached packages:
[1] gage_2.14.2 GenomicAlignments_1.0.2
[3] BSgenome_1.32.0 Rsamtools_1.16.1
[5] Biostrings_2.32.0 XVector_0.4.0
[7] DESeq2_1.4.5 RcppArmadillo_0.4.300.8.0
[9] Rcpp_0.11.2 GenomicRanges_1.16.3
[11] GenomeInfoDb_1.0.2 IRanges_1.22.9
[13] BiocGenerics_0.10.0

Is this version of gage not the right one?
bigmw replied

07-07-2014, 08:54 AM
You may want to check the version of the gage package you are running, which can be seen by:
sessionInfo()
tigerxu replied

07-07-2014, 03:13 AM
Originally posted by bigmw

Forgot that the sigGeneSet function has been updated to give users more control over the margins and font size. sigGeneSet calls an internal function, heatmap2, to generate the heatmaps. So check the arguments of this function:
args(gage:::heatmap2)
The two relevant arguments here are margins and cexRow, which control the margins for column/row names and the row name font size. You may do something like:
kegg.sig<-sigGeneSet(cnts.kegg.p,outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12), margins = c(5,10))

I have a question about the margins argument in the sigGeneSet function. When I run the following command, I get an error:
> rcount.kegg.sig<-sigGeneSet(rcount.kegg.p, outname="sig.kegg",pdf.size=c(7,12),margins=c(5, 10))
Error in heatmap2(gs.data, Colv = F, Rowv = F, dendrogram = "none", col = cols, :
formal argument "margins" matched by multiple actual arguments
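
For anyone puzzled by this message, here is a base-R sketch of the general mechanism that produces it. This is not the gage source: `inner.plot`, `outer.buggy`, and `outer.fixed` are hypothetical stand-ins for a plotting function and a wrapper that both sets margins itself and forwards `...`.

```r
# Hypothetical inner function that accepts a margins argument.
inner.plot <- function(data, margins = c(5, 5), ...) {
  margins
}

# Wrapper that hard-codes margins and also forwards `...`:
# if the caller passes margins too, inner.plot receives it twice.
outer.buggy <- function(data, ...) {
  inner.plot(data, margins = c(5, 6), ...)
}

# One possible fix: make margins a formal argument of the wrapper,
# so it reaches inner.plot exactly once.
outer.fixed <- function(data, margins = c(5, 6), ...) {
  inner.plot(data, margins = margins, ...)
}

# Capture the error message from the buggy wrapper instead of stopping.
msg <- tryCatch(outer.buggy(1, margins = c(5, 10)),
                error = function(e) conditionMessage(e))
msg

outer.fixed(1, margins = c(5, 10))
```

Until a wrapper like this is fixed, the only options are to leave the argument out of the call or to edit the wrapper locally.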

Can anyone help me?

Thanks!