RNA-Seq Pathway and Gene-set Analysis Workflows in R/Bioconductor with GAGE/Pathview

bigmw replied

03-17-2014, 07:09 PM
Your kegg.gs has no problem, and it is the same for both datasets:
> lapply(kegg.gs[1:3], head, 3)
$`ssc00970 Aminoacyl-tRNA biosynthesis`
[1] "100144457" "100153423" "100155115"
…

The problem resides in yoru expression data. In your problematic dataset (let’s call it dataset 1), you have only 7434 entries (not exactly genes):
> str(cuff.fc)
Named num [1:7434] 0.6134 0 0.3767 0.0177 -2.1022 ...
…

In your good dataset 2, you have 344025 entries:
> str(cuff.fc)
Named num [1:344025] 1.279 0 0.155 0.806 -0.958 ...
…

Pig has 57700 genes in total based on NCBI data. Assume your dataset 2 has a full or decent coverage on pig genes, each pig gene maps to about 6 entries in this dataset. Assume the same level of redundancy, there are only ~ 1200 unique genes in your dataset 1. In other words, this is likely a sublist of selected or significant genes. Pathway or gene set analysis like GAGE requires a comprehensive and balance coverage of genes for your research speices, e.g. the full list of genes measured in a microarray or RNA-seq study, usually in the order of several thousand or tens of thousands of genes.
Therefore, very likely you gage analysis of dataset 1 doesn’t yield any significant pathways due to the biased preslected/short gene list with only ~ 1200 genes. Hence you get NA pathway IDs when you select significant pathways. Your dataset 2 does give you some sensible results because you seem to have a good and balanced coverage of pig genes there.

However, you should really make sure each gene ID has only 1 expression value when doing gene set or pathway analysis because genes are treated as independent variables. In the GAGE/Pathview native workflow or joint workflow with DESeq/DESeq2/edgeR/limma, we explicitly mapped reads to Entrez Genes and summarized expression counts per genes. But your data were mapped and processed by Cufflinks likely to some transcript sequences like RefSeq mRNA’s (with Entrez Gene annotation). You should summarize the expression data for multiple transcripts of the same gene into 1 unified expression level, and then conduct pathway analysis. We describe how to do so in a secondary gage tutorial, “Gene set and data preparation”, please check Section 5-gene or transcript ID conversion:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf

Originally posted by shocker8786 View Post

I retried the code and was still not successful with one of my datasets. I am sure that the code I am using is the same for both datasets (I have been copying and pasting the code for both datasets). Below is the code and the results for the commands you asked me to enter for the dataset that did not work properly.

> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change.
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(cuff.fc)
[1] -Inf Inf
> cuff.fc[cuff.fc>10]=10
> cuff.fc[cuff.fc< -10]=-10
> exp.fc=cuff.fc
> out.suffix="cuff"
> require(gage)
> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[, "q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
> require(pathview)
> pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
+ gene.data = exp.fc, pathway.id = pid,
+ species = "ssc", out.suffix=out.suffix))
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
> str(cuff.fc)
Named num [1:7434] 0.6134 0 0.3767 0.0177 -2.1022 ...
- attr(*, "names")= chr [1:7434] "100620481" "100512748" "100513321" "100154202" ...
> head(cuff.fc)
100620481 100512748 100513321 100154202 733660 100153485
0.6133590 0.0000000 0.3766890 0.0177489 -2.1022400 -0.2612780
> lapply(kegg.gs[1:3], head, 3)
$`ssc00970 Aminoacyl-tRNA biosynthesis`
[1] "100144457" "100153423" "100155115"

$`ssc02010 ABC transporters`
[1] "100048963" "100127449" "100144586"

$`ssc03008 Ribosome biogenesis in eukaryotes`
[1] "100151786" "100151831" "100152658"

> head(fc.kegg.p$greater)
p.geomean stat.mean p.val
ssc04064 NF-kappa B signaling pathway 0.02469658 1.990928 0.02469658
ssc04620 Toll-like receptor signaling pathway 0.03526210 1.828538 0.03526210
ssc04066 HIF-1 signaling pathway 0.04658115 1.693048 0.04658115
ssc04640 Hematopoietic cell lineage 0.04929811 1.670237 0.04929811
ssc04670 Leukocyte transendothelial migration 0.04995310 1.657766 0.04995310
ssc04610 Complement and coagulation cascades 0.07305357 1.467898 0.07305357
q.val set.size exp1
ssc04064 NF-kappa B signaling pathway 0.8234987 53 0.02469658
ssc04620 Toll-like receptor signaling pathway 0.8234987 58 0.03526210
ssc04066 HIF-1 signaling pathway 0.8234987 59 0.04658115
ssc04640 Hematopoietic cell lineage 0.8234987 49 0.04929811
ssc04670 Leukocyte transendothelial migration 0.8234987 63 0.04995310
ssc04610 Complement and coagulation cascades 0.8234987 44 0.07305357

Below are the results for the dataset that did work properly:

Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04151.cuff.png
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04010.cuff.png
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04080.cuff.png
> str(cuff.fc)
Named num [1:344025] 1.279 0 0.155 0.806 -0.958 ...
- attr(*, "names")= chr [1:344025] "100620481" "100512748" "100513321" "100154202" ...
> head(cuff.fc)
100620481 100512748 100513321 100154202 100156632 733660
1.278660 0.000000 0.154690 0.806045 -0.958121 -0.103950
> lapply(kegg.gs[1:3], head, 3)
$`ssc00970 Aminoacyl-tRNA biosynthesis`
[1] "100144457" "100153423" "100155115"

$`ssc02010 ABC transporters`
[1] "100048963" "100127449" "100144586"

$`ssc03008 Ribosome biogenesis in eukaryotes`
[1] "100151786" "100151831" "100152658"

> head(fc.kegg.p$greater)
p.geomean stat.mean p.val
ssc00770 Pantothenate and CoA biosynthesis 0.9995680 -3.988628 0.9995680
ssc00630 Glyoxylate and dicarboxylate metabolism 0.9995962 -3.942065 0.9995962
ssc00052 Galactose metabolism 0.9999012 -4.281375 0.9999012
ssc04740 Olfactory transduction 0.9999254 -4.576407 0.9999254
ssc00100 Steroid biosynthesis 0.9999490 -4.852319 0.9999490
ssc00970 Aminoacyl-tRNA biosynthesis 0.9999580 -4.829070 0.9999580
q.val set.size exp1
ssc00770 Pantothenate and CoA biosynthesis 1 10 0.9995680
ssc00630 Glyoxylate and dicarboxylate metabolism 1 11 0.9995962
ssc00052 Galactose metabolism 1 15 0.9999012
ssc04740 Olfactory transduction 1 12 0.9999254
ssc00100 Steroid biosynthesis 1 11 0.9999490
ssc00970 Aminoacyl-tRNA biosynthesis 1 12 0.9999580
Leave a comment:
sindrle replied

03-17-2014, 03:04 PM
Couldnt get gnCnt to work, I read in HTseq files instead like this:

fls <- list.files(path="~/RNAseq/02_MG-myocytes/MG_Myocytes_HTSeq/A1A2pT2D",pattern="*.txt", full.names =T)

#tab separated values with a header
datalist = lapply(fls, function(x)read.table(x, header=F, colClasses=c("NULL", "numeric"), nrow = nrow(read.table(x, header = F)) - 5))

#same header/columns for all files
datafr = do.call("cbind", datalist)

#column names
colnames(datafr) <- c("D102-A1","D102-A2","D104-A1","D104-A2","D117-A1","D117-A2","D121-A1","D121-A2","D153-A1","D153-A2","D155-A1","D155-A2","D161-A1","D161-A2","D162-A1","D162-A2","D167-A1","D167-A2","D173-A1","D173-A2","D176-A1","D176-A2","D177-A1","D177-A2","D179-A1","D179-A2")

#row names
x <- read.table(file = "~/RNAseq/02_MG-myocytes/MG_Myocytes_HTSeq/A1A2pT2D/D102-A1.txt", header=F, colClasses=c( "character", "NULL"), nrow = nrow(read.table(file = "~/RNAseq/02_MG-myocytes/MG_Myocytes_HTSeq/A1A2pT2D/D102-A1.txt", header = F)) - 5)

rownames(datafr) <- x$V1

hnrnp.cnts <- datafr

Last edited by sindrle; 03-18-2014, 05:52 AM. Reason: Did an alternative route
Leave a comment:
shocker8786 replied

03-17-2014, 02:11 AM
I retried the code and was still not successful with one of my datasets. I am sure that the code I am using is the same for both datasets (I have been copying and pasting the code for both datasets). Below is the code and the results for the commands you asked me to enter for the dataset that did not work properly.

> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change.
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(cuff.fc)
[1] -Inf Inf
> cuff.fc[cuff.fc>10]=10
> cuff.fc[cuff.fc< -10]=-10
> exp.fc=cuff.fc
> out.suffix="cuff"
> require(gage)
> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[, "q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
> require(pathview)
> pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
+ gene.data = exp.fc, pathway.id = pid,
+ species = "ssc", out.suffix=out.suffix))
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
> str(cuff.fc)
Named num [1:7434] 0.6134 0 0.3767 0.0177 -2.1022 ...
- attr(*, "names")= chr [1:7434] "100620481" "100512748" "100513321" "100154202" ...
> head(cuff.fc)
100620481 100512748 100513321 100154202 733660 100153485
0.6133590 0.0000000 0.3766890 0.0177489 -2.1022400 -0.2612780
> lapply(kegg.gs[1:3], head, 3)
$`ssc00970 Aminoacyl-tRNA biosynthesis`
[1] "100144457" "100153423" "100155115"

$`ssc02010 ABC transporters`
[1] "100048963" "100127449" "100144586"

$`ssc03008 Ribosome biogenesis in eukaryotes`
[1] "100151786" "100151831" "100152658"

> head(fc.kegg.p$greater)
p.geomean stat.mean p.val
ssc04064 NF-kappa B signaling pathway 0.02469658 1.990928 0.02469658
ssc04620 Toll-like receptor signaling pathway 0.03526210 1.828538 0.03526210
ssc04066 HIF-1 signaling pathway 0.04658115 1.693048 0.04658115
ssc04640 Hematopoietic cell lineage 0.04929811 1.670237 0.04929811
ssc04670 Leukocyte transendothelial migration 0.04995310 1.657766 0.04995310
ssc04610 Complement and coagulation cascades 0.07305357 1.467898 0.07305357
q.val set.size exp1
ssc04064 NF-kappa B signaling pathway 0.8234987 53 0.02469658
ssc04620 Toll-like receptor signaling pathway 0.8234987 58 0.03526210
ssc04066 HIF-1 signaling pathway 0.8234987 59 0.04658115
ssc04640 Hematopoietic cell lineage 0.8234987 49 0.04929811
ssc04670 Leukocyte transendothelial migration 0.8234987 63 0.04995310
ssc04610 Complement and coagulation cascades 0.8234987 44 0.07305357

Below are the results for the dataset that did work properly:

Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04151.cuff.png
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04010.cuff.png
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04080.cuff.png
> str(cuff.fc)
Named num [1:344025] 1.279 0 0.155 0.806 -0.958 ...
- attr(*, "names")= chr [1:344025] "100620481" "100512748" "100513321" "100154202" ...
> head(cuff.fc)
100620481 100512748 100513321 100154202 100156632 733660
1.278660 0.000000 0.154690 0.806045 -0.958121 -0.103950
> lapply(kegg.gs[1:3], head, 3)
$`ssc00970 Aminoacyl-tRNA biosynthesis`
[1] "100144457" "100153423" "100155115"

$`ssc02010 ABC transporters`
[1] "100048963" "100127449" "100144586"

$`ssc03008 Ribosome biogenesis in eukaryotes`
[1] "100151786" "100151831" "100152658"

> head(fc.kegg.p$greater)
p.geomean stat.mean p.val
ssc00770 Pantothenate and CoA biosynthesis 0.9995680 -3.988628 0.9995680
ssc00630 Glyoxylate and dicarboxylate metabolism 0.9995962 -3.942065 0.9995962
ssc00052 Galactose metabolism 0.9999012 -4.281375 0.9999012
ssc04740 Olfactory transduction 0.9999254 -4.576407 0.9999254
ssc00100 Steroid biosynthesis 0.9999490 -4.852319 0.9999490
ssc00970 Aminoacyl-tRNA biosynthesis 0.9999580 -4.829070 0.9999580
q.val set.size exp1
ssc00770 Pantothenate and CoA biosynthesis 1 10 0.9995680
ssc00630 Glyoxylate and dicarboxylate metabolism 1 11 0.9995962
ssc00052 Galactose metabolism 1 15 0.9999012
ssc04740 Olfactory transduction 1 12 0.9999254
ssc00100 Steroid biosynthesis 1 11 0.9999490
ssc00970 Aminoacyl-tRNA biosynthesis 1 12 0.9999580
Leave a comment:
bigmw replied

03-15-2014, 06:08 PM
R does not remember the old kegg.gs, if you assign the name to a new gene set data. If you are sure your code was right, then very likely your data had some problem. To locate the problem, you can restart a new R session and run the same analysis code on your problematic data till the pathview step. If you still get the same error message, run the following code, and post your output here so that we can check what happened to your data:

str(cuff.fc)
head(cuff.fc)
lapply(kegg.gs[1:3], head, 3)
head(fc.kegg.p$greater)
Leave a comment:
shocker8786 replied

03-15-2014, 05:17 AM
Thank you for your help. I have 2 data sets that I am trying to perform this analysis on (all pig), and I was indeed able to produce the results files as expected by changing those lines of code for one of the data sets (the one I had not tried this on previously):

> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change.
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(cuff.fc)
[1] -Inf Inf
> cuff.fc[cuff.fc>10]=10
> cuff.fc[cuff.fc< -10]=-10
> exp.fc=cuff.fc
> out.suffix="cuff"
> require(gage)
> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[, "q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
> require(pathview)
> pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
+ gene.data = exp.fc, pathway.id = pid,
+ species = "ssc", out.suffix=out.suffix))
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04151.cuff.png
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04010.cuff.png
Working in directory /srv/mds01/shared/Epigenetics/Rnaseq/kyle_results/TJT/tophat_expression/cuffdiff
Writing image file ssc04080.cuff.png

However, with my original data set, I still receive an error after entering the code exactly the same way:

> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change.
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(cuff.fc)
[1] -Inf Inf
> cuff.fc[cuff.fc>10]=10
> cuff.fc[cuff.fc< -10]=-10
> exp.fc=cuff.fc
> out.suffix="cuff"
> require(gage)
> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[, "q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
> require(pathview)
> pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
+ gene.data = exp.fc, pathway.id = pid,
+ species = "ssc", out.suffix=out.suffix))
[1] "Downloading xml files for sscNA, 1/1 pathways.."
[1] "Downloading png files for sscNA, 1/1 pathways.."
Download of sscNA xml and png files failed!
Failed to download KEGG xml/png files, sscNA skipped!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Warning message:
In download.file(png.url, png.target, quiet = T, mode = "wb") :
cannot open: HTTP status was '404 Not Found'

It seems that R is remembering that I was using the human kegg database before, and now I cannot get it to use the pig one. I deleted the sscNA.xml files and unlinked the previous workspace using the unlink(".RData") command, but am still unable to produce the result.
Leave a comment:
bigmw replied

03-14-2014, 06:07 PM
Your original code worked well in this step as show in your reponse #24 above.
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

This line above had no problem. Changing org= "Ss" to "Ssc" and "ssc" caused all the errors. There is no way you will get the input of org="Ss" and message with org.ssc.eg.db like below:
>gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.ssc.eg.db

Originally posted by shocker8786 View Post

I think you are right, but I'm not sure how exactly to download the pathway data files. I thought the step you told me to change (below) would download the package I need, but it fails:

gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.ssc.eg.db
Bioconductor version 2.13 (BiocInstaller 1.12.0), ?biocLite for help
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.13 (BiocInstaller 1.12.0), R version 3.0.3.
Installing package(s) 'org.Ss.eg.db'
Loading required package: org.Ss.eg.db
Error in pathview::id2eg(gnames, category = "symbol", org = "Ss") :
Fail to install/load gene annotation package org.Ss.eg.db!
In addition: Warning messages:
1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘org.Ss.eg.db’
2: package ‘org.ssc.eg.db’ is not available (for R version 3.0.3)
3: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘org.Ss.eg.db’

I changed org= "Ss" to "Ssc" and "ssc", but the result is the same, there does not seem to be a database with this name. However on the KEGG website, pig is listed as ssc. Is this the proper way to download the database, or is there another step I am missing? Thanks.
Leave a comment:
bigmw replied

03-14-2014, 05:53 PM
The files you downloaded are named sscNA.xml hence your path.ids2 are all NA’s. In other words, no real pathways were selected in your gage analysis step. You gage analysis had this problem because you used kegg.gs which are human gene set data not the pig data.
What you did:
data(kegg.gs)
fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)

You should generated pig gene set data first using kegg.gsets function in gage package:
kg. ssc=kegg.gsets("ssc")
kegg.gs=kg. ssc$kg.sets[kg. ssc$sigmet.idx]
fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)

And this is the only problem in your original analysis session I quoted below. It will work once you get this right.
Please always pay attention to the species matching of your own data, gene set or pathway data as documented in gage and pathview packages. Actually you will find everything in the pacakge tutorials and documentations for functions you work with like gage or pathview etc:

gage

http://bioconductor.org/packages/release/bioc/html/gage.html

GAGE is a published method for gene set (enrichment or GSEA) or pathway analysis. GAGE is generally applicable independent of microarray or RNA-Seq data attributes including sample sizes, experimental designs, assay platforms, and other types of heterogeneity, and consistently achieves superior performance over other frequently used methods. In gage package, we provide functions for basic GAGE analysis, result processing and presentation. We have also built pipeline routines for of multiple GAGE analyses in a batch, comparison between parallel analyses, and combined analysis of heterogeneous data from different sources/studies. In addition, we provide demo microarray data and commonly used gene set data based on KEGG pathways and GO terms. These funtions and data are also useful for gene set analysis using other methods.

pathview

http://bioconductor.org/packages/release/bioc/html/pathview.html

Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

Originally posted by shocker8786 View Post

Thanks for your reply, and for informing me about the org="Ss" command. Unfortunately I am still unable to make it through the workflow without error. Below is my code and final output. Do you see what the issue could be? Thanks.

> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change.
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(cuff.fc)
[1] -Inf Inf
> cuff.fc[cuff.fc>10]=10
> cuff.fc[cuff.fc< -10]=-10
> exp.fc=cuff.fc
> out.suffix="cuff"
> require(gage)
> data(kegg.gs)
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[, "q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
> require(pathview)
> pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
+ gene.data = exp.fc, pathway.id = pid,
+ species = "ssc", out.suffix=out.suffix))
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Leave a comment:
shocker8786 replied

03-14-2014, 01:29 AM
I think you are right, but I'm not sure how exactly to download the pathway data files. I thought the step you told me to change (below) would download the package I need, but it fails:

gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.ssc.eg.db
Bioconductor version 2.13 (BiocInstaller 1.12.0), ?biocLite for help
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.13 (BiocInstaller 1.12.0), R version 3.0.3.
Installing package(s) 'org.Ss.eg.db'
Loading required package: org.Ss.eg.db
Error in pathview::id2eg(gnames, category = "symbol", org = "Ss") :
Fail to install/load gene annotation package org.Ss.eg.db!
In addition: Warning messages:
1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘org.Ss.eg.db’
2: package ‘org.ssc.eg.db’ is not available (for R version 3.0.3)
3: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘org.Ss.eg.db’

I changed org= "Ss" to "Ssc" and "ssc", but the result is the same, there does not seem to be a database with this name. However on the KEGG website, pig is listed as ssc. Is this the proper way to download the database, or is there another step I am missing? Thanks.
Leave a comment:
bigmw replied

03-13-2014, 12:38 PM
Looks like your downloading had some problem and you didn’t download any pathway data file. Can you confirm on this?
Leave a comment:
shocker8786 replied

03-13-2014, 01:26 AM
Thanks for your reply, and for informing me about the org="Ss" command. Unfortunately I am still unable to make it through the workflow without error. Below is my code and final output. Do you see what the issue could be? Thanks.

> kg.ssc=kegg.gsets("ssc")
> kegg.gs=kg.ssc$kg.sets[kg.ssc$sigmet.idx]
> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change.
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol", org="Ss")
Loading required package: org.Ss.eg.db

> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(cuff.fc)
[1] -Inf Inf
> cuff.fc[cuff.fc>10]=10
> cuff.fc[cuff.fc< -10]=-10
> exp.fc=cuff.fc
> out.suffix="cuff"
> require(gage)
> data(kegg.gs)
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[, "q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
> require(pathview)
> pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(
+ gene.data = exp.fc, pathway.id = pid,
+ species = "ssc", out.suffix=out.suffix))
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Start tag expected, '<' not found
Parsing ./sscNA.xml file failed, please check the file!
Leave a comment:
bigmw replied

03-12-2014, 05:36 PM
This is indeed a typo, it should be cuff.fc instead of exp.fc at this step. And you will assign cuff.fc to exp.fc in a later step, and then work exclusively on exp.fc:
range(cuff.fc)
#remove the -Inf and Inf values, which block the downstream analysis
cuff.fc[cuff.fc>10]=10
cuff.fc[cuff.fc< -10]=-10
exp.fc=cuff.fc
out.suffix="cuff"
…

BTW, the demo example uses human data, and you have to specify organism to be Sus scrofa (pig) for your data when you map gene sybmols to entrez gene IDs as below:
gnames.eg=pathview::id2eg(gnames, category ="symbol", org=="Ss")

Last edited by bigmw; 03-13-2014, 12:40 PM.
Leave a comment:
shocker8786 replied

03-12-2014, 07:32 AM
changing range(exp.fc) to range(cuff.fc) results in the following result printed by R:
[1] -Inf Inf

So is this a typo in the RNAseq workflow, or am I already supposed to have exp.fc defined? I'm trying to go through these pdfs, but I'm confused as to what this step is doing.
Leave a comment:
shriram replied

03-12-2014, 07:10 AM
you have to use cuff.fc instead of exp.fc I guess.
This is R issue as you didn't had this object
Leave a comment:
shocker8786 replied

03-12-2014, 07:02 AM
I have a question about using GAGE with data from cufflinks, as described in the RNA-Seq workflow tutorial. I have RNAseq data from pigs that was aligned using Tophat and analyzed for DEGs using cufflinks. I'm going through the process listed in the cufflinks section, but I'm running into an error. Below are the commands I've been entering. Everything runs fine until I get to the last command.

> cuff.res=read.delim(file="gene_exp.diff", sep="\t")
> cuff.fc=cuff.res$log2.fold_change
> gnames=cuff.res$gene
> sel=gnames!="-"
> gnames=as.character(gnames[sel])
> cuff.fc=cuff.fc[sel]
> names(cuff.fc)=gnames
> gnames.eg=pathview::id2eg(gnames, category ="symbol")
> sel2=gnames.eg[,2]>""
> cuff.fc=cuff.fc[sel2]
> names(cuff.fc)=gnames.eg[sel2,2]
> range(exp.fc)
Error: object 'exp.fc' not found

Do you know what the issue could be? I'm just starting out with RNAseq data and using R, and I haven't been able to find anyone else with this issue. Thanks.
Leave a comment:
bigmw replied

02-22-2014, 07:36 AM
gene.idtype="KEGG" specifies the ID type used for the gene.data. It is not related to the error message, which indicates a download problem. As shown in your solution, this download problem is due to the wrong pathway IDs.

Originally posted by shriram View Post

############
Issue resolved
by taking substring of actual pathway name in kegg and specifying gene.idtype="KEGG"
path.ids <- substr(path.ids, 1, 8)
############
Leave a comment:

Previous 1 2 3 4 5 6 7 template Next

Best Practices for Single-Cell Sequencing Analysis

by seqadmin

While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
- Channel: Articles
06-06-2024, 07:15 AM
Latest Developments in Precision Medicine

by seqadmin

Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

Somatic Genomics
“We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
- Channel: Articles
05-24-2024, 01:16 PM

Topics	Statistics	Last Post
The Adaptation of the Cell Cycle in Multiciliated Cells by seqadmin Started by seqadmin, Yesterday, 06:58 AM	0 responses 13 views 0 likes	Last Post by seqadmin Yesterday, 06:58 AM
New Method for DNA Sequence Amplification by seqadmin Started by seqadmin, 06-06-2024, 08:18 AM	0 responses 20 views 0 likes	Last Post by seqadmin 06-06-2024, 08:18 AM
New Tools Enhance Single-Molecule DNA Analysis with Minimal Samples by seqadmin Started by seqadmin, 06-06-2024, 08:04 AM	0 responses 18 views 0 likes	Last Post by seqadmin 06-06-2024, 08:04 AM
SIX2 Protein Identified as a Key Player in Prostate Cancer Treatment Resistance by seqadmin Started by seqadmin, 06-03-2024, 06:55 AM	0 responses 13 views 0 likes	Last Post by seqadmin 06-03-2024, 06:55 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News