> lapply(kegg.gs[1:3], head, 3)
$`ssc00970 Aminoacyl-tRNA biosynthesis`
[1] "100144457" "100153423" "100155115"
…
The problem resides in yoru expression data. In your problematic dataset (let’s call it dataset 1), you have only 7434 entries (not exactly genes):
> str(cuff.fc)
Named num [1:7434] 0.6134 0 0.3767 0.0177 -2.1022 ...
…
In your good dataset 2, you have 344025 entries:
> str(cuff.fc)
Named num [1:344025] 1.279 0 0.155 0.806 -0.958 ...
…
Pig has 57700 genes in total based on NCBI data. Assume your dataset 2 has a full or decent coverage on pig genes, each pig gene maps to about 6 entries in this dataset. Assume the same level of redundancy, there are only ~ 1200 unique genes in your dataset 1. In other words, this is likely a sublist of selected or significant genes. Pathway or gene set analysis like GAGE requires a comprehensive and balance coverage of genes for your research speices, e.g. the full list of genes measured in a microarray or RNA-seq study, usually in the order of several thousand or tens of thousands of genes.
Therefore, very likely you gage analysis of dataset 1 doesn’t yield any significant pathways due to the biased preslected/short gene list with only ~ 1200 genes. Hence you get NA pathway IDs when you select significant pathways. Your dataset 2 does give you some sensible results because you seem to have a good and balanced coverage of pig genes there.
However, you should really make sure each gene ID has only 1 expression value when doing gene set or pathway analysis because genes are treated as independent variables. In the GAGE/Pathview native workflow or joint workflow with DESeq/DESeq2/edgeR/limma, we explicitly mapped reads to Entrez Genes and summarized expression counts per genes. But your data were mapped and processed by Cufflinks likely to some transcript sequences like RefSeq mRNA’s (with Entrez Gene annotation). You should summarize the expression data for multiple transcripts of the same gene into 1 unified expression level, and then conduct pathway analysis. We describe how to do so in a secondary gage tutorial, “Gene set and data preparation”, please check Section 5-gene or transcript ID conversion:
Originally posted by shocker8786
View Post
Leave a comment: