  • sindrle
    Another question: how can I change the text size of the heatmap labels (the KEGG pathway names)?
    The names are all cut off, so they are unreadable. The "pdf.size" option only regulates the graphic part.

    ### significant.genesets
    kegg.sig<-sigGeneSet(cnts.kegg.p, outname="~/RNAseq/13_Acute-Changes/14_GAGE_native_A1A2/A1A2All/A1A2All.kegg",pdf.size = c(7,12))

  • shriram
    pv.out.list <- sapply(enriched_pathways, function(pid) pathview(gene.data = gene_fc, pathway.id = pid, species = "sce", gene.idtype = "KEGG", same.layer = FALSE, kegg.native = TRUE, node.sum = "median"))

    Data in pathview:
    GLK1 -0.35 0.000 0.620 -1.118 -0.900

    # original data supplied for pathview
    GLK 0.14 -1.6 0.62 -1.1 -0.37

    I have attached the resultant pathway image.
    The original image sce00051.png shows the yeast-specific genes in green.

    I am wondering why the pathview data differ for GLK when GLK is the only gene on that node.

  • sindrle
    Pasted wrong sessioninfo..

    # > sessionInfo()
    # R version 3.0.3 (2014-03-06)
    # Platform: x86_64-apple-darwin10.8.0 (64-bit)

    # locale:
    # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    # attached base packages:
    # [1] grid parallel stats graphics grDevices utils datasets
    # [8] methods base

    # other attached packages:
    # [1] Rgraphviz_2.6.0
    # [2] gageData_2.0.3
    # [3] pathview_1.2.4
    # [4]
    # [5] RSQLite_0.11.4
    # [6] DBI_0.2-7
    # [7] KEGGgraph_1.20.0
    # [8] graph_1.40.1
    # [9] XML_3.95-0.2
    # [10] gage_2.12.3
    # [11] Rsamtools_1.14.3
    # [12] Biostrings_2.30.1
    # [13] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
    # [14] GenomicFeatures_1.14.5
    # [15] AnnotationDbi_1.24.0
    # [16] Biobase_2.22.0
    # [17] GenomicRanges_1.14.4
    # [18] XVector_0.2.0
    # [19] IRanges_1.20.7
    # [20] BiocGenerics_0.8.0

    # loaded via a namespace (and not attached):
    # [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
    # [4] digest_0.6.4 httr_0.2 KEGGREST_1.2.0
    # [7] png_0.1-7 RCurl_1.95-4.1 rtracklayer_1.22.6
    # [10] stats4_3.0.3 stringr_0.6.2 tools_3.0.3
    # [13] zlibbioc_1.8.0

    Pathview works, but I don't get colors for up- or down-regulated genes...

  • bigmw
    You don’t even have pathview package loaded based on your sessionInfo().
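
    A quick way to confirm this without reading through sessionInfo() is to look at the search path. The snippet below is a minimal base-R sketch; `is_attached` is a hypothetical helper written for illustration, not a pathview function.

    ```r
    # Hypothetical helper: TRUE if the named package is attached in this session.
    is_attached <- function(pkg) {
      paste0("package:", pkg) %in% search()
    }

    is_attached("base")      # the base package is always on the search path
    is_attached("pathview")  # stays FALSE until library(pathview) has been run
    ```

    If the second call returns FALSE, run library(pathview) before calling pathview().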

    Originally posted by sindrle
    I'm having problems with Pathview. I can only get the native KEGG view; kegg.native=F does not work.

    Also the native KEGG only has green color, not red (up regulated) and green (down regulated).

    Why am I having these two problems?

    # > sessionInfo()
    # R version 3.0.3 (2013-09-25)
    # Platform: x86_64-apple-darwin10.8.0 (64-bit)

    # locale:
    # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    # attached base packages:
    # [1] parallel stats graphics grDevices utils datasets methods
    # [8] base

    # other attached packages:
    # [1] Rsamtools_1.14.3
    # [2] Biostrings_2.30.1
    # [3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
    # [4] GenomicFeatures_1.14.5
    # [5] AnnotationDbi_1.24.0
    # [6] Biobase_2.22.0
    # [7] GenomicRanges_1.14.4
    # [8] XVector_0.2.0
    # [9] IRanges_1.20.7
    # [10] BiocGenerics_0.8.0
    # [11] BiocInstaller_1.12.0

    # loaded via a namespace (and not attached):
    # [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
    # [4] DBI_0.2-7 RCurl_1.95-4.1 RSQLite_0.11.4
    # [7] rtracklayer_1.22.5 stats4_3.0.2 tools_3.0.2
    # [10] XML_3.95-0.2 zlibbioc_1.8.0

  • bigmw
    May I know what node, gene, pathway and what species you are talking about?

    Originally posted by shriram
    Thanks for the quick reply.
    In the above example, GeneA is the only gene [shown in green in the original KEGG image] on that node for that species, as the other genes on the node are not present in the given species.

  • sindrle
    I'm having problems with Pathview. I can only get the native KEGG view; kegg.native=F does not work.

    Also the native KEGG only has green color, not red (up regulated) and green (down regulated).

    Why am I having these two problems?

    Native KEGG
    # pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = d[,
    # 1:2], pathway.id = pid, species = "hsa", kegg.dir = "~/RNAseq/13_Acute-Changes/13_GAGE_native_A1A2/A1A2pT2D/Pathview"))

    Graphviz view
    # pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = d[,
    # 1:2], pathway.id = pid, species = "hsa", kegg.native=F,
    # sign.pos="bottomleft", kegg.dir = "~/RNAseq/13_Acute-Changes/13_GAGE_native_A1A2/A1A2pT2D/Pathview"))

    # > sessionInfo()
    # R version 3.0.3 (2013-09-25)
    # Platform: x86_64-apple-darwin10.8.0 (64-bit)

    # locale:
    # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    # attached base packages:
    # [1] parallel stats graphics grDevices utils datasets methods
    # [8] base

    # other attached packages:
    # [1] Rsamtools_1.14.3
    # [2] Biostrings_2.30.1
    # [3] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
    # [4] GenomicFeatures_1.14.5
    # [5] AnnotationDbi_1.24.0
    # [6] Biobase_2.22.0
    # [7] GenomicRanges_1.14.4
    # [8] XVector_0.2.0
    # [9] IRanges_1.20.7
    # [10] BiocGenerics_0.8.0
    # [11] BiocInstaller_1.12.0

    # loaded via a namespace (and not attached):
    # [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
    # [4] DBI_0.2-7 RCurl_1.95-4.1 RSQLite_0.11.4
    # [7] rtracklayer_1.22.5 stats4_3.0.2 tools_3.0.2
    # [10] XML_3.95-0.2 zlibbioc_1.8.0

  • shriram
    Thanks for the quick reply.
    In the above example, GeneA is the only gene [shown in green in the original KEGG image] on that node for that species, as the other genes on the node are not present in the given species.

  • bigmw
    Again, that’s because in this pathway, the node labelled “Gene A” includes gene(s) other than “Gene A” but with similar function. The node value is a summary of the fold changes for all genes mapped to this node, so should you be surprised that it differs from the fold change of “Gene A” alone?
    You may use the node.sum argument to control how the node summary is calculated: mean, median, max etc.
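
    To make the point concrete, here is a minimal base-R sketch of a node-level summary. The gene names and fold changes are made up for illustration, and the code only mimics the idea behind node.sum; it is not pathview's internal implementation.

    ```r
    # Hypothetical fold changes for three genes that all map to one KEGG node.
    # Rows are genes, columns are samples; the values are invented.
    fc <- rbind(
      geneA = c(T1 = -1.2, T2 = 0.7, T3 = -0.1),
      geneB = c(T1 =  1.5, T2 = 2.0, T3 =  0.9),
      geneC = c(T1 =  0.4, T2 = 1.4, T3 =  0.0)
    )

    # Summarise the node per sample in several ways.
    node_summary <- rbind(
      sum    = colSums(fc),
      mean   = colMeans(fc),
      median = apply(fc, 2, median),
      max    = apply(fc, 2, max)
    )

    print(fc["geneA", ])  # values for the single gene
    print(node_summary)   # node-level values, one row per summary method
    ```

    None of the summary rows has to match geneA, even though the node may carry its name. The choice of summary also matters: a sum grows with the number of mapped genes, while a median stays on the scale of a single gene.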

  • shriram
    Fold change plotting

    Thank you for your reply. Now I understand why the node-level data differ from the actual fold changes.
    However, when I compare the fold changes in the input data with those shown on the KEGG pathway, they don't correlate well.
    Gene A:
    T1 T2 T3 T4 T5
    0.66 4.1079 0.830 3.278 2.71

    input fold change values:
    Gene A
    -1.2 0.7 -0.14 -0.48 -0.78

    Color coding for this gene on pathway node:

    #EF3030 #FF0000 #FF0000 #FF0000 #FF0000

    which doesn't reflect the trend seen in the input data.

    When I set node.sum="median", the result follows the trend of the input fold changes a little more closely:
    -1.15 0.7626 0.000 1.387 1.534
    #00FF00 #EF3030 #BEBEBE #FF0000 #FF0000

    It is a bit confusing to me.
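
    One likely reason for the run of #FF0000 is saturation: node values beyond the color limit all receive the end color, so 2.71 and 4.1079 look the same. The sketch below shows the idea in base R. It assumes a green-gray-red scale clipped at +/-1; `value_to_color` is a hypothetical helper and its binning is not pathview's own, so the intermediate shades will not match the hex codes above.

    ```r
    # Sketch only: map values to a green-gray-red scale after clipping to
    # [-limit, limit]. value_to_color is a hypothetical helper, not a pathview function.
    value_to_color <- function(x, limit = 1, bins = 11) {
      palette <- colorRampPalette(c("green", "gray", "red"))(bins)
      clipped <- pmax(pmin(x, limit), -limit)  # saturate values beyond the limit
      idx <- cut(clipped,
                 breaks = seq(-limit, limit, length.out = bins + 1),
                 include.lowest = TRUE, labels = FALSE)
      palette[idx]
    }

    node_values <- c(0.66, 4.1079, 0.830, 3.278, 2.71)  # node-level values from the post
    print(value_to_color(node_values))

    # A wider limit keeps large values distinguishable from one another.
    print(value_to_color(node_values, limit = 5))
    ```

    With the narrow limit, every value above 1 collapses to pure red, which hides the trend. Widening the limit (or using a node summary on a smaller scale, such as the median) spreads the values across more of the color range.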


  • bigmw
    If you are looking for the gene entry "120" under pv.out.list[[1]]$plot.data.gene, it may or may not be different from the original data, cnts.d, because multiple genes may be mapped to the same node in a KEGG pathway. What you found in $plot.data.gene is the node-level summary instead of single-gene data. Each node there is labeled or named after its most representative member gene.

    Please check the pathview tutorial (page 7) and documentation for more details:
    Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

  • shriram
    I was comparing the expression data in the pathview object with the original data supplied to pathview, and I found discrepancies between the two sets of values:

    Original count data (human data from reference manual):
    > cnts.d["120", ]
    ERR127302 ERR127303 ERR127304 ERR127305
    0.277 0.577 0.441 0.021

    Output from the pathview object for the human data from the reference manual:
    ERR127302 ERR127303 ERR127304 ERR127305
    0.3952 0.8842 1.628 1.1983

    Does pathview transform the original data before plotting it on the KEGG pathways?


  • sindrle
    Ok, thanks!
    I was thinking the same, but since I had already run HTSeq, I just used the results from there.

    Good to know until next time.

  • bigmw
    I couldn’t find your original problem, but if I remember correctly, your summarizeOverlaps step didn’t work. And you did something like:

    flag <- scanBamFlag(isNotPrimaryRead=FALSE, isProperPair=TRUE)
    param <- ScanBamParam(flag=flag)
    gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=TRUE, param=param)

    I guess you have single-end data, so try:
    flag <- scanBamFlag(isNotPrimaryRead=FALSE, isProperPair=NA)
    The following flag line is for paired-end data:
    flag <- scanBamFlag(isNotPrimaryRead=FALSE, isProperPair=TRUE)

    Check help info for details:

  • sindrle
    My mistake. I forgot to convert to gene symbols:

    data(egSymb)<-lapply(, eg2sym)
    Last edited by sindrle; 03-18-2014, 05:51 AM.

  • shocker8786
    Thank you very much for that explanation, this is starting to make more sense to me now. I will make the necessary changes and retry. Thanks again!

