  • sazz
    Member
    • Oct 2012
    • 28

    GSEA for RNA-seq data

    Hello,

    We have performed an RNA-seq analysis with the Tuxedo tools. There are 2 conditions with 3 replicates each, and we detected differentially expressed genes at a q-value cut-off of 0.01.

    Now it is time to get meaningful results by comparing against gene sets. I used DAVID as a first step, and now I am trying to use GSEA, but I have some questions.

    - Should I load the FPKM values of all replicates for each condition as input?
    - Should I load all genes, without any cut-off?

    Note: It is said that you can use the "Fpkm_trackingToGct" program from GenePattern. You load the fpkm_tracking file from the Cuffdiff output, and it gives you an output in .gct format to be used as input for GSEA. However, that output contains only the overall value calculated across the 3 replicates, not each replicate separately. If so, I assume I should use a cut-off when loading the genes, because the program won't be able to calculate the variation between replicates, but I don't know whether this introduces some bias.
  • yzzhang
    Member
    • Jan 2013
    • 67

    #2
    Hi sazz, have you solved your problem? I am wondering about the same things. If you found a good solution, could you kindly share it? Thanks in advance.


    • sazz
      Member
      • Oct 2012
      • 28

      #3
      Hi yzzhang,

      I wrote to the GSEA developers, and this was their answer:

      "The best approach would be to create a GCT file where rows are gene identifiers
      (ideally, unique instances of human gene symbols), columns are the biological
      replicates for each of two phenotypes and the values are FPKM values.

      In any case, you should avoid filtering your data in any way because this would
      significantly reduce power of GSEA."
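The GCT layout described in the reply can be sketched as follows. This is a minimal illustration, assuming the GCT v1.2 layout (a `#1.2` line, a dimensions line, a `NAME`/`Description` header, then one row per gene); the function name, sample names, and FPKM values are hypothetical, and no filtering is applied, in line with the advice above.

```python
import io


def write_gct(fpkm_by_gene, sample_names, out):
    """Write per-replicate FPKM values in GCT v1.2 layout.

    fpkm_by_gene: dict mapping gene symbol -> list of FPKM values,
                  one per sample, in the same order as sample_names.
    out:          any writable text file object.
    """
    for gene, values in fpkm_by_gene.items():
        if len(values) != len(sample_names):
            raise ValueError(f"{gene}: expected {len(sample_names)} values")

    out.write("#1.2\n")                                        # format version
    out.write(f"{len(fpkm_by_gene)}\t{len(sample_names)}\n")  # rows, columns
    out.write("\t".join(["NAME", "Description"] + list(sample_names)) + "\n")
    for gene, values in fpkm_by_gene.items():
        row = [gene, "na"] + [f"{v:g}" for v in values]
        out.write("\t".join(row) + "\n")


# Hypothetical toy data: 2 conditions x 3 replicates, every gene kept.
samples = ["ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3"]
fpkm = {
    "GENE_A": [10.5, 11.0, 9.8, 20.1, 22.3, 19.7],
    "GENE_B": [0.0, 0.1, 0.0, 0.2, 0.0, 0.1],
}

buf = io.StringIO()
write_gct(fpkm, samples, buf)
gct_text = buf.getvalue()
```

Each replicate gets its own column, so GSEA can estimate the variation between replicates itself rather than receiving one pooled value per condition.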

      But apart from this problem, I don't trust the ranking methods: t-test and signal-to-noise are not well suited to this kind of analysis. Even though the formulas of those methods take the variation between replicates into account, they do not follow the logic of Cuffdiff's significance calculations. If you check the ranked list at the end, you will see that genes with very low expression but also low variance are not in Cuffdiff's "significant" list, yet they rank highly in the GSEA output, probably because of their low expression.

      So a formula that considers both the Cuffdiff q-value and the log2 fold change would be best, and you could use it to do a pre-ranking. I haven't found anything like this yet, and my statistics is not good enough to work it out on my own.
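One way to build a pre-ranking metric from both quantities is a signed significance score, sign(log2 fold change) × -log10(q). This is a commonly used heuristic and an assumption here, not something prescribed by Cuffdiff or the GSEA developers; the function name, the `q_floor` guard, and the toy values are hypothetical.

```python
import math


def signed_significance(log2_fc, q_value, q_floor=1e-300):
    """Return sign(log2 fold change) * -log10(q-value).

    q_floor is an assumed guard against log10(0) when q is reported as 0.
    Genes with zero or undefined fold change get a neutral score of 0.
    """
    if math.isnan(log2_fc) or log2_fc == 0:
        return 0.0
    sign = 1.0 if log2_fc > 0 else -1.0
    return sign * -math.log10(max(q_value, q_floor))


# Hypothetical (gene, log2 fold change, q-value) rows.
rows = [
    ("GENE_A", 1.0, 0.001),   # up, strongly significant
    ("GENE_B", -2.0, 0.01),   # down, significant
    ("GENE_C", 3.0, 0.9),     # large fold change, not significant
]

# Most significantly up-regulated first, most significantly down last.
ranked = sorted(
    ((gene, signed_significance(fc, q)) for gene, fc, q in rows),
    key=lambda item: item[1],
    reverse=True,
)
```

With this score, a gene with a large fold change but a poor q-value (GENE_C) lands near the middle of the list instead of at the top. Genes sharing the same q-value tie, so whether this ranking behaves well on real data would still need checking.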

      Actually, in the paper "Differential analysis of gene regulation at transcript resolution with RNA-seq" by Cole Trapnell et al., they use GSEA after RNA-seq, and in the methods section they say:

      "Enrichment for up- or downregulation sets of genes from the REACTOME pathway database was computed by running GSEA against the fold-change ranked list of genes in the experiment. Ranking was based on Cuffdiff 2–derived fold change."

      So they say they ranked by the fold change from the Cuffdiff result, but it can't be that simple, just disregarding the q-value. I asked Cole Trapnell by e-mail, but he didn't respond.
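For comparison, the fold-change ranking quoted from the paper could be sketched like this. It is only an illustration of the idea, not the authors' actual procedure: the column names `gene` and `log2(fold_change)` are assumed to match Cuffdiff's gene_exp.diff output, clipping infinite fold changes to the largest finite value is my own assumption, and the demo rows are made up. The output is a two-column, tab-separated list (gene, score) of the kind GSEA's pre-ranked mode takes.

```python
import csv
import io
import math


def fold_change_rnk(diff_file, out, gene_col="gene", fc_col="log2(fold_change)"):
    """Write a gene<TAB>score ranked list sorted by log2 fold change."""
    scores = {}
    for row in csv.DictReader(diff_file, delimiter="\t"):
        try:
            fc = float(row[fc_col])      # also parses "inf", "-inf", "nan"
        except ValueError:
            continue                     # skip unparsable values
        if math.isnan(fc):
            continue
        gene = row[gene_col]
        # If a symbol repeats, keep the entry with the largest magnitude.
        if gene not in scores or abs(fc) > abs(scores[gene]):
            scores[gene] = fc

    # Assumption: clip +/-inf (zero FPKM in one condition) to the largest
    # finite magnitude so those genes stay at the extremes of the list.
    cap = max((abs(v) for v in scores.values() if math.isfinite(v)), default=0.0)
    ranked = sorted(
        ((gene, max(-cap, min(cap, fc))) for gene, fc in scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    for gene, score in ranked:
        out.write(f"{gene}\t{score:g}\n")
    return ranked


# Made-up rows in the assumed gene_exp.diff layout (only the used columns).
demo = (
    "gene\tlog2(fold_change)\tq_value\n"
    "GENE_A\t1.5\t0.001\n"
    "GENE_B\t-inf\t0.2\n"
    "GENE_C\t-0.5\t0.8\n"
)
rnk = io.StringIO()
ranked_genes = fold_change_rnk(io.StringIO(demo), rnk)
rnk_text = rnk.getvalue()
```

Note that the q-value column is ignored entirely here, which is exactly the concern raised above.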


      • yzzhang
        Member
        • Jan 2013
        • 67

        #4
        Hi, sazz,
        Thanks a lot. I appreciate your help, and I will consider whether this method is suitable for my data. Thanks again.
