Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • EA01
    Junior Member
    • Feb 2013
    • 5

    FPKMs and Limma R package

    Hi!

    I have generated a dataset with 9 different biological samples (plus replicates) and have analyzed it using TopHat and CuffLinks. Therefore, I currently have a table with the FPKM values for every gene in each sample.

    I am trying to use the Limma R package to model and extract differentially expressed genes between these several different samples (instead of 2-by-2 comparisons that can be made using CuffDiff) and have encountered the following problem to which I would really appreciate some advice.

    I have to transform the FPKM values into log2 values to then use this in the lmFit() function. However, since there are "zeros", if I do this directly on the FPKM table, a lot of "Infinite" values are generated. I was therefore thinking of adding a specific number to all of the FPKM values before transforming them into log2 data. So my questions are:

    1. Is this a good approach?
    Are there better alternatives?

    2. Is there a specific value that should be added?
    I was thinking of adding a small value (e.g. 10^-10, a value whose log2(10^-10) ~-33 is in the "opposite" range of the log2 positive values - in my table the maximum log2(FPKM)~22).
    But I am not sure if this is correct and would also like to know if there is a "normal" value that people usually add.

    Thanks!!!

    Note: I also have the count numbers and could eventually do everything with the voom function and then Limma, but since I have all my initial analysis using the FPKMs I would really like to stick with them for consistency... so any help is deeply appreciated!
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Adding a small count seems to be the common method. If you look at how edgeR calculates log2(rpkm), for example, you'll see that it adds a small value (0.25 by default) to the raw counts before computing CPM, which is then used to get RPKM. For comparison, a minimum of 0.25 on the raw count scale would be ~2.5e-7 FPKM for a 1kb gene (depending on how library sizes were computed).

    Comment

    • EA01
      Junior Member
      • Feb 2013
      • 5

      #3
      Thanks!
      I have tried this but I am not happy with the results... I get really strange volcano plots (see figure), which I guess are a consequence of different variance stabilization methods...
      Therefore, I think I will stick with the use of the read counts (even if it means going back and re-doing my previous analysis).

      file:///Users/elsaabranches/Desktop/volcanos/volcano_plots.jpg

      Comment

      • Gordon Smyth
        Member
        • Apr 2011
        • 91

        #4
        Yes, that's a wise decision. Use the voom function to process the counts prior to lmFit. The voom-limma pipeline needs to work with counts, rather than with FPKM.

        Comment

        Latest Articles

        Collapse

        • SEQadmin2
          From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
          by SEQadmin2


          Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


          The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
          ...
          Yesterday, 10:05 AM
        • SEQadmin2
          Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
          by SEQadmin2


          With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


          Introduction

          Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
          05-22-2026, 06:42 AM
        • SEQadmin2
          Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
          by SEQadmin2

          Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


          Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
          05-06-2026, 09:04 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by SEQadmin2, Yesterday, 12:03 PM
        0 responses
        19 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, Yesterday, 11:40 AM
        0 responses
        14 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 05-28-2026, 11:40 AM
        0 responses
        29 views
        0 reactions
        Last Post SEQadmin2  
        Started by SEQadmin2, 05-26-2026, 10:12 AM
        0 responses
        31 views
        0 reactions
        Last Post SEQadmin2  
        Working...