  • FPKMs and Limma R package

    Hi!

    I have generated a dataset with 9 different biological samples (plus replicates) and have analyzed it using TopHat and Cufflinks. As a result, I currently have a table with the FPKM values for every gene in each sample.

    I am trying to use the limma R package to model and extract differentially expressed genes across all of these samples (instead of the 2-by-2 comparisons that can be made using Cuffdiff), and I have run into the following problem, on which I would really appreciate some advice.

    I have to transform the FPKM values into log2 values before using them in the lmFit() function. However, since there are zeros, doing this directly on the FPKM table generates a lot of infinite (-Inf) values. I was therefore thinking of adding a fixed number to all of the FPKM values before transforming them into log2 data. So my questions are:

    1. Is this a good approach?
    Are there better alternatives?

    2. Is there a specific value that should be added?
    I was thinking of adding a small value, e.g. 10^-10. Since log2(10^-10) is about -33, zeros would land at the opposite end of the range from the positive log2 values (in my table the maximum log2(FPKM) is ~22).
    But I am not sure if this is correct and would also like to know if there is a "normal" value that people usually add.

    Thanks!!!

    Note: I also have the read counts and could eventually do everything with the voom function and then limma, but since all my initial analysis used the FPKMs I would really like to stick with them for consistency... so any help is deeply appreciated!
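
To make the trade-off in question 2 concrete, here is a minimal Python sketch (the FPKM values and the three candidate offsets are hypothetical, chosen only for illustration). It shows how strongly the choice of offset controls the apparent log2 fold change between a gene with 0 FPKM and the same gene at 1 FPKM.

```python
import math


def log2_with_offset(fpkm, offset):
    """Return log2(fpkm + offset); the offset keeps zeros finite."""
    return math.log2(fpkm + offset)


# Hypothetical FPKM values for one gene in two samples:
# absent in sample A, barely expressed in sample B.
fpkm_a, fpkm_b = 0.0, 1.0

# Hypothetical candidate offsets, from very small to moderate.
for offset in (1e-10, 0.01, 1.0):
    floor = log2_with_offset(fpkm_a, offset)
    lfc = log2_with_offset(fpkm_b, offset) - floor
    print(f"offset={offset:g}: zero maps to {floor:.2f}, "
          f"log2 fold change 0 -> 1 FPKM is {lfc:.2f}")
```

With an offset of 10^-10, the step from 0 to 1 FPKM becomes a log2 fold change of about 33, so zeros dominate the spread of the data; larger offsets shrink that gap.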

  • #2
    Adding a small count seems to be the common method. If you look at how edgeR calculates log2(rpkm), for example, you'll see that it adds a small value (0.25 by default) to the raw counts before computing CPM, which is then used to get RPKM. For comparison, a minimum of 0.25 on the raw count scale corresponds to roughly 0.25/N FPKM for a 1 kb gene, where N is the library size in millions of reads (about 0.025 FPKM at 10 million reads, depending on how library sizes were computed). That is many orders of magnitude above 10^-10.
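
The idea of adding the small value on the count scale, before converting to CPM and RPKM, can be sketched in Python as follows. This is a simplified illustration, not edgeR's implementation: the function name `log2_rpkm`, the gene length, and the library size are hypothetical, and edgeR's own handling of library sizes is more detailed than what is shown here.

```python
import math


def log2_rpkm(count, gene_length_bp, lib_size, prior_count=0.25):
    """log2 RPKM with a prior count added to the raw count (simplified).

    Assumption: the prior count is added as-is; any library-size
    adjustment that a real package applies is deliberately left out.
    """
    cpm = (count + prior_count) / lib_size * 1e6   # counts per million
    rpkm = cpm / (gene_length_bp / 1000)           # per kilobase of gene
    return math.log2(rpkm)


# Hypothetical 1 kb gene in a library of 10 million reads.
gene_length_bp = 1000
lib_size = 10_000_000

for count in (0, 1, 10, 100):
    value = log2_rpkm(count, gene_length_bp, lib_size)
    print(f"count={count}: log2 RPKM = {value:.2f}")
```

Because the offset lives on the count scale, a zero count maps to a finite floor that depends on library size and gene length rather than on an arbitrary constant added to the FPKM values.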



    • #3
      Thanks!
      I have tried this, but I am not happy with the results... I get really strange volcano plots (see figure), which I guess are a consequence of the different variance stabilization methods...
      Therefore, I think I will switch to using the read counts (even if it means going back and redoing my previous analysis).

      file:///Users/elsaabranches/Desktop/volcanos/volcano_plots.jpg



      • #4
        Yes, that's a wise decision. Use the voom function to process the counts prior to lmFit. The voom-limma pipeline needs to work with counts, rather than with FPKM.
