  • Jamiou
    Junior Member
    • Jun 2015
    • 2

    Basic questions about fold change calculations

    Hello, SEQanswers! I'm a biotech student currently doing my bachelor's thesis, which is somewhat related to bioinformatics, and part of it involves looking at changes in gene expression levels from RNA-seq data. I know basically nothing about this field, but I'm trying to learn, so I'm sorry if my questions are way too basic and/or stupid. I feel that bioinformatics could be something I want to do for a master's degree, so I really want to dive as deep into it as I can while I have this opportunity!

    First question: fold change. The way I understand it, this is simply the abundance level of gene X in sample A divided by its abundance level in sample B (I read that RPKM is an abundance level, but I haven't read that much about it yet), correct? But when I search for articles on fold change, I mostly find various software for "differential expression" (which I assume is sort of the same as fold change?). I cannot find any articles that compute fold change as sample A/sample B... Why is this? I assume there is some reason you can't or shouldn't do this calculation, but I don't understand what it is.

    Second question (which came up while I was trying to answer the first) is about genes with very low abundance. Regardless of how you calculate fold change, how do you account for genes with a very low abundance level, i.e. close to the limit of detection? For example, if the abundance levels in sample A and sample B are both close to 0 but still yield a fold change you are interested in, can you really say that the gene has a different abundance level? I mean, if both abundance levels are so close to the limit of detection, they could both be false, right? How do you generally account for this kind of thing, or am I just misunderstanding how RNA-seq detection limits work? I read that an RPKM of 1 is approximately equivalent to 1 RNA molecule per cell, so if you have RPKMs of (for example) 0.8 and 0.2, you will have a fold change of 4, but can you really trust that number?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Most (probably all) packages that are used to find differentially expressed genes will return either a fold-change or a log2 fold-change (this is typically computed on the log2 scale). You would normally compute the fold-change between groups, rather than between samples (since who cares if two samples differ if the groups that they're part of don't).
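    For illustration, a minimal Python sketch of a log2 fold-change computed between groups rather than between individual samples: average each group's normalized counts, then take the log2 ratio. The function name and numbers are made up, and real packages such as DESeq2 and edgeR estimate the fold-change from a fitted model rather than from a simple ratio of means.

```python
import math

def log2_fold_change(group_a, group_b):
    """Log2 ratio of the mean normalized count in group A to that in group B.

    group_a, group_b: per-sample normalized counts for one gene
    (e.g. raw counts already divided by a library size factor).
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_a / mean_b)

# Hypothetical normalized counts for one gene, three replicates per group.
treated = [200.0, 240.0, 160.0]
control = [50.0, 60.0, 40.0]
print(log2_fold_change(treated, control))  # 2.0, i.e. a 4-fold increase
```

    Working on the log2 scale makes up- and down-regulation symmetric: swapping the groups gives -2.0 instead of 0.25.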

    Your second question relates to the first. First, one computes a p-value and only then sorts the significant results by fold-change, since low-abundance genes/transcripts will show randomly high fold-changes. Second, one can compute the fold-change by incorporating a prior distribution. This is done in DESeq2, for example, where lowly expressed genes will have their log2 fold-changes shrunken toward 0.
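    To make the shrinkage idea concrete, here is a rough Python sketch. It is not what DESeq2 does (DESeq2 estimates a prior from the distribution of fold-changes across all genes); it swaps in a plain pseudocount added to hypothetical raw counts, which produces the same qualitative effect. Two made-up genes share the same 4-fold raw ratio, one with 4 vs 1 reads and one with 400 vs 100, and only the low-count one is pulled strongly toward 0. The pseudocount of 5 is an arbitrary choice for illustration.

```python
import math

def log2fc(count_a, count_b, pseudocount=0.0):
    # Adding the same constant to both counts pulls the ratio toward 1
    # (the log2 ratio toward 0); the effect is large for small counts
    # and negligible for large ones.
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))

# Hypothetical genes with the same 4-fold raw ratio but very different depth.
genes = {"low": (4, 1), "high": (400, 100)}
for name, (a, b) in genes.items():
    raw = log2fc(a, b)
    shrunk = log2fc(a, b, pseudocount=5)  # pseudocount of 5 is arbitrary
    print(f"{name}: raw={raw:.2f} shrunk={shrunk:.2f}")
# low: raw=2.00 shrunk=0.58
# high: raw=2.00 shrunk=1.95
```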

    There is no fixed correspondence between RPKM and molecules per cell. In fact, you would be wise to not use RPKM for any statistics, use either raw or estimated counts instead.


    • Jamiou
      Junior Member
      • Jun 2015
      • 2

      #3
      Oh, okay. But where do you get the p-value from? That is some sort of hypothesis test, right? So if fold change is gene X in group A / group B, how do I get a p-value from that? And is it ever possible to get significant p-values for genes expressed close to zero (in the groups)?

      Why is RPKM bad for statistics? I think I read that some software uses RPKM (Cufflinks?). Why are raw counts (what do you mean by that?) or estimated counts better?


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Yes, the p-value is derived from a hypothesis test. Popular programs for this include DESeq2, edgeR, limma/voom, and cuffdiff. It's typically not possible to get significant results from very lowly expressed genes, since they tend to lack enough alignments to lend statistical power.
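        As a toy illustration of why very low counts can't reach significance, here is an exact binomial test in Python for a single gene measured in two samples. It assumes equal library sizes and purely Poisson (sampling) noise, so under the null hypothesis of no change each read lands in sample A with probability 0.5. DESeq2 and edgeR instead use negative binomial models across replicates, which also account for biological variability, so treat this strictly as a sketch; the function name is made up.

```python
from math import comb

def two_sample_count_test(count_a, count_b):
    """Exact two-sided binomial test of equal expression for one gene.

    Assumes equal library sizes and Poisson noise only (no replicates),
    so given n = count_a + count_b, count_a ~ Binomial(n, 0.5) under the null.
    """
    n = count_a + count_b
    k = min(count_a, count_b)
    # P(X <= k) for X ~ Binomial(n, 0.5); the distribution is symmetric,
    # so doubling the lower tail gives the two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(two_sample_count_test(4, 1))      # 0.375
print(two_sample_count_test(5, 0))      # 0.0625
print(two_sample_count_test(400, 100))  # far below 0.05
```

        With only 5 reads in total, even the most extreme split (5 vs 0) gives p = 0.0625, so no split of 5 reads is significant at 0.05, whereas 400 vs 100 reads, the same 4-fold ratio as 4 vs 1, gives a vanishingly small p-value.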

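        Converting to RPKM throws away the raw count, and the count is what tells you how precise a measurement is: a gene with 10 reads is measured far less precisely than one with 1000 reads, even if both end up with the same RPKM. That makes RPKM difficult to use for statistics. You can google for more.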
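        Here is a small Python sketch, with made-up numbers, of what losing precision information means. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and assumes pure Poisson sampling noise, under which a count of n has a relative error of about 1/sqrt(n); real RNA-seq data are noisier than that. The same 1 kb gene sequenced at 2 million and at 200 million reads gets an identical RPKM, even though one estimate rests on 2 reads and the other on 200.

```python
import math

def rpkm(count, gene_length_bp, total_mapped_reads):
    # Reads per kilobase of transcript per million mapped reads.
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def poisson_relative_error(count):
    # Under Poisson sampling the standard deviation of a count is sqrt(count),
    # so the relative error is 1/sqrt(count).
    return 1 / math.sqrt(count)

# Hypothetical: the same 1 kb gene sequenced shallowly and deeply.
shallow = (2, 1_000, 2_000_000)      # 2 reads out of 2 million
deep = (200, 1_000, 200_000_000)     # 200 reads out of 200 million
for label, (count, length, total) in [("shallow", shallow), ("deep", deep)]:
    print(label, rpkm(count, length, total), round(poisson_relative_error(count), 2))
# shallow 1.0 0.71
# deep 1.0 0.07
```

        Both samples report an RPKM of 1.0, but the shallow estimate is roughly ten times less precise; a count-based model keeps that information, while RPKM alone cannot.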
