Am I completely missing the point?




  • Am I completely missing the point?

    So... I'm trying to get an overview of limma, limma-voom, edgeR, DESeq2, NPEBseq etc., and I'm getting the feeling that the task of differential gene expression analysis is being over-complicated...?

    I'm currently looking at a count matrix derived from 95 RNA-seq samples sequenced on an Illumina HiSeq 2000 (Illumina TruSeq stranded kit). Raw reads were mapped to hg19 using STAR and then counted using HTSeq.

    The result is a count matrix with 25369 rows and 95 columns, and the samples fall into two groups in a classic case (n=15) / control (n=80) design. I then perform the following steps:

    1. Use the edgeR package to perform TMM normalisation of the raw counts
    2. For each gene, do a case vs. control t-test and a Wilcoxon test on the TMM values
    3. Apply FDR correction
    4. Sort by ascending FDR value for the t-test and use the Wilcoxon p-value to get an idea of whether the difference is "outlier-driven"

    Please enlighten me as to why this simple approach is not sufficient?
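For readers following along, steps 3 and 4 above could be sketched roughly as below. This is only an illustration in plain Python: the poster's actual code is not shown in the thread, the FDR correction is assumed to be Benjamini-Hochberg, and the gene names, p-values and the helper names `bh_adjust` and `rank_genes` are all made up for the example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(genes, t_pvalues, wilcoxon_pvalues):
    """Sort genes by ascending t-test FDR, carrying the Wilcoxon p-value along."""
    fdr = bh_adjust(t_pvalues)
    rows = list(zip(genes, fdr, wilcoxon_pvalues))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical per-gene p-values (step 2 is assumed to have produced these).
genes = ["geneA", "geneB", "geneC", "geneD"]
t_p = [0.01, 0.04, 0.03, 0.005]
w_p = [0.02, 0.30, 0.05, 0.01]

for gene, fdr, w in rank_genes(genes, t_p, w_p):
    print(f"{gene}\tFDR={fdr:.3f}\tWilcoxon p={w:.3f}")
```

A gene with a small t-test FDR but a large Wilcoxon p-value (like the made-up `geneB` here) is the kind of case the poster would flag as possibly outlier-driven.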


  • #2
    It may be sufficient - after all, you have quite a few samples. With a small number of samples it can be hard to achieve the necessary statistical power without "borrowing variance across genes".

    Or you could use SAMseq, which is very simple to use and understand. It's based on non-parametric statistics.
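To make "borrowing variance across genes" a little more concrete, here is a minimal sketch of the general idea: each gene's sample variance is pulled towards a common prior variance, weighted by degrees of freedom. This follows the form of the moderated variance used in empirical Bayes approaches such as limma, but the prior here is set by hand as an assumption (real packages estimate it from all genes), and the data and the function name `moderated_variances` are hypothetical.

```python
from statistics import variance


def moderated_variances(gene_variances, residual_df, prior_variance, prior_df):
    """Shrink each gene's sample variance towards a common prior variance.

    Each result is a weighted average, with degrees of freedom as weights:
        (prior_df * prior_variance + residual_df * s2) / (prior_df + residual_df)
    """
    total_df = prior_df + residual_df
    return [
        (prior_df * prior_variance + residual_df * s2) / total_df
        for s2 in gene_variances
    ]


# Hypothetical log-expression values: three genes, three replicates each.
expression = {
    "geneA": [5.0, 5.1, 4.9],  # variance is tiny, possibly just by chance
    "geneB": [7.0, 8.0, 6.0],
    "geneC": [3.0, 3.5, 2.5],
}
s2 = [variance(values) for values in expression.values()]
residual_df = 2  # n - 1 for a single group of three replicates

# Assumption: prior set by hand (mean of the gene variances, 4 prior df).
prior_variance = sum(s2) / len(s2)
shrunk = moderated_variances(s2, residual_df, prior_variance, prior_df=4)

for gene, raw, mod in zip(expression, s2, shrunk):
    print(f"{gene}\traw={raw:.3f}\tmoderated={mod:.3f}")
```

With only three replicates, a raw variance like that of `geneA` would produce a huge t-statistic from a small difference; after shrinkage it is much less extreme. With 15 and 80 samples per group the residual degrees of freedom dominate and the shrinkage matters far less, which fits the point that the simple approach may be sufficient here.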


    • #3
      To echo Kopi-o, each of these methods gives its motivation fairly early on in the corresponding paper:


      "Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of the these are usable when the number of replicates is very small."


      "Typically, the number of replicates is small, and further modelling assumptions need to be made in order to obtain useful estimates."


      "Borrowing information between genes is a crucial feature of the genome-wide statistical methods, as it allows for gene-specific variation while still providing reliable inference with small sample sizes."

      I'd also recommend checking out the SAMseq paper and method.


      • #4
        Run your statistical tests on log2 values. That's all I have to add. With those sample sizes you could even do permutation tests and avoid any distribution assumptions altogether.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */
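The suggestion in #4 could look something like the following for a single gene. It is a rough sketch under stated assumptions, not the poster's code: the counts are invented, a pseudocount of 1 is added before taking log2 to avoid log2(0), the test statistic is the difference in group means, and `permutation_pvalue` is a hypothetical helper name.

```python
import math
import random


def permutation_pvalue(case, control, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    pooled = list(case) + list(control)
    n_case = len(case)

    def mean_diff(values):
        a, b = values[:n_case], values[n_case:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(pooled))
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point noise in the sums.
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical normalised counts for one gene.
case_counts = [220, 310, 280, 260, 300]
control_counts = [100, 120, 90, 110, 130, 95, 105, 115]

# Assumption: log2(count + 1), with the pseudocount avoiding log2(0).
case_log = [math.log2(x + 1) for x in case_counts]
control_log = [math.log2(x + 1) for x in control_counts]

p = permutation_pvalue(case_log, control_log)
print(f"permutation p-value: {p:.4f}")
```

In a genome-wide setting this would be run per gene and the resulting p-values FDR-corrected; the number of permutations limits how small a p-value can get, so it would need to be large enough for the correction to have something to work with.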