  • LauGP
    Junior Member
    • Nov 2014
    • 7

    Statistical tests for differential gene expression in RNA-Seq

    Dear all,

    I'm a beginner in the RNA-Seq world who recently got some results to analyse and process. The data were analyzed with two pipelines in parallel: TopHat/Bowtie --> HTSeq-count --> DESeq2, and the CLC Genomics Workbench. So now I have 3 different outcomes from 3 statistical approaches: DESeq2, plus EDGE and Baggerly's test from CLC Genomics. I then looked for agreement among them: I filtered the adjusted p-values from each test with the same threshold and compared the filtered gene lists to see how similar they are.
    The results do not seem very consistent to me. DESeq2 gives around 1500 differentially expressed genes, EDGE around 2000, and Baggerly's test around 3000. I have read that DESeq2 and EDGE assume the data follow a negative binomial distribution, while Baggerly's test assumes a beta-binomial.

    Any clue why the number of significantly differentially expressed genes differs so much among those 3 statistical approaches? Which one should I use?

    Thanks a lot in advance
    regards
  • sdriscoll
    I like code
    • Sep 2009
    • 436

    #2
    I'm not sure there is even an answer for that, other than maybe to run several tests, as you have, and either take the intersection of the genes or keep only the genes that show up in the majority of the tests (so if gene X is DE in 2 of the 3 tools, keep it). Keep in mind that these tools try to avoid reporting false positives; however, their false-negative rates can be pretty bad. The following paper is relevant:



    Figures 3 and 4 are kind of telling: even with 12 replicates per condition, the true positive rate of the tools is still less than 50%.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
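The intersection and majority-vote ideas in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and the three lists are made up, and `tally_calls` and `consensus` are hypothetical helper names, not part of DESeq2 or the CLC Genomics Workbench.

```python
from collections import Counter

def tally_calls(calls):
    """Count, for each gene, how many tools called it differentially expressed."""
    votes = Counter()
    for genes in calls.values():
        votes.update(set(genes))  # set() so a gene counts once per tool
    return votes

def consensus(calls, min_votes=2):
    """Return the sorted genes called by at least `min_votes` tools."""
    votes = tally_calls(calls)
    return sorted(gene for gene, n in votes.items() if n >= min_votes)

# Hypothetical significant-gene lists, one per tool (illustrative only).
calls = {
    "DESeq2": {"geneA", "geneB", "geneC"},
    "EDGE": {"geneB", "geneC", "geneD"},
    "Baggerly": {"geneC", "geneD", "geneE", "geneF"},
}

majority = consensus(calls, min_votes=2)  # genes found by at least 2 of 3 tools
strict = consensus(calls, min_votes=3)    # plain intersection of all 3 lists
```

Setting `min_votes` to the number of tools gives the intersection, while a lower value trades some specificity for fewer false negatives.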


    • mbblack
      Senior Member
      • Aug 2009
      • 245

      #3
      Did you select differentially expressed genes solely by a statistical threshold? What if you also apply a fold-change threshold at the same time: do you get more consistent lists? You can look at the rank-order correlation of fold change to see how well it behaves across the different analyses.

      Look at some MA plots from each analysis and see if one or the other shows some skew that might indicate a normalization bias.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.
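Both suggestions in this reply, a combined significance and fold-change filter and a rank-order (Spearman) correlation of fold changes between two analyses, can be sketched as follows. The thresholds (adjusted p < 0.05, |log2 fold change| >= 1), the gene values and the helper names are assumptions chosen for illustration; none of them come from the thread's data.

```python
import math

def passes(padj, log2fc, alpha=0.05, min_abs_log2fc=1.0):
    """Keep a gene only if it is significant AND its change is large enough."""
    return padj < alpha and abs(log2fc) >= min_abs_log2fc

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical results from two tools: gene -> (adjusted p-value, log2 fold change).
tool_a = {"geneA": (0.001, 2.1), "geneB": (0.03, 0.4),
          "geneC": (0.20, -1.5), "geneD": (0.01, -1.2)}
tool_b = {"geneA": (0.002, 1.8), "geneB": (0.04, 0.6),
          "geneC": (0.04, -1.1), "geneD": (0.02, -1.4)}

shared = sorted(set(tool_a) & set(tool_b))
rho = spearman([tool_a[g][1] for g in shared], [tool_b[g][1] for g in shared])
kept_a = {g for g, (p, fc) in tool_a.items() if passes(p, fc)}
kept_b = {g for g, (p, fc) in tool_b.items() if passes(p, fc)}
```

A high `rho` with poorly overlapping gene lists would suggest the tools agree on effect sizes and differ mainly in where the significance cutoff falls.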


      • LauGP
        Junior Member
        • Nov 2014
        • 7

        #4
        Thanks for your answers and paper suggestion.

        Actually, I did both: first I selected the genes with an adjusted p-value below 0.05, and afterwards sorted them to find the upregulated and downregulated ones. Comparing the top-ranked genes from each statistical approach, 33% were significantly differentially expressed in just one test, 22% in 2 tests and 5% in all 3 statistical approaches. That seemed quite inconsistent to me.

        There is some chance that in the CLC Genomics Workbench the reads were mapped against Rnor_5.0 instead of UCSC rn4. Do you think this could introduce such a big inconsistency in the filtered/sorted gene lists among the 3 statistical approaches?

        I will also look at the MA plots in more detail.

        kind regards


        • LauGP
          Junior Member
          • Nov 2014
          • 7

          #5
          Hello,

          Just for the record, I confirmed that the reads were mapped against Rnor_5.0 in one dataset and against UCSC rn4 in the other analyses. It is therefore quite probable that the majority of the inconsistencies I'm seeing are due to the different reference genome assemblies. I will now use a single genome assembly for all the statistical analyses.
          Regarding QC, in my opinion the MA plots of our dataset don't show normalization biases (see attached figure). On the other hand, the PCA plots (see attached figure) show a separation between replicates in 4 of the experimental groups (red, green, pink and dark blue), while in the other 2 groups it seems quite acceptable to me. Moreover, there is an evident separation of the red, green and pink groups from the dark blue, light blue and light green ones. All of this is in line with the observed differentially expressed genes.
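For readers unfamiliar with the MA plots discussed in this thread, the quantities behind them can be sketched as below, using the classical definition M = log2(y) - log2(x) and A = (log2(x) + log2(y)) / 2 for a pair of conditions. The counts, the pseudocount of 1 and the helper name `ma_values` are assumptions for illustration only; tools such as DESeq2 draw their own variant of this plot.

```python
import math
from statistics import median

def ma_values(mean_x, mean_y, pseudocount=1.0):
    """Return (M, A) lists for paired mean normalized counts of two conditions."""
    m, a = [], []
    for x, y in zip(mean_x, mean_y):
        lx = math.log2(x + pseudocount)  # pseudocount avoids log2(0)
        ly = math.log2(y + pseudocount)
        m.append(ly - lx)        # M: log2 fold change
        a.append((lx + ly) / 2)  # A: average log2 expression
    return m, a

# Hypothetical mean normalized counts per gene (illustrative only).
control = [7.0, 15.0, 31.0, 63.0]
treated = [15.0, 15.0, 31.0, 31.0]

m, a = ma_values(control, treated)
center = median(m)  # a median M far from 0 may hint at a global shift
```

A median M close to zero is only a crude numeric summary; it does not replace inspecting the plot itself for intensity-dependent skew.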