  • Beginner question for Differential Expression Analysis

    Hello,

    I am a beginner in analyzing data from an RNA-seq experiment. I was not the one who performed the bioinformatics analysis (I am more of a bench scientist), so I have an Excel file in my hands. I am a bit confused, though, about how to retrieve my DE genes.
    I have read what p- and q-values represent, and I understand that setting an FDR threshold is a 'safe' way to decide whether the significant differences recorded are truly significant.

    However, I am unsure how to choose the FDR threshold. If I understand correctly, the 0.05 level does not apply to all experiments.
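One practical way to see how much the cutoff matters is to count how many genes pass at a few different FDR thresholds. The sketch below assumes the spreadsheet has been saved as CSV and that the adjusted p-value column is called `padj` (the DESeq2 name; edgeR calls it `FDR`, so check your own headers). The gene names and values in the usage example are made up.

```python
import csv
import io


def count_de_genes(rows, fdr_column="padj", thresholds=(0.01, 0.05, 0.10)):
    """Count genes whose adjusted p-value falls below each threshold.

    `fdr_column` is an assumed column name; rows with an empty or "NA"
    value (DESeq2 reports NA for genes it filtered out) are skipped.
    """
    counts = {}
    for t in thresholds:
        counts[t] = sum(
            1
            for row in rows
            if row[fdr_column] not in ("", "NA") and float(row[fdr_column]) < t
        )
    return counts


# Toy table standing in for the exported spreadsheet (values are invented)
toy_csv = """gene,log2FoldChange,padj
GeneA,2.1,0.003
GeneB,-1.4,0.02
GeneC,0.8,0.07
GeneD,0.1,NA
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))
counts = count_de_genes(rows)
print(counts)  # {0.01: 1, 0.05: 2, 0.1: 3}
```

If the list grows or shrinks a lot between 0.01 and 0.1, that is a sign the choice of threshold deserves some thought for your experiment.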

    Could you please refer me to some further reading, or perhaps provide me with some tips, so that I proceed correctly with my analysis?

    I apologize if this is a very basic question. I appreciate your help.

    Regards
    Vassen

  • #2
    The raw p-values in your results are still what they are: for each gene, given the dispersion model fitted to the expression values in each condition, a low p-value means the observed difference would be unlikely if that gene were NOT differentially expressed. Statistical reality, however, shows us that when we repeatedly run a statistical test between two groups of values that DO come from the same distribution (say, split 20 values with a mean of 10 and a standard deviation of 5 into two random groups), about 5% of those tests will return a p-value below 0.05. So, given the large number of genes we are testing, type I errors (false positives) add up: testing 20,000 genes at p < 0.05 would be expected to produce about 1,000 false positives even if no gene were truly differentially expressed.
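The 5% effect can be reproduced with a small simulation. This is a sketch under simplifying assumptions: values are drawn from a normal distribution with mean 10 and SD 5 (as in the example above), and each comparison uses a two-sample z-test with the SD treated as known, not the count-based models (e.g. negative binomial) that DE tools actually fit. The function name and parameters are illustrative.

```python
import random
from statistics import NormalDist, fmean


def null_false_positive_rate(n_tests=10_000, n_per_group=10, mu=10.0,
                             sigma=5.0, alpha=0.05, seed=1):
    """Fraction of tests called significant when both groups share one distribution."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means, with sigma assumed known
    se = sigma * (2 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_tests):
        values = [rng.gauss(mu, sigma) for _ in range(2 * n_per_group)]
        rng.shuffle(values)  # random split into two groups
        a, b = values[:n_per_group], values[n_per_group:]
        z = (fmean(a) - fmean(b)) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            hits += 1
    return hits / n_tests


rate = null_false_positive_rate()
print(f"{rate:.3f}")  # close to 0.05, even though no real difference exists
```

Scaling the same rate up to tens of thousands of genes is what produces hundreds or thousands of false positives from raw p-values alone.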

    In practice I think of the p-value and q-value (adjusted p-value, FDR, etc.) differently in different situations. If our goal is a candidate-gene approach, meaning we'll run additional experiments to verify the RNA-seq result for each gene, we may use the raw p-values to get a broader list of candidates. If we have a phenotype and want to report the number of genes affected, or the percentage of genes enriched vs. depleted, we use the adjusted p-values, since that is a more general claim.
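To see why the raw and adjusted columns give lists of different sizes, here is a minimal Benjamini-Hochberg adjustment. Many DE tools use this procedure for their adjusted p-value / FDR column, but whether your pipeline did is an assumption worth checking. The gene names and p-values are made-up toy values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps
    # adjusted values monotone in the raw p-values (and capped at 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy values, made up for illustration
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
raw_p = [0.01, 0.04, 0.03, 0.20]
padj = bh_adjust(raw_p)

candidates = [g for g, p in zip(genes, raw_p) if p < 0.05]  # broader candidate list
reported = [g for g, q in zip(genes, padj) if q < 0.05]     # list for general claims
print(candidates)  # ['GeneA', 'GeneB', 'GeneC']
print(reported)    # ['GeneA']
```

With only four genes the penalty is mild; with the roughly 20,000 genes of a real experiment, each raw p-value is multiplied by a much larger factor, so the gap between the two lists is usually far wider.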

    Sometimes our experiment may yield zero significant genes by adjusted p-value even though we know there's a phenotype. In those cases we may proceed with genes significant by raw p-value, keeping in mind that we must proceed cautiously. We wouldn't do that if we were going straight into a figure with that result; we'd of course first try to confirm by other methods whether any of those genes really differ.

    Finally, keep in mind that raw p-values likely have a high type I error rate (many false positives across thousands of genes), while adjusted p-values likely have a high type II error rate (true effects missed). Larger sample sizes help with both: more replicates increase power, so fewer true effects are missed and a smaller fraction of the genes passing a raw p-value cutoff are false positives. Of course, with larger and larger sample sizes you'll also get significance calls for features with smaller and smaller effect sizes, and you'll have to start thinking in terms of "what is a significant effect?". I can't answer that one.
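A rough way to see the sample-size point for a single test is the analytic power of a two-sided, two-sample z-test with known SD. This is a simplification (DE tools use count models), and the effect size of 5 with SD 5 is an arbitrary illustration value. A true difference of 0 gives the type I error rate; for a real difference, 1 minus the power is the type II error rate.

```python
from statistics import NormalDist


def z_test_power(delta, sigma, n_per_group, alpha=0.05):
    """Probability that a two-sided two-sample z-test (sigma known) rejects H0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in means
    shift = abs(delta) / se
    # Rejection probability in the upper tail plus the lower tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (3, 10, 30):
    type_i = z_test_power(0, 5, n)  # no true difference
    power = z_test_power(5, 5, n)   # true difference of 5 units
    print(n, round(type_i, 3), round(power, 3))

# With very large samples, even a small difference is detected most of the time
print(round(z_test_power(1, 5, 500), 3))
```

The type I column stays at 0.05 for every n while power climbs, and the last line shows the flip side raised above: enough replicates make even small effects "significant".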
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */



    • #3
      Many thanks sdriscoll!!

      Cheers
      Vassen
