Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advanced interaction contrast in DESeq2

    Hi everyone,



    I am currently working on some RNA-seq data in DESeq2 and I would like to know if I can transform a specific biological question into a DESeq2 glm contrast for which Differentially Expressed Genes (DEGs) can be called. I am not very comfortable with contrasts (and not even sure they’re the way to go, despite having read the helpful related Seqanswers threads) so I’m looking for some help!



    We have an experiment with 3 immune cell Populations (say A, B, and C, with A as the reference) and 2 Conditions (a Control condition as reference, and a Stimulated condition), with 3 biological replicates per group (hence 18 samples in total).

    So far we’ve analysed each cell population independently, deriving the list of DEGs between Stimulated and Control in each, and then looking at how these lists intersect (e.g. with Venn diagrams).



    However, it would make much more sense to analyse all samples in one go, both statistically (more power) and biologically (as everything was done with the same protocol). In this case cell Population would be added as a new factor, and I’d also include the Population:Condition interaction. I have been experimenting with this setup:
    Code:
    design = ~ Population + Condition + Population:Condition
    The list of resultsNames in the model is thus:
    Code:
    "Intercept"                  
    "ConditionControl"                   "ConditionStimulated"               
    "PopulationA"                          "PopulationB"           "PopulationC"
    "ConditionControl.PopulationA" "ConditionStimulated.PopulationA"
    "ConditionControl.PopulationB" "ConditionStimulated.PopulationB" 
    "ConditionControl.PopulationC" "ConditionStimulated.PopulationC"
    Getting the list of overall DEGs for Stimulated versus Control is straightforward:
    Code:
    contrast=c(“Condition”, “Stimulated”, “Control”)
    Writing a contrast to compare Population C versus B, irrespective of Condition, is easy too:
    Code:
    contrast=c(“Population”, “C”, “B”)
    I’m not entirely sure of myself but it seems to me that the contrast to test the effect of the interaction of Population B and Condition Stimulated would be:
    Code:
    contrast=list(c(“PopulationB”, “ConditionStimulated”, “PopulationB.ConditionStimulated”))
    Am I right so far?



    Either way, then comes the tricky part.

    Our biological question is: what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C? (this is because we know A and B drive a specific type of downstream response, while C is more of an “on-looker”, a sort of internal control if you wish)

    Is this the same as asking for the glm coefficients of PopulationB.Stimulated and PopulationC.Stimulated? If yes, how does one write this up as a contrast? I haven’t been able to figure it out, despite reading the various posts about writing up the contrasts... Can it be so simple as any of the following:
    Code:
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”))
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”, “ConditionStimulated”))
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”, “ConditionStimulated”, “PopulationB”, “PopulationC”))


    ...Or would it just be better to test for the overall Stimulated vs Control DEGs, and then the DEGs for each interaction with Population, and do set analysis on all that?



    Thanks in advance for your help!

    Best,

    -- Alex

  • #2
    hi Alex,

    In the latest DESeq2 version, 1.8, I have some new instructions for users who might have a difficult time setting up such contrasts.

    Can you look over the first code chunk in Section 3.3 Interactions:

    The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


    This makes comparisons of condition effects in each group much easier to perform.

    For your other question:

    "what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C?"

    There is not a way to build such a set with one results() call. But you can build each table individually and then look at the union() of the first two, setdiff() the last table.

    The last table can be constructed by specifying a threshold for LFC, say log2 fold change < 2, and then using lfcThreshold=2 and altHypothesis="lessAbs". You need to come up with a threshold value which makes sense for your system, and you have to turn off betaPrior for this comparison. I'd recommend inspecting how this changes with plotMA(res). And see the DESeq2 paper for more details on the meaning of this threshold test.
    Last edited by Michael Love; 04-28-2015, 11:12 AM.

    Comment


    • #3
      Hi Michael,


      Thank you very much for your quick reply!!!


      Originally posted by Michael Love View Post
      In the latest DESeq2 version, 1.8, I have some new instructions for users who might have a difficult time setting up such contrasts.

      Can you look over the first code chunk in Section 3.3 Interactions:

      The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


      This makes comparisons of condition effects in each group much easier to perform.
      => This documentation is exactly what I needed! It makes things so much clearer... I'll upgrade our version of R, Bioconductor and DESeq2 just to be sure and I'll try this out. Thank you!



      Originally posted by Michael Love View Post
      For your other question:

      "what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C?"

      There is not a way to build such a set with one results() call. But you can build each table individually and then look at the union() of the first two, setdiff() the last table.

      The last table can be constructed by specifying a threshold for LFC, say log2 fold change < 2, and then using lfcThreshold=2 and altHypothesis="lessAbs". You need to come up with a threshold value which makes sense for your system, and you have to turn off betaPrior for this comparison. I'd recommend inspecting how this changes with plotMA(res). And see the DESeq2 paper for more details on the meaning of this threshold test.
      => This is essentially what I had in mind, but doing it within DESeq2 rather than downstream is certainly much better from a statistical point of view. I'll try this as soon as possible!


      Thank you again for your help! I may have further questions later, we'll see...

      Best regards,

      -- Alex

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Genetic Variation in Immunogenetics and Antibody Diversity
        by seqadmin



        The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
        11-06-2024, 07:24 PM
      • seqadmin
        Choosing Between NGS and qPCR
        by seqadmin



        Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
        10-18-2024, 07:11 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, Today, 11:09 AM
      0 responses
      22 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, Today, 06:13 AM
      0 responses
      19 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 11-01-2024, 06:09 AM
      0 responses
      30 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 10-30-2024, 05:31 AM
      0 responses
      21 views
      0 likes
      Last Post seqadmin  
      Working...
      X