Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advanced interaction contrast in DESeq2

    Hi everyone,



    I am currently working on some RNA-seq data in DESeq2 and I would like to know if I can transform a specific biological question into a DESeq2 glm contrast for which Differentially Expressed Genes (DEGs) can be called. I am not very comfortable with contrasts (and not even sure they’re the way to go, despite having read the helpful related Seqanswers threads) so I’m looking for some help!



    We have an experiment with 3 immune cell Populations (say A, B, and C, with A as the reference) and 2 Conditions (a Control condition as reference, and a Stimulated condition), with 3 biological replicates per group (hence 18 samples in total).

    So far we’ve analysed each cell population independently, deriving the list of DEGs between Stimulated and Control in each, and then looking at how these lists intersect (e.g. with Venn diagrams).



    However, it would make much more sense to analyse all samples in one go, both statistically (more power) and biologically (as everything was done with the same protocol). In this case cell Population would be added as a new factor, and I’d also include the Population:Condition interaction. I have been experimenting with this setup:
    Code:
    design = ~ Population + Condition + Population:Condition
    The list of resultsNames in the model is thus:
    Code:
    "Intercept"                  
    "ConditionControl"                   "ConditionStimulated"               
    "PopulationA"                          "PopulationB"           "PopulationC"
    "ConditionControl.PopulationA" "ConditionStimulated.PopulationA"
    "ConditionControl.PopulationB" "ConditionStimulated.PopulationB" 
    "ConditionControl.PopulationC" "ConditionStimulated.PopulationC"
    Getting the list of overall DEGs for Stimulated versus Control is straightforward:
    Code:
    contrast=c(“Condition”, “Stimulated”, “Control”)
    Writing a contrast to compare Population C versus B, irrespective of Condition, is easy too:
    Code:
    contrast=c(“Population”, “C”, “B”)
    I’m not entirely sure of myself but it seems to me that the contrast to test the effect of the interaction of Population B and Condition Stimulated would be:
    Code:
    contrast=list(c(“PopulationB”, “ConditionStimulated”, “PopulationB.ConditionStimulated”))
    Am I right so far?



    Either way, then comes the tricky part.

    Our biological question is: what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C? (this is because we know A and B drive a specific type of downstream response, while C is more of an “on-looker”, a sort of internal control if you wish)

    Is this the same as asking for the glm coefficients of PopulationB.Stimulated and PopulationC.Stimulated? If yes, how does one write this up as a contrast? I haven’t been able to figure it out, despite reading the various posts about writing up the contrasts... Can it be so simple as any of the following:
    Code:
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”))
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”, “ConditionStimulated”))
    contrast=list(c(“PopulationB.Stimulated”, “PopulationC.Stimulated”, “ConditionStimulated”, “PopulationB”, “PopulationC”))


    ...Or would it just be better to test for the overall Stimulated vs Control DEGs, and then the DEGs for each interaction with Population, and do set analysis on all that?



    Thanks in advance for your help!

    Best,

    -- Alex

  • #2
    hi Alex,

    In the latest DESeq2 version, 1.8, I have some new instructions for users who might have a difficult time setting up such contrasts.

    Can you look over the first code chunk in Section 3.3 Interactions:

    The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


    This makes comparisons of condition effects in each group much easier to perform.

    For your other question:

    "what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C?"

    There is not a way to build such a set with one results() call. But you can build each table individually and then look at the union() of the first two, setdiff() the last table.

    The last table can be constructed by specifying a threshold for LFC, say log2 fold change < 2, and then using lfcThreshold=2 and altHypothesis="lessAbs". You need to come up with a threshold value which makes sense for your system, and you have to turn off betaPrior for this comparison. I'd recommend inspecting how this changes with plotMA(res). And see the DESeq2 paper for more details on the meaning of this threshold test.
    Last edited by Michael Love; 04-28-2015, 11:12 AM.

    Comment


    • #3
      Hi Michael,


      Thank you very much for your quick reply!!!


      Originally posted by Michael Love View Post
      In the latest DESeq2 version, 1.8, I have some new instructions for users who might have a difficult time setting up such contrasts.

      Can you look over the first code chunk in Section 3.3 Interactions:

      The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


      This makes comparisons of condition effects in each group much easier to perform.
      => This documentation is exactly what I needed! It makes things so much clearer... I'll upgrade our version of R, Bioconductor and DESeq2 just to be sure and I'll try this out. Thank you!



      Originally posted by Michael Love View Post
      For your other question:

      "what are the genes that are differentially expressed between Stimulated and Control in populations A and B BUT NOT differentially expressed between Stimulated and Control in population C?"

      There is not a way to build such a set with one results() call. But you can build each table individually and then look at the union() of the first two, setdiff() the last table.

      The last table can be constructed by specifying a threshold for LFC, say log2 fold change < 2, and then using lfcThreshold=2 and altHypothesis="lessAbs". You need to come up with a threshold value which makes sense for your system, and you have to turn off betaPrior for this comparison. I'd recommend inspecting how this changes with plotMA(res). And see the DESeq2 paper for more details on the meaning of this threshold test.
      => This is essentially what I had in mind, but doing it within DESeq2 rather than downstream is certainly much better from a statistical point of view. I'll try this as soon as possible!


      Thank you again for your help! I may have further questions later, we'll see...

      Best regards,

      -- Alex

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Current Approaches to Protein Sequencing
        by seqadmin


        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
        04-04-2024, 04:25 PM
      • seqadmin
        Strategies for Sequencing Challenging Samples
        by seqadmin


        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
        03-22-2024, 06:39 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 04-11-2024, 12:08 PM
      0 responses
      24 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 10:19 PM
      0 responses
      25 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 09:21 AM
      0 responses
      21 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-04-2024, 09:00 AM
      0 responses
      52 views
      0 likes
      Last Post seqadmin  
      Working...
      X