  • DESEQ2 multiple factors + interaction analysis

    First, I want to emphasize that reading this forum has already been extremely helpful in
    overcoming some of the issues I initially had grasping the concept of designs in DESeq2.
    Nevertheless, I am stuck now and could use some help:

    I have the following experimental setup (each 3 replicates):

    condition: TREATMENT vs CONTROL
    tissue: A vs B
    genotype: MUTANT vs WT

    Now, I first wanted to find all the differentially expressed genes for each of the conditions (as requested by a collaborator), e.g.

    A vs B in the wild-type and treated
    A vs B in the wild-type and treated

    and so on (this is also a good point of comparison with already published data).

    For this I created groups to simplify the analysis e.g.


    and ran the analysis with the simple design: ~group

    which gave me the following result names:

        [1] "Intercept"        "groupTREATMENT.WT.A" "groupTREATMENT.WT.B"....
    and so on.
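
A minimal sketch of how such a group factor could be built, using base R only. The sample table `coldata` and its layout are assumptions made for illustration, not the actual data from this experiment:

```r
# Hypothetical sample table: 2 tissues x 2 genotypes x 2 conditions, 3 replicates each
coldata <- expand.grid(
  replicate = 1:3,
  tissue    = c("A", "B"),
  genotype  = c("WT", "MUTANT"),
  condition = c("CONTROL", "TREATMENT")
)

# Combine the three factors into one; levels look like "TREATMENT.WT.A"
coldata$group <- factor(
  paste(coldata$condition, coldata$genotype, coldata$tissue, sep = ".")
)

# Number of samples per combined group
print(table(coldata$group))
```

With a DESeqDataSet, the same idea would presumably be `dds$group <- factor(paste(dds$condition, dds$genotype, dds$tissue, sep = "."))` followed by `design(dds) <- ~ group`.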

    I could then compare these in the following fashion:
    cond1 <- results(dds, contrast=list("groupTREATMENT.WT.A","groupTREATMENT.WT.B"))
    I am happy up to here, but now comes the part that I am not sure how best to analyze.
    If I want to figure out the interactions:
      [1] tissue A vs B + WT vs MUTANT in TREATMENT
      [2] tissue A vs B + WT vs MUTANT in CONTROL
      [3] TREATMENT vs CONTROL + WT vs MUTANT in tissue A
      [4] TREATMENT vs CONTROL + WT vs MUTANT in tissue B

    I thought that I could first re-level the genotype to make the mutant the reference and get only the genes upregulated in the WT.

    dds$genotype <- relevel(dds$genotype, "MUTANT")
    For the design I thought the proper manner would be:

    For the 1st example

     ~tissue + genotype + condition + genotype:tissue
    The thought was that I get the interaction between the genotype and the condition and can control for the tissue. But I am somehow on the wrong track:
    [1] "Intercept"                  "tissue_A_vs_B"                "genotype_WT_vs_MUTANT"        "condition_TREATMENT_vs_CONTROL"     "tissueA.genotypeWT"
    How would I now extract the list of differentially expressed genes for the previously described scenarios (1 and 2) using contrast?

  • #2

    If I understand your question, it sounds like you want to use a model with all interactions.

    You will have to turn off the LFC shrinkage (betaPrior=FALSE in the DESeq() call). With two levels of interaction terms (first-order interactions between two variables and a second-order interaction between three variables), shrinkage of effects becomes complicated, and we did not implement routines for this.

    By "tissue A vs B + WT vs MUTANT in TREATMENT", do you mean a test for a difference in the interaction effect of tissue and genotype between the treatment group and the control group?

    You should then use a design of ~ tissue*genotype*condition

    And this difference is tested with

    results(dds, name="tissueB.genotypeMUTANT.conditionTREATMENT")
    (This requires that you relevel so that A, WT, and CONTROL are base levels of the respective factors.)

    If you want to test the interaction of tissue and genotype specific for the treatment group, that would be the interaction effect of tissue and genotype for the control group and the difference in the interaction effect for the treatment group added together:

    results(dds, contrast=list(c("tissueB.genotypeMUTANT","tissueB.genotypeMUTANT.conditionTREATMENT")))

    For, "tissue A vs B + WT vs MUTANT in CONTROL", if you mean the interaction effect of tissue and genotype specific for the control group, this would be the effect

    results(dds, name="tissueB.genotypeMUTANT")
    For you to visualize, it might help to examine the model matrix:

    model.matrix(~ tissue*genotype*condition, colData(dds))
    which should be the same as the following, if you use betaPrior=FALSE:

    attr(dds, "modelMatrix")
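
As a rough illustration of what that model matrix looks like, here is a base R sketch on a made-up sample table (`coldata` is an assumption, standing in for `colData(dds)`), with A, WT and CONTROL as base levels. Replacing ":" with "." in the column names gives names of the same form as the coefficient names used above:

```r
# Hypothetical sample table; the first value of each factor becomes its base level
coldata <- expand.grid(
  replicate = 1:3,
  tissue    = c("A", "B"),
  genotype  = c("WT", "MUTANT"),
  condition = c("CONTROL", "TREATMENT")
)

# Full model: main effects, all two-way interactions, and the three-way interaction
mm <- model.matrix(~ tissue * genotype * condition, coldata)

# Column names with ":" replaced by ".", for comparison with the names in this thread
coef_names <- gsub(":", ".", colnames(mm), fixed = TRUE)
print(coef_names)
```

Each row of `mm` is one sample, and each column is one coefficient of the model; a sample in tissue B, MUTANT, TREATMENT has a 1 in every column.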
    Last edited by Michael Love; 01-20-2015, 08:58 AM. Reason: clarifying


    • #3
      Thanks, that was helpful!
      Indeed, I sometimes find the syntax of the contrast list confusing.
      Why would it be:

      test for an interaction of tissue and genotype specific for the treatment group
      --> tissueB.genotypeMUTANT.conditionTREATMENT


      interaction effect of tissue and genotype specific for the control group
      --> tissueB.genotypeMUTANT


      • #4

        I've tried to clarify the above text, adding in between a third results table, which might have been the one you are interested in. This is just the nature of interactions. Interactions are additional effects for the groups which are not the reference level (or "base level"). So the tissue:genotype interaction for the control group is just the first order interaction, while the tissue:genotype interaction for the treatment group is the first order interaction plus an additional effect.
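
A small numeric sketch of this additivity, in base R only. The coefficient values in `beta` are made up for one hypothetical gene, and the helper `interaction_in` is introduced here purely for illustration:

```r
# All 8 cells of the design; A, WT and CONTROL are the base levels
cells <- expand.grid(
  tissue    = c("A", "B"),
  genotype  = c("WT", "MUTANT"),
  condition = c("CONTROL", "TREATMENT")
)
X <- model.matrix(~ tissue * genotype * condition, cells)

# Made-up coefficients on the log2 scale for one hypothetical gene
beta <- setNames(numeric(ncol(X)), colnames(X))
beta["(Intercept)"] <- 5
beta["tissueB"] <- 1
beta["genotypeMUTANT"] <- 0.5
beta["conditionTREATMENT"] <- 2
beta["tissueB:genotypeMUTANT"] <- 0.25
beta["tissueB:genotypeMUTANT:conditionTREATMENT"] <- 0.5

# Fitted value of each cell, named like "B.MUTANT.TREATMENT"
mu <- setNames(
  drop(X %*% beta),
  paste(cells$tissue, cells$genotype, cells$condition, sep = ".")
)

# (B - A in MUTANT) minus (B - A in WT), within a single condition
interaction_in <- function(cond) {
  cell <- function(tis, gen) mu[[paste(tis, gen, cond, sep = ".")]]
  (cell("B", "MUTANT") - cell("A", "MUTANT")) - (cell("B", "WT") - cell("A", "WT"))
}

ctrl  <- interaction_in("CONTROL")    # the first-order interaction term alone
treat <- interaction_in("TREATMENT")  # first-order term plus the three-way term
print(c(control = ctrl, treatment = treat))
```

In the control group the difference of differences equals the `tissueB:genotypeMUTANT` coefficient; in the treatment group it equals that coefficient plus the three-way coefficient, which is why the two are summed in the contrast list.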


        • #5
          I'm also struggling with the syntax of the design formula. I want to identify genes differentially expressed between two ancestries of two health statuses at each of 4 time points. I'm not sure if I should do two separate analyses or combine everything into one design:

          Ancestry: A or B
          Status: could be Control or Case
          Time_Point: 1,2,3,4

          I tried the following design to be able to compare each thing individually and also the interaction between any of the things:
          dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData3,design = ~ANCESTRY+Status+Time_Point + ANCESTRY:Time_Point + Status:Time_Point)

          dds <- DESeq(dds,parallel=TRUE)

          Which results in the following:
          [1] "Intercept"
          [2] "ANCESTRYA"
          [3] "ANCESTRYB"
          [4] "StatusControl"
          [5] "StatusCase"
          [6] "Time_Point1"
          [7] "Time_Point2"
          [8] "Time_Point3"
          [9] "Time_Point4"
          [10] "ANCESTRYA.Time_Point1"
          [11] "ANCESTRYB.Time_Point1"
          [12] "ANCESTRYA.Time_Point2"
          [13] "ANCESTRYB.Time_Point2"
          [14] "ANCESTRYA.Time_Point3"
          [15] "ANCESTRYB.Time_Point3"
          [16] "ANCESTRYA.Time_Point4"
          [17] "ANCESTRYB.Time_Point4"
          [18] "StatusControl.Time_Point1"
          [19] "StatusCase.Time_Point1"
          [20] "StatusControl.Time_Point2"
          [21] "StatusCase.Time_Point2"
          [22] "StatusControl.Time_Point3"
          [23] "StatusCase.Time_Point3"
          [24] "StatusControl.Time_Point4"
          [25] "StatusCase.Time_Point4"

          This way I am able to identify genes differentially expressed between one time point and another regardless of status or ancestry:
          Time1v_2<-results(dds, contrast=c("Time_Point", "Time_Point1", "Time_Point2"),parallel=TRUE)

          Time2v_3<-results(dds, contrast=c("Time_Point", "Time_Point2", "Time_Point3"),parallel=TRUE)

          I am also able to find genes differentially expressed between Control and Case regardless of time point or ancestry:
          ControlvCase<-results(dds, contrast=c("Status", "Control", "Case"),parallel=TRUE)

          Similarly for ancestry regardless of status or time point:
          AncestryAvB<-results(dds, contrast=c("ANCESTRY", "A", "B"),parallel=TRUE)

          But I get confused when I want to identify genes differentially expressed between Control and Case at each specific time point, or Ancestry A and Ancestry B at each time point. Is the following the correct syntax for such a comparison?

          For genes differentially expressed in cases and controls at a particular time:
          ControlTime2vCaseTime2<-results(dds, contrast=list(c("StatusControl.Time_Point2", "StatusCase.Time_Point2")),parallel=TRUE)

          For ancestry differences at a particular time:
          AncestryATime2vAncestryBTime2<-results(dds, contrast=list(c("ANCESTRYEA.Time_Point3_BCG_24h", "ANCESTRYAJ.Time_Point3_BCG_24h")),parallel=TRUE)

          When I run these commands, I get no error and a results table, but I want to make sure those results are for the comparison I actually want.

          I have also read about the time series option, but am unsure how it would differ from the above design.

          Many thanks for any feedback!

