  • DESEQ2 multiple factors + interaction analysis

    First, I want to emphasize that reading this forum has already been extremely helpful
    in overcoming some of the issues I initially had grasping the concept of designs in DESeq2.
    Nevertheless, I am stuck now and could use some help:

    I have the following experimental setup (each 3 replicates):

    condition: TREATMENT vs CONTROL
    tissue: A vs B
    genotype: MUTANT vs WT

    Now, I first wanted to find all the differentially expressed genes for each of the conditions (as requested by a collaborator), e.g.

    A vs B in the wild-type and treated
    A vs B in the wild-type and treated

    and so on (this is also a good point of comparison with already published data).

    For this I created groups to simplify the analysis e.g.


    and run the analysis with the simple design: ~group

    which gave me the following result names:

        [1] "Intercept"        "groupTREATMENT.WT.A" "groupTREATMENT.WT.B"....
    and so on.

    which I could then compare in this fashion:
    cond1 <- results(dds, contrast=list("groupTREATMENT.WT.A","groupTREATMENT.WT.B"))
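The grouping code itself did not survive in the post above. As a minimal sketch of how such a factor might be built (the `coldata` table and its column names are assumptions for illustration, not the poster's actual objects), the three factors can be pasted into a single label per sample:

```r
# Hypothetical sample table: 2 x 2 x 2 design with 3 replicates per cell.
coldata <- expand.grid(
  replicate = 1:3,
  tissue    = c("A", "B"),
  genotype  = c("WT", "MUTANT"),
  condition = c("CONTROL", "TREATMENT"),
  stringsAsFactors = TRUE
)

# Combine the three factors into one grouping factor, e.g. "TREATMENT.WT.A".
coldata$group <- factor(
  paste(coldata$condition, coldata$genotype, coldata$tissue, sep = ".")
)

levels(coldata$group)  # eight group labels
table(coldata$group)   # three replicates in each group
```

A factor built this way can be used in a `~group` design, and any two groups can then be compared directly.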
    I am happy up to here, but now comes the part that I am not sure how best to analyze.
    If I want to figure out the interaction
      [1] tissue A vs B + WT vs MUTANT in TREATMENT
      [2] tissue A vs B + WT vs MUTANT in CONTROL
      [3] TREATMENT vs CONTROL + WT vs MUTANT in tissue A
      [4] TREATMENT vs CONTROL + WT vs MUTANT in tissue B

    I thought that I could first re-level the genotype to make the mutant the reference and get only the genes upregulated in the WT.

    dds$genotype <- relevel(dds$genotype, "MUTANT")
    For the design I thought the proper manner would be:

    For the 1st example

     ~tissue + genotype + condition + genotype:tissue
    The thought was that I get the interaction between the genotype and the condition and can control for the tissue. But I am somehow on the wrong track:
    [1] "Intercept"                  "tissue_A_vs_B"                "genotype_WT_vs_MUTANT"        "condition_TREATMENT_vs_CONTROL"     "tissueA.genotypeWT"
    How would I now extract the list of differentially expressed genes for the previously described scenarios (1 and 2) using contrasts?

  • #2

    If I understand your question, it sounds like you want to use a model with all interactions.

    You will have to turn off the LFC shrinkage (betaPrior=FALSE in the DESeq() call), as in a situation with two levels of interaction terms (first order interactions between two variables and second order interactions between three variables), shrinkage of effects becomes complicated, and we did not implement routines for this.

    By, "tissue A vs B + WT vs MUTANT in TREATMENT", do you mean, test for a difference in the interaction effect of tissue and genotype for the treatment group vs the control group?

    You should then use a design of ~ tissue*genotype*condition

    And this difference is tested with

    results(dds, name="tissueB.genotypeMUTANT.conditionTREATMENT")
    (This requires that you relevel so that A, WT, and CONTROL are base levels of the respective factors.)

    If you want to test the interaction of tissue and genotype specific for the treatment group, that would be the interaction effect of tissue and genotype for the control group and the difference in the interaction effect for the treatment group added together:

    results(dds, contrast=list(c("tissueB.genotypeMUTANT","tissueB.genotypeMUTANT.conditionTREATMENT")))

    For, "tissue A vs B + WT vs MUTANT in CONTROL", if you mean the interaction effect of tissue and genotype specific for the control group, this would be the effect

    results(dds, name="tissueB.genotypeMUTANT")
    For you to visualize, it might help to examine the model matrix:

    model.matrix(~ tissue*genotype*condition, colData(dds))
    which should be the same as the following, if you use betaPrior=FALSE:

    attr(dds, "modelMatrix")
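To see what the full-interaction design looks like without a fitted `dds` object, the model matrix can be built in base R from a small sample table. This is only a sketch: `coldata` here is a hypothetical one-row-per-cell table, with A, WT, and CONTROL set as base levels as required above. Base R joins interaction terms with ":", whereas the coefficient names quoted above use ".".

```r
# Hypothetical table with one row per combination of the three factors.
# The first level listed for each factor becomes its base level.
coldata <- expand.grid(
  tissue    = c("A", "B"),
  genotype  = c("WT", "MUTANT"),
  condition = c("CONTROL", "TREATMENT"),
  stringsAsFactors = TRUE
)

mm <- model.matrix(~ tissue * genotype * condition, data = coldata)
colnames(mm)  # main effects, three two-way terms, one three-way term

# Coefficients that are "switched on" for a tissue B, MUTANT, CONTROL sample:
row_ctrl <- mm[coldata$tissue == "B" & coldata$genotype == "MUTANT" &
               coldata$condition == "CONTROL", ]
names(row_ctrl)[row_ctrl == 1]

# The same sample type in the TREATMENT group uses every coefficient:
row_trt <- mm[coldata$tissue == "B" & coldata$genotype == "MUTANT" &
              coldata$condition == "TREATMENT", ]
names(row_trt)[row_trt == 1]
```

Comparing the two rows shows which extra terms enter only for the treatment group, including the three-way interaction.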
    Last edited by Michael Love; 01-20-2015, 08:58 AM. Reason: clarifying


    • #3
      Thanks, that was helpful!
      Indeed, I sometimes find the contrast list confusing (syntax-wise).
      Why would it be:

      test for an interaction of tissue and genotype specific for the treatment group
      --> tissueB.genotypeMUTANT.conditionTREATMENT


      interaction effect of tissue and genotype specific for the control group
      --> tissueB.genotypeMUTANT


      • #4

        I've tried to clarify the above text, adding in between a third results table, which might have been the one you are interested in. This is just the nature of interactions. Interactions are additional effects for the groups which are not the reference level (or "base level"). So the tissue:genotype interaction for the control group is just the first order interaction, while the tissue:genotype interaction for the treatment group is the first order interaction plus an additional effect.
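The "first order interaction plus an additional effect" statement can be checked numerically on a toy example. The sketch below uses made-up, noiseless log-scale values (one per cell, purely hypothetical) and an ordinary linear model in place of DESeq2's GLM, so it only illustrates how the coefficients add up:

```r
# One hypothetical expression value per cell of the 2 x 2 x 2 design.
cells <- expand.grid(
  tissue    = c("A", "B"),
  genotype  = c("WT", "MUTANT"),
  condition = c("CONTROL", "TREATMENT"),
  stringsAsFactors = TRUE
)
cells$y <- c(5, 6, 5.5, 8, 5, 6.5, 6, 10)

# Saturated model: the coefficients reproduce the cell values exactly.
b <- coef(lm(y ~ tissue * genotype * condition, data = cells))

# tissue:genotype interaction within one condition, computed by hand as a
# difference of differences: (B - A in MUTANT) - (B - A in WT).
dd <- function(cond) {
  sub <- cells[cells$condition == cond, ]
  m <- tapply(sub$y, list(sub$tissue, sub$genotype), mean)
  (m["B", "MUTANT"] - m["A", "MUTANT"]) - (m["B", "WT"] - m["A", "WT"])
}

# Control group: just the two-way interaction coefficient.
c(by_hand = dd("CONTROL"),
  model   = unname(b["tissueB:genotypeMUTANT"]))

# Treatment group: two-way coefficient plus the three-way coefficient.
c(by_hand = dd("TREATMENT"),
  model   = unname(b["tissueB:genotypeMUTANT"] +
                   b["tissueB:genotypeMUTANT:conditionTREATMENT"]))
```

In both cases the hand-computed difference of differences matches the corresponding coefficient or sum of coefficients.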


        • #5
          I'm also struggling with the syntax of the design formula. I want to identify genes differentially expressed between two ancestries of two health statuses at each of 4 time points. I'm not sure if I should do two separate analyses or combine everything into one design:

          Ancestry: A or B
          Status: Control or Case
          Time_Point: 1,2,3,4

          I tried the following design to be able to compare each thing individually and also the interaction between any of the things:
          dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData3,design = ~ANCESTRY+Status+Time_Point + ANCESTRY:Time_Point + Status:Time_Point)

          dds <- DESeq(dds,parallel=TRUE)

          Which results in the following:
          [1] "Intercept"
          [2] "ANCESTRYA"
          [3] "ANCESTRYB"
          [4] "StatusControl"
          [5] "StatusCase"
          [6] "Time_Point1"
          [7] "Time_Point2"
          [8] "Time_Point3"
          [9] "Time_Point4"
          [10] "ANCESTRYA.Time_Point1"
          [11] "ANCESTRYB.Time_Point1"
          [12] "ANCESTRYA.Time_Point2"
          [13] "ANCESTRYB.Time_Point2"
          [14] "ANCESTRYA.Time_Point3"
          [15] "ANCESTRYB.Time_Point3"
          [16] "ANCESTRYA.Time_Point4"
          [17] "ANCESTRYB.Time_Point4"
          [18] "StatusControl.Time_Point1"
          [19] "StatusCase.Time_Point1"
          [20] "StatusControl.Time_Point2"
          [21] "StatusCase.Time_Point2"
          [22] "StatusControl.Time_Point3"
          [23] "StatusCase.Time_Point3"
          [24] "StatusControl.Time_Point4"
          [25] "StatusCase.Time_Point4"

          This way I am able to identify genes differentially expressed between one time point and another regardless of status or ancestry:
          Time1v_2<-results(dds, contrast=c("Time_Point", "Time_Point1", "Time_Point2"),parallel=TRUE)

          Time2v_3<-results(dds, contrast=c("Time_Point", "Time_Point2", "Time_Point3"),parallel=TRUE)

          I am also able to find genes differentially expressed between Control and Case regardless of time point or ancestry:
          ControlvCase<-results(dds, contrast=c("Status", "Control", "Case"),parallel=TRUE)

          Similarly for ancestry regardless of status or time point:
          AncestryAvB<-results(dds, contrast=c("ANCESTRY", "A", "B"),parallel=TRUE)

          But I get confused when I want to identify genes differentially expressed between Control and Case at each specific time point, or Ancestry A and Ancestry B at each time point. Is the following the correct syntax for such a comparison?

          For genes differentially expressed in cases and controls at a particular time:
          ControlTime2vCaseTime2<-results(dds, contrast=list(c("StatusControl.Time_Point2", "StatusCase.Time_Point2")),parallel=TRUE)

          For ancestry differences at a particular time:
          AncestryATime2vAncestryBTime2<-results(dds, contrast=list(c("ANCESTRYEA.Time_Point3_BCG_24h", "ANCESTRYAJ.Time_Point3_BCG_24h")),parallel=TRUE)

          When I run these commands, I get no error and a results table, but I want to make sure those results are for the comparison I actually want.
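One way to check which comparison a contrast encodes is to build the model matrix and subtract the rows for the two sample types being compared. The sketch below uses base R with standard treatment coding, so its coefficient names differ from the expanded list shown above. The `coldata` table and the time point labels "1" to "4" are assumptions for illustration.

```r
# Hypothetical sample table: one row per ancestry x status x time point.
coldata <- expand.grid(
  ANCESTRY   = c("A", "B"),
  Status     = c("Control", "Case"),
  Time_Point = c("1", "2", "3", "4"),
  stringsAsFactors = TRUE
)

mm <- model.matrix(
  ~ ANCESTRY + Status + Time_Point + ANCESTRY:Time_Point + Status:Time_Point,
  data = coldata
)

# Helper (hypothetical): the model matrix row for one sample type.
pick <- function(a, s, t) {
  mm[coldata$ANCESTRY == a & coldata$Status == s & coldata$Time_Point == t, ]
}

# Case vs Control at time point 2, holding ancestry fixed.
contrast_vec <- pick("A", "Case", "2") - pick("A", "Control", "2")
contrast_vec[contrast_vec != 0]
```

Under this coding, the Case vs Control difference at time point 2 is the Status main effect plus the Status:Time_Point2 interaction, not the interaction term alone. `results()` also accepts a numeric contrast vector with one element per coefficient, so a vector built this way could in principle be passed directly, provided the design is fitted with the same (non-expanded) model matrix.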

          I have also read about the time series option, but am unsure how it would differ from the above design.

          Many thanks for any feedback!

