No announcement yet.

time course,multi groups with DESeq2

  • Filter
  • Time
  • Show
Clear All
new posts

  • time course,multi groups with DESeq2

    Hi All
    I am currently working on a set of RNAseq data which contain 4 genotypes (A, B, C, D) and 5 time points. I'd like to compare gene expression across all time points between 4 genotypes. Differential expressed genes accross 5 time points will be identified based on p values. However, the LRT considered full data and gives one padj. I also tried the contrast function between genotype A and B, A and C for example. A similar question was asked previously by someone else @ Looks like LRT is for testing all levels at once EVEN contrast is between A and B or A and C or A and D.

    below is my design and some codes:
    design = ~ genotype + timepoint + genotype:timepoint
    > dds_time <- DESeq(dds, test = "LRT", reduced = ~ genotype + timepoint)
    > res_time <- results(dds_time)

    > results_A_B <- results(dds_time, contrast=c("genotype","A","B"))
    > results_A_C <- results(dds_time, contrast=c("genotype","A","C"))

    After reading some other posts and manuals, looks like I have two options:
    1, using the current analysis but recalculate p values using Wald test.
    I did this:
    > dds_wald <- nbinomWaldTest(dds_time, betaPrior=FALSE)
    found results columns, replacing these
    > results_A_B_wald <- results(dds_time, contrast=c("genotype","A","B"))
    > results_A_C_wald <- results(dds_time, contrast=c("genotype","A","C"))

    padj values for specific comparisons were given using this method.

    2, rerun pair-wise analysis (6 comparisons) using LRT separately?

    Which option is better and why?
    Am I correct if I want to identify the significant differences between genotypes across all time points?
    Another question is that is it possible to cluster the differential expressed genes (e.g. A vs B) and plot different clusters using DESeq2?

  • #2
    hi jutos,

    Just to make sure I get it right, do you want to find (A) genes which show time-specific differences between two genotypes or (B) genes which are different between two genotypes, but not necessarily in a time-specific way?

    Imagine two lines which have a sinusoidal pattern, moving up and down in parallel (≈). If these are the time course gene expression of two genotypes, then these would have a small p-value for (B) but not for (A).

    Can you print out also packageVersion("DESeq2")


    • #3
      Hi Michael
      Thanks for your response! Not sure if I understand correctly.
      (A) indicates gene differences at a specific timepoint.
      E.g. gene1.timepoint1 in genotype A vs gene1.timepoint1 genotype B.

      If (B) means gene1 in A vs gene1 in B across all timepoints, then that's what I want. I'd like to see a pattern but not specific timepoint.

      Yes, basically I am comparing two lines(significantly different or not). Sinusoidal pattern is one case. Other cases like, gene1 moving up in A but going down in B (<) or gene1 moving up in A but no change in B. etc.

      genotype timepoint sizeFactor A 1 1.1361501 A 1 0.8687581 A 2 1.2196618 A 2 1.1299548 A 3 1.050075 A 3 0.9519595 A 4 0.8285208 A 4 1.0376003 A 5 0.6985669 A 5 0.9170495 B 1 1.1789369 B 1 0.9942072 B 2 1.1213339 B 2 1.0135584 B 3 1.1813726 B 3 1.080713 B 4 1.1646828 B 4 1.2200219 B 5 0.9224539 B 5 0.8493717 C 1 1.0054339 C 1 1.1109099 C 2 1.0307073 C 2 1.1476691 C 3 1.1442486 C 3 1.3069118 C 4 0.977317 C 4 0.9845946 C 5 0.7378932 C 5 0.7934279 D 1 0.9474263 D 1 1.0351776 D 2 1.0540152 D 2 1.1222096 D 3 0.9024823 D 3 1.3581637 D 4 1.1031467 D 4 1.1892484 D 5 0.9206771 D 5 0.6865151

      > packageVersion("DESeq2")
      [1] ‘1.6.3’


      • #4
        All the testing is gene by gene, so we can leave gene out of the mix. If there are global changes across time for all genes, these are normalized away inside the model by the size factor.

        For your test, suppose there is a gene where the expression is higher in genotype B than A, but the expression is flat across timepoints (the = symbol). Are you interested in this gene?

        Your example where expression goes up for one genotype and down in another (the < symbol), I would call looking for a "time-specific difference between genotypes".

        However, the example of lines moving in parallel (the ≈ symbol) this is not a time-specific difference between genotypes, but a difference in baseline expression, or what we would call a main effect.

        Of these cases, which are you interested in?


        • #5
          Hi Michael

          I am mostly interested in what you call the "time-specific difference between genotypes".
          would LRT or wald test do the job?


          • #6
            The LRT will find all genes with time-specific differences:

            design(dds) = ~ genotype + timepoint + genotype:timepoint
            dds = DESeq(dds, test="LRT", reduced=~ genotype + timepoint)

            Note that the LFC column only describes the fold changes from a single coefficient in the model, although the likelihood ratio test statistic and p-values are from removing all of the terms associated with genotype:timepoint from the design.

            See the workflow for an example of visualizing the results with heatmaps:


            Also there is a section of the vignette on clustering using rlog or VST.


            • #7
              Hi Michael

              The workflow for time course works fine with two strains. I have two more questions regarding to the pvalues of LRT and wald test. sorry for not being a statistics person.
              1, in the example, padj is for differences between two strains. I was wondering what is biological meaning of the padj if there are 3 or more strains? is it accounting for any differences in all strains over all time points?

              2, wald test was used for specific time point in the example.
              How about the res_mut_vs_wt? Is it testing linear model for all time points between mut and wt?

              res_mut_vs_wt <- results(ddsTC, name="strain_mut_vs_wt", test="Wald")



              • #8
                "is it accounting for any differences in all strains over all time points?"

                Basically yes, the LRT for ~ genotype + timepoint + genotype:timepoint vs
                ~ genotype + timepoint gives p-values which are small if there are any differences for any strain at any time point (which are not explained by time differences which are shared across all genotypes or genotype differences which are present at the first time point).

                The Mut vs WT effect here is the difference which is present at the first time point.

                (Make sure that the levels(dds$timepoint) are in the correct order, and if not, you should refactor using factor() and specifying the levels.)