Announcement

Collapse
No announcement yet.

time course,multi groups with DESeq2

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • time course,multi groups with DESeq2

    Hi All
    I am currently working on a set of RNAseq data which contain 4 genotypes (A, B, C, D) and 5 time points. I'd like to compare gene expression across all time points between 4 genotypes. Differential expressed genes accross 5 time points will be identified based on p values. However, the LRT considered full data and gives one padj. I also tried the contrast function between genotype A and B, A and C for example. A similar question was asked previously by someone else @ https://www.biostars.org/p/119342/. Looks like LRT is for testing all levels at once EVEN contrast is between A and B or A and C or A and D.

    below is my design and some codes:
    design = ~ genotype + timepoint + genotype:timepoint
    > dds_time <- DESeq(dds, test = "LRT", reduced = ~ genotype + timepoint)
    > res_time <- results(dds_time)

    #CONTRAST_RESULTS
    > results_A_B <- results(dds_time, contrast=c("genotype","A","B"))
    > results_A_C <- results(dds_time, contrast=c("genotype","A","C"))


    After reading some other posts and manuals, looks like I have two options:
    1, using the current analysis but recalculate p values using Wald test.
    I did this:
    > dds_wald <- nbinomWaldTest(dds_time, betaPrior=FALSE)
    found results columns, replacing these
    > results_A_B_wald <- results(dds_time, contrast=c("genotype","A","B"))
    > results_A_C_wald <- results(dds_time, contrast=c("genotype","A","C"))

    padj values for specific comparisons were given using this method.

    2, rerun pair-wise analysis (6 comparisons) using LRT separately?

    Which option is better and why?
    Am I correct if I want to identify the significant differences between genotypes across all time points?
    Another question is that is it possible to cluster the differential expressed genes (e.g. A vs B) and plot different clusters using DESeq2?
    Thanks,

  • #2
    hi jutos,

    Just to make sure I get it right, do you want to find (A) genes which show time-specific differences between two genotypes or (B) genes which are different between two genotypes, but not necessarily in a time-specific way?

    Imagine two lines which have a sinusoidal pattern, moving up and down in parallel (≈). If these are the time course gene expression of two genotypes, then these would have a small p-value for (B) but not for (A).

    Can you print out as.data.frame(colData(dds))? also packageVersion("DESeq2")

    Comment


    • #3
      Hi Michael
      Thanks for your response! Not sure if I understand correctly.
      (A) indicates gene differences at a specific timepoint.
      E.g. gene1.timepoint1 in genotype A vs gene1.timepoint1 genotype B.

      If (B) means gene1 in A vs gene1 in B across all timepoints, then that's what I want. I'd like to see a pattern but not specific timepoint.

      Yes, basically I am comparing two lines(significantly different or not). Sinusoidal pattern is one case. Other cases like, gene1 moving up in A but going down in B (<) or gene1 moving up in A but no change in B. etc.

      >as.data.frame(colData(dds_time))
      genotype timepoint sizeFactor
      A_t1_rep1.ht A 1 1.1361501
      A_t1_rep2.ht A 1 0.8687581
      A_t2_rep1.ht A 2 1.2196618
      A_t2_rep2.ht A 2 1.1299548
      A_t3_rep1.ht A 3 1.050075
      A_t3_rep2.ht A 3 0.9519595
      A_t4_rep1.ht A 4 0.8285208
      A_t4_rep2.ht A 4 1.0376003
      A_t5_rep1.ht A 5 0.6985669
      A_t5_rep2.ht A 5 0.9170495
      B_t1_rep1.ht B 1 1.1789369
      B_t1_rep2.ht B 1 0.9942072
      B_t2_rep1.ht B 2 1.1213339
      B_t2_rep2.ht B 2 1.0135584
      B_t3_rep1.ht B 3 1.1813726
      B_t3_rep2.ht B 3 1.080713
      B_t4_rep1.ht B 4 1.1646828
      B_t4_rep2.ht B 4 1.2200219
      B_t5_rep1.ht B 5 0.9224539
      B_t5_rep2.ht B 5 0.8493717
      C_t1_rep1.ht C 1 1.0054339
      C_t1_rep2.ht C 1 1.1109099
      C_t2_rep1.ht C 2 1.0307073
      C_t2_rep2.ht C 2 1.1476691
      C_t3_rep1.ht C 3 1.1442486
      C_t3_rep2.ht C 3 1.3069118
      C_t4_rep1.ht C 4 0.977317
      C_t4_rep2.ht C 4 0.9845946
      C_t5_rep1.ht C 5 0.7378932
      C_t5_rep2.ht C 5 0.7934279
      D_t1_rep1.ht D 1 0.9474263
      D_t1_rep2.ht D 1 1.0351776
      D_t2_rep1.ht D 2 1.0540152
      D_t2_rep2.ht D 2 1.1222096
      D_t3_rep1.ht D 3 0.9024823
      D_t3_rep2.ht D 3 1.3581637
      D_t4_rep1.ht D 4 1.1031467
      D_t4_rep2.ht D 4 1.1892484
      D_t5_rep1.ht D 5 0.9206771
      D_t5_rep2.ht D 5 0.6865151

      > packageVersion("DESeq2")
      [1] ‘1.6.3’

      Comment


      • #4
        All the testing is gene by gene, so we can leave gene out of the mix. If there are global changes across time for all genes, these are normalized away inside the model by the size factor.

        For your test, suppose there is a gene where the expression is higher in genotype B than A, but the expression is flat across timepoints (the = symbol). Are you interested in this gene?

        Your example where expression goes up for one genotype and down in another (the < symbol), I would call looking for a "time-specific difference between genotypes".

        However, the example of lines moving in parallel (the ≈ symbol) this is not a time-specific difference between genotypes, but a difference in baseline expression, or what we would call a main effect.

        Of these cases, which are you interested in?

        Comment


        • #5
          Hi Michael

          I am mostly interested in what you call the "time-specific difference between genotypes".
          would LRT or wald test do the job?
          Thanks,

          Comment


          • #6
            The LRT will find all genes with time-specific differences:

            design(dds) = ~ genotype + timepoint + genotype:timepoint
            dds = DESeq(dds, test="LRT", reduced=~ genotype + timepoint)
            results(dds)

            Note that the LFC column only describes the fold changes from a single coefficient in the model, although the likelihood ratio test statistic and p-values are from removing all of the terms associated with genotype:timepoint from the design.

            See the workflow for an example of visualizing the results with heatmaps:

            http://bioconductor.org/help/workflows/rnaseqGene/#time

            Also there is a section of the vignette on clustering using rlog or VST.

            Comment


            • #7
              Hi Michael

              The workflow for time course works fine with two strains. I have two more questions regarding to the pvalues of LRT and wald test. sorry for not being a statistics person.
              1, in the example, padj is for differences between two strains. I was wondering what is biological meaning of the padj if there are 3 or more strains? is it accounting for any differences in all strains over all time points?

              2, wald test was used for specific time point in the example.
              How about the res_mut_vs_wt? Is it testing linear model for all time points between mut and wt?

              res_mut_vs_wt <- results(ddsTC, name="strain_mut_vs_wt", test="Wald")

              Thanks,

              Comment


              • #8
                "is it accounting for any differences in all strains over all time points?"

                Basically yes, the LRT for ~ genotype + timepoint + genotype:timepoint vs
                ~ genotype + timepoint gives p-values which are small if there are any differences for any strain at any time point (which are not explained by time differences which are shared across all genotypes or genotype differences which are present at the first time point).

                The Mut vs WT effect here is the difference which is present at the first time point.

                (Make sure that the levels(dds$timepoint) are in the correct order, and if not, you should refactor using factor() and specifying the levels.)

                Comment

                Working...
                X