Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • interactions with DESeq2 in a time-course analysis

    Dear all,

    we're having a time course data set of 8 TP (0, 16,24,30,48,72,90, and 100 hours).
    We would like to test for genes which are changing over time compared to the 0h (~ctrl).

    I have read the guide and a lot of posts here and on biostar.org. After understanding the way to create a design matrix for my data, I am still confused about the interactions in the design matrix.

    I have created the design matrix like that:

    Code:
    dds<-DESeqDataSetFromMatrix(countData=countTable, colData=phenotype, design= ~ replica + time )
    dds = DESeq(dds, test="LRT", reduced=~replica)
    Question 1:
    If I understand correctly, the results I get represents all genes with sig. diff. behaviour across all TP.
    But I get a lot of them. attached is [the plotMA of the counts].

    Code:
    > res
    log2 fold change (MLE): hours 100 vs 0 
    LRT p-value: '~ replica + hours' vs '~ replica' 
    DataFrame with 17558 rows and 6 columns
                   baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                  <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
    FBgn0085804  0.18052802    -1.18153049  7.162547  1.518897 0.9816482138           NA
    ...
    Do all these genes change over time across all Time-points?

    Question 2:
    Does this list include all genes which are changing between only two time points?
    (that might be the reason, why the list is so long.)

    Question 3:
    what will change in term of my question (and of course the list of genes I will get), if I change my full and reduced model to this:
    Code:
    dds<-DESeqDataSetFromMatrix(countData=countTable, colData=phenotype, design= ~ replica + time +replica:time )
    dds = DESeq(dds, test="LRT", reduced=~replica +replica:time)

    Thanks in advance

    Assa
    Last edited by frymor; 11-05-2015, 03:21 AM.

  • #2
    1. Yup, you understood exactly. If you really want to be technical, what you're actually testing is whether including "hours" results in a better fit of the data...though the practical effect is asking for all genes changing over time. I should point out that you may not see all of these changes in direct pairwise comparisons (you'll probably see most of them though).

    2. Yup. This can be though of as a superset of the results from all pairwise comparisons. If it's ever DE in a pairwise comparison, it'll likely be DE in the LRT (the reverse isn't the case).

    Comment


    • #3
      Originally posted by dpryan View Post
      1. Yup, you understood exactly. If you really want to be technical, what you're actually testing is whether including "hours" results in a better fit of the data...though the practical effect is asking for all genes changing over time. I should point out that you may not see all of these changes in direct pairwise comparisons (you'll probably see most of them though).
      I didn't expect to see all the genes, when doing a pair-wise comparison, but I expect to see all of them as a subset (or subsets, if testing multiple time-point comparisons),as the LRT test supposedly tests for all DE genes over time.

      When checking for the pair-wise comparisons using the results() function, would it be better to keep using the LRT testing method, or would it be better to use the Wald test for a more robust statistical results.

      When comparing the two tests for a specific pair of time points, I can see a difference. I have read that the Wald test calculate the LFC shrinkage for the data while the takes multiple parameters into account.
      Code:
      > resTP16h_90hwald
      log2 fold change (MLE): hours 90 vs 16 
      Wald test p-value: hours 90 vs 16 
      DataFrame with 17558 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat      pvalue        padj
                    <numeric>      <numeric> <numeric>  <numeric>   <numeric>   <numeric>
      FBgn0085804  0.18052802      -3.642448  6.899080 -0.5279614 0.597526110          NA
      FBgn0267431 19.73070118      -2.155084  1.270286 -1.6965346 0.089784690 0.125377361
      FBgn0039987  0.08559842      -2.183327  6.937402 -0.3147183 0.752975578          NA
      FBgn0058182  0.49195220      -2.710627  5.724264 -0.4735329 0.635833037          NA
      FBgn0267430 27.36264804      -4.362455  1.481378 -2.9448633 0.003230974 0.006781261
      ...                 ...            ...       ...        ...         ...         ...
      > resTP16h_90h
      log2 fold change (MLE): hours 90 vs 16 
      LRT p-value: '~ replica + hours' vs '~ replica' 
      DataFrame with 17558 rows and 6 columns
                     baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                    <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
      FBgn0085804  0.18052802      -3.642448  6.899080  1.518897 0.9816482138           NA
      FBgn0267431 19.73070118      -2.155084  1.270286 17.423700 0.0148591771 0.0184819228
      FBgn0039987  0.08559842      -2.183327  6.937402  0.610748 0.9989315237           NA
      FBgn0058182  0.49195220      -2.710627  5.724264  3.607104 0.8237543782           NA
      FBgn0267430 27.36264804      -4.362455  1.481378 25.744205 0.0005595056 0.0007857541
      .
      Are the two method even comparable?

      Originally posted by dpryan View Post
      2. Yup. This can be though of as a superset of the results from all pairwise comparisons. If it's ever DE in a pairwise comparison, it'll likely be DE in the LRT (the reverse isn't the case).
      This is where I don't understand what happens. If a genes changes over time in the analysis across all time points, it must also be changed in at least one of the pair-wise comparisons. isn't that true?
      So why is the reverse not always the case?

      Comment


      • #4
        Originally posted by dpryan View Post
        2. Yup. This can be though of as a superset of the results from all pairwise comparisons. If it's ever DE in a pairwise comparison, it'll likely be DE in the LRT (the reverse isn't the case).
        When calculating the pair-wise comparisons of the timepoints, does it make more sense to add the parameter test="Wald", or can I keep the LRT results?

        I can see that there is a significant difference in the number of DE genes with an adjusted p-value <= 0.1

        Code:
        resTP16h_90h<- results(dds.filtered, contrast = c("hours", "90", "16"))
        resTP16h_90h.wald <- results(dds.filtered, test = "Wald", contrast = c("hours", "16", "90"))
        
        > addmargins(table(wald.test =(resTP16h_90h.wald$padj <.1), LRT.test=(resTP16h_90h$padj<.1)))
                 LRT.test
        wald.test FALSE  TRUE   Sum
            FALSE   624  2958  3582
            TRUE      0  9725  9725
            Sum     624 12683 13307
        Am I correct in assuming that the LRT test done in the first results command contains not only the genes differentiating between 90h and 16h, but also all the genes in the time-points between (24h,30h,48h and 72h)?
        Last edited by frymor; 11-12-2015, 02:27 AM.

        Comment


        • #5
          For pair-wise comparisons you need a Wald test. For "Is there a time effect, regardless of when?" you need an LRT. So yes, your assumption is absolutely correct

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Recent Advances in Sequencing Technologies
            by seqadmin







            Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

            Long-Read Sequencing
            Long-read sequencing has...
            12-02-2024, 01:49 PM
          • seqadmin
            Genetic Variation in Immunogenetics and Antibody Diversity
            by seqadmin



            The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
            11-06-2024, 07:24 PM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 12-02-2024, 09:29 AM
          0 responses
          140 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 12-02-2024, 09:06 AM
          0 responses
          50 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 12-02-2024, 08:03 AM
          0 responses
          39 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 11-22-2024, 07:36 AM
          0 responses
          70 views
          0 likes
          Last Post seqadmin  
          Working...
          X