Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 Analysis of Time Series data and model comparison

    I have a question regarding defining and comparing models in DESeq2. My data are counts (index of abundance) for each taxon identified through fungal ITS high-throughput amplicon data from soils sampled seasonally under three different stand types.

    Experimental design:
    • randomized complete block - blocks=6; treatment=3 (standType = spruce, beech, oak) = 18 experimental units
    • sampling times = 5 seasons (fall_1, winter, spring, summer, fall_2)
    • total sample size = 90 samples

    I am interested in looking at the response of species over time in my treatments with the goal of identifying species that show time dependency differences in counts among stand types (i.e. significant stand x time interactions). I have been following the “RNA-Seq workflow:gene-level exploratory analysis and differential expression” vignette (http://www.bioconductor.org/help/workflows/rnaseqGene/), specifically the section on “Time series experiments”.
    I used the two models below to test if the interaction of standType:Season was an important factor.

    m1 (full) ~standType+ standType:Season
    m2 (reduced) ~ standType

    This is a relatively similar set up to the vignette and my understanding is that since the only difference between the two models is the interaction term, species with low p values in the Likelihood Ratio Test results table are the species that show stand-specific effects in time. Correct?

    I can pull out the species that show different patterns over time among the three treatments by querying all the contrast via standType and comparing the species with significant padj values between treatments. From here I can visualize patterns of any particular species by plotting the counts using code in the vignette.

    This seems good so far, but after reading more on setting up models, I realize I have not properly defined my models. Since I have repeatedly sampled the same 18 sites five different times, I wonder if
    • m1 (full) should be defined as a repeated measures design keeping samples = 90 which would more properly account for the non-independence of samples collected at the same site.
    • and m2 (reduced) should identify samples from the same site as replicates collapsing the 90 samples to = 18 to avoid pseudoreplication and associated concerns about artificial inflation of statistical power.

    It seems that the model statements should be structured to ensure the data are processed as repeated measures of 18 experimental units not as 90 independent samples. Am I correct in being concerned about this? If so, should I be looking into how to specify repeated measures model statements for m1 and m2?

  • #2
    Your initial design doesn't include Season in the full and reduced, which is pretty important (we have this in our rnaseqGene example as 'time'). This term keeps track of differences over season, while the interaction term keeps track of differences over season which are specific for individual standTypes.

    re: the distinct sites, there is a useful trick for specifying such a design (originally from the edgeR user guide). I posted this on the Bioc support site here: https://support.bioconductor.org/p/62357/#62368 Take a look at that post to get the idea. Your model should take into account the 18 sites, by including a new variable 'nested.site' which keeps track of the 6 sites within each standType. This will be a column of numbers 1-6 identifying the different sites within each standType, and tracking these across all seasons.

    You can then use a design (corrected):

    Code:
    ~ season + standType + standType:nested.site + season:standType
    The reduced model would only remove season:standType, in order to find genes where there are standType specific differences across season.
    Last edited by Michael Love; 02-06-2015, 08:31 AM.

    Comment


    • #3
      Thanks for the reply and advice on model design!

      I didn’t originally include season in my model as I did not set out to specifically test for seasonal effect and communities group strongly to standType. I was thinking that leaving out the term would just mean the variation caused by season would be attributed to the unexplained variation, but maybe this is not fully correct.

      I ran the models as describe above using a data column with siteID (1-18) for the information of nested.site but ran into this error:
      litDEnf_seas1<-DESeqDataSet(litternDEf, ~ standType + season + season:siteID + season:standTtye)

      Error in DESeqDataSet(litternDEf, ~standType + season + season:siteID + season:standType) :
      the model matrix is not full rank, so the model cannot be fit as specified.
      one or more variables or interaction terms in the design formula
      are linear combinations of the others and must be removed


      Shoko also notes this issue here: https://support.bioconductor.org/p/63134/ where it seemed to be a result of missing samples.

      Quote: "You are encountering this error, because you have missing samples dispersed in the cross of treatment and time"

      I cannot see any missing sample/treatment combinations but it is expected that a species may not be present in all of the 90 samples (or in this model, in all 18 sites or in all 5 seasons). Would species absences at a given site in a given season cause the error? The error only occurs when I add the interaction term season:siteID into the model.

      Thanks in advance

      Comment


      • #4
        Can you show the full column data, including the new siteID columns?

        Comment


        • #5
          Hi,
          the data are attached in a .txt file.

          Thanks,

          Barbara
          Attached Files

          Comment


          • #6
            So you don't have the same problem as Shoko.

            Go back and check the Bioc support link I posted.

            You need to create a new column, nested.site, which should only be values "site_1" to "site_6". These keep track of the different sites within each stand type.

            Here is a small example of how it should look:

            spruce site_1 fall1
            spruce site_2 fall1
            spruce site_3 fall1
            ...
            beech site_1 fall1
            beech site_2 fall1
            beech site_3 fall1
            ...
            oak site_1 fall1
            oak site_2 fall1
            oak site_3 fall1

            It's not a problem that spruce site_1 is not the same site as oak site_1, because we use an interaction term standType:nested.site, which produces unique coefficients in the model for each combination of stand and site.

            Comment


            • #7
              Okay,I see the issue. I just ran the model with the new column data and it ran without error. Thanks for the help and prompt reply!

              Comment


              • #8
                Originally posted by Michael Love View Post
                So you don't have the same problem as Shoko.

                Go back and check the Bioc support link I posted.

                You need to create a new column, nested.site, which should only be values "site_1" to "site_6". These keep track of the different sites within each stand type.

                Here is a small example of how it should look:

                spruce site_1 fall1
                spruce site_2 fall1
                spruce site_3 fall1
                ...
                beech site_1 fall1
                beech site_2 fall1
                beech site_3 fall1
                ...
                oak site_1 fall1
                oak site_2 fall1
                oak site_3 fall1

                It's not a problem that spruce site_1 is not the same site as oak site_1, because we use an interaction term standType:nested.site, which produces unique coefficients in the model for each combination of stand and site.
                I have a longitudinal study where individuals are nested within diet, and also there may be a batch processing effect (samples processed in two batches).

                Would I make two of these new id columns, one for individuals within diet, and one for individuals within batch?

                Thanks!

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM
                • seqadmin
                  Techniques and Challenges in Conservation Genomics
                  by seqadmin



                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                  Avian Conservation
                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                  03-08-2024, 10:41 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Yesterday, 06:37 PM
                0 responses
                8 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, Yesterday, 06:07 PM
                0 responses
                8 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-22-2024, 10:03 AM
                0 responses
                49 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-21-2024, 07:32 AM
                0 responses
                67 views
                0 likes
                Last Post seqadmin  
                Working...
                X