
  • DESeq: more than 2 levels per condition?


    Is it possible in DESeq to analyze a design with more than 2 levels per condition/factor?
    I'm working with a design that has 3 different treatments (untreated, treatment1, treatment2) at several time points, with replicates for each combination:

    treatment: time:

    untreated 0h
    untreated 24h
    untreated 48h
    treatment1 0h
    treatment1 24h
    treatment1 48h
    treatment2 0h
    treatment2 24h
    treatment2 48h

    Thanks in advance,
    Last edited by edue; 11-18-2011, 05:23 AM.

  • #2
    Sure, you can analyse more complex designs. See the section on GLMs in the vignette. How precisely to set up the test depends on which hypothesis you want to test.
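To make the "more than 2 levels" setup concrete, here is a minimal base-R sketch (it does not call DESeq) of a sample table for the 3 treatments x 3 time points in the question, together with the design matrices of a full and a reduced model. The two replicates per combination and the column names are assumptions for illustration only. Formulas of this shape, written as `count ~ treatment + time`, are what the GLM functions described in the vignette take.

```r
# Hypothetical sample table for the design in the question,
# assuming two replicates per treatment/time combination.
design <- expand.grid(
  replicate = 1:2,
  time      = c("0h", "24h", "48h"),
  treatment = c("untreated", "treatment1", "treatment2"),
  stringsAsFactors = FALSE
)

# Set the reference levels explicitly: untreated and 0h.
design$treatment <- factor(design$treatment,
                           levels = c("untreated", "treatment1", "treatment2"))
design$time <- factor(design$time, levels = c("0h", "24h", "48h"))

# Full model with both factors, and a reduced model without treatment.
full    <- model.matrix(~ treatment + time, data = design)
reduced <- model.matrix(~ time, data = design)

colnames(full)     # intercept, 2 treatment coefficients, 2 time coefficients
colnames(reduced)  # intercept, 2 time coefficients
```

Comparing the fit of the full model against the reduced one asks whether treatment has any effect at all; which reduced model to use depends on the hypothesis of interest.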


    • #3
      Hi Simon,
      Can you please clarify this for me? If I have more than one factor, e.g. treatment and timepoint, I use the GLM full model approach. If I have only one factor, but it has more than two levels (A, B, C), should I still use the GLM approach? Or is it better to use the simpler model and run nbinomTest several times, once for each pairwise comparison (A vs B; A vs C; B vs C)? Is there a way to use the simpler model but also test for differential expression in one step (e.g. ANOVA, especially for many-level factors)?
      Many thanks for all your work on DESeq!


      • #4
        For pair-wise comparisons, you have to subset your data set to only the samples involved. To be consistent with the ANOVA-style result for all levels, you should do the subsetting after the dispersion estimation.


        • #5
          Thanks, Simon. By subsetting, I assume you mean to simply run a number of nbinomTest commands, one for each comparison, using the same countDataSet (after dispersion estimation). For example:

          design <- data.frame(
          	sample.names = sampleTable$V1,
          	count.files = sampleTable$V2,
          	condition = c("A", "A", "A", "B", "B", "B", "C", "C", "C"))
          cds <- newCountDataSetFromHTSeqCount(design, directory="/data/dir")
          cds <- estimateSizeFactors( cds )
          cds <- estimateDispersions( cds )
          AvsB <- nbinomTest(cds, "A", "B")
          AvsC <- nbinomTest(cds, "A", "C")
          BvsC <- nbinomTest(cds, "B", "C")


          • #6
            I was just going to make a thread in a similar vein, so I may as well ask my question in this one.

            I'm also dealing with a subset of pairwise comparisons in an analysis, and wondering about the correct way to run it with DESeq.

            Say you have a time course analysis with 3 biological replicates collected at each of 6 different time points. The comparisons we are interested in are how each of the later time points differs from time 1.

            So 5 different pairwise tests: t1 vs t2, t1 vs t3, t1 vs t4, t1 vs t5, and t1 vs t6.

            So Simon, would the appropriate way to run this analysis using DESeq be to have them all in one count data set and then just run 5 different nbinomTests with (cds, "t1", "t2"), (cds, "t1", "t3")... etc.? Or should we take the raw counts for "t1" and "t2", put them in their own table, and create and test a separate count data set for each pair?

            In addition, since we are doing multiple tests on the same data set, is there a need to re-do the False Discovery Rate calculation by combining the raw p-values from all 5 pairwise tests into a single list, and re-running p.adjust on the full set of results? Or is keeping the FDR values for each individual test acceptable?
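The two FDR options described in this question can be sketched in base R with p.adjust. This is only an illustration of the mechanics, not a recommendation of either option: the p-values are simulated, and the gene count and comparison names are hypothetical.

```r
set.seed(1)
n_genes     <- 1000
comparisons <- paste0("t1_vs_t", 2:6)

# Hypothetical raw p-values, one column per pairwise test
# (simulated uniform values, not real data).
pvals <- matrix(runif(n_genes * length(comparisons)),
                nrow = n_genes,
                dimnames = list(NULL, comparisons))

# Option 1: Benjamini-Hochberg adjustment within each comparison separately.
padj_separate <- apply(pvals, 2, p.adjust, method = "BH")

# Option 2: pool the raw p-values from all 5 tests into one vector,
# adjust once, then reshape back to genes x comparisons.
padj_pooled <- matrix(p.adjust(as.vector(pvals), method = "BH"),
                      nrow = n_genes,
                      dimnames = dimnames(pvals))
```

With real results, the raw p-values would come from the pval column of each nbinomTest result instead of runif.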


            • #7
              I have a similar experimental set-up as above and therefore face the same decision. Essentially the question is, assuming > 2 samples (comparisons) should the variance estimation (estimateDispersions) be performed using ALL of the samples before performing the pairwise DE test, or should the variance estimation be restricted to the pair of samples that one is testing for DE?



              • #8
                Will the comparison between the full model and the reduced model (only intercept) give the overall significance of time effect?

                dfit1 <- fitNbinomGLMs(d, count ~ condition)
                dfit0 <- fitNbinomGLMs(d, count ~ 1)
                dpval <- nbinomGLMTest(dfit1, dfit0)
                dpadj <- p.adjust(dpval, method="BH")


                • #9
                  This is indeed an important point, john_nl; my impression is that estimating dispersion from only the two levels you're going to compare is a bit like cheating on the statistics. One would expect the dispersion to be estimated from all the condition levels, followed by an ANOVA with contrasts. Does DESeq support this?


                  • #10
                    Yes, DESeq supports GLMs of any type. See also Simon's earlier posts.
                    Wolfgang Huber


                    • #11

                      I have a couple of questions regarding biological replicates in DESeq. I have HTSeq output files from RNA-seq data examining the effects of three chemicals on gene expression. There are two experiments for this data. One examines the expression changes compared to a vehicle control (1%) at a high concentration of chemical (10uM), and the other examines gene expression changes at a lower concentration (1uM) of the chemicals with a lower vehicle concentration (0.1%). Currently, I have been running DESeq on the two experiments separately, i.e. with a separate R script for each experimental setup, so the design variable contains 4 conditions corresponding to their respective HTSeq output files for each experiment (hopefully this all makes sense).
                      My first question is whether I should run DESeq on the experiments combined instead of keeping them separate. In this case I would have a design variable containing all 8 conditions. My (potentially naive) reasoning for combining the two experiments is that I would get a better estimate of the overall dispersions for all genes examined and yet still be able to run 'nbinomTest()' normally if I define the conditions correctly. (Maybe I’m getting confused by the vignette’s definition of condition and factor?)
                      My second question is with regards to outliers as identified by the PCA plot function of DESeq. I have generated PCA plots for both experiments (keeping them separate) in order to see whether the treatments group together and the general pattern of the data. For both the high concentration and low concentration experiments the PCA plots show that some of the replicates differ rather substantially from their respective treatment groups (see images). Now, I know that removing outliers from analyses is risky business and needs to be justified, but based on how different these replicates are from the treatments would it be ok to take these out?
                      [Attached images: Plot 1.png, Plot2.png]

                      Thanks for all the help!!

