Hello,
I have a couple of questions regarding biological replicates in DESeq. I have HTSeq output files from RNA-seq data examining the effects of three chemicals on gene expression. There are two experiments for this data. One examines the expression changes at a high concentration of chemical (10uM) compared to a vehicle control (1%), and the other examines gene expression changes at a lower concentration of the chemicals (1uM) with a lower vehicle concentration (0.1%). Currently, I have been running DESeq on the two experiments separately, i.e. one R script per experimental setup, so the design variable contains 4 conditions corresponding to their respective HTSeq output files for each experiment (hopefully this all makes sense).
My first question is whether I should run DESeq on the experiments combined instead of keeping them separate. In this case the design variable would contain all 8 conditions. My (potentially naive) reasoning for combining the two experiments is that I would better estimate the dispersions for all genes examined and yet still be able to run 'nbinomTest()' normally if I define the conditions correctly. (Maybe I’m getting confused by the vignette’s definitions of condition and factor?)
My second question is with regards to outliers as identified by the PCA plot function of DESeq. I have generated PCA plots for both experiments (keeping them separate) in order to see whether the treatments group together and the general pattern of the data. For both the high concentration and low concentration experiments the PCA plots show that some of the replicates differ rather substantially from their respective treatment groups (see images). Now, I know that removing outliers from analyses is risky business and needs to be justified, but based on how different these replicates are from the treatments would it be ok to take these out?
Thanks for all the help!!
-
Gabriela
Yes, DESeq supports GLMs of any type. See also Simon's earlier posts.
-
This is indeed an important point john_nl; my impression is that estimating dispersion for only the two levels you're going to compare is a bit cheating on statistics... One would expect the dispersion to be calculated on all the condition levels, and then perform an ANOVA with contrasts... Does DESeq support this?
-
Will the comparison between the full model and the reduced model (only intercept) give the overall significance of time effect?
dfit1 <- fitNbinomGLMs(d, count ~ condition)
dfit0 <- fitNbinomGLMs(d, count ~ 1)
dpval <- nbinomGLMTest(dfit1, dfit0)
dpadj <- p.adjust(dpval, method="BH")
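To make the full-versus-reduced comparison concrete for a single gene, here is a minimal sketch in base R. It is not DESeq code: it assumes a Poisson GLM as a stand-in for the negative binomial fit, and the counts are made-up toy numbers. The point is only that comparing `count ~ condition` against `count ~ 1` tests the null hypothesis that no condition level differs from the others, i.e. an overall effect rather than any specific pairwise contrast.

```r
# Toy counts for one gene: 3 replicates at each of 3 condition levels.
# These numbers are invented for illustration only.
counts    <- c(10, 12, 9, 30, 28, 35, 11, 8, 13)
condition <- factor(rep(c("t1", "t2", "t3"), each = 3))

# Poisson GLMs used here as a simplified stand-in for DESeq's NB GLMs.
fit1 <- glm(counts ~ condition, family = poisson)  # full model
fit0 <- glm(counts ~ 1, family = poisson)          # reduced model (intercept only)

# Chi-squared test on the difference in deviance between the two fits.
dev_diff <- deviance(fit0) - deviance(fit1)
df_diff  <- df.residual(fit0) - df.residual(fit1)  # number of extra coefficients
pval     <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

print(c(deviance_difference = dev_diff, df = df_diff, p_value = pval))
```

A small p-value here says that at least one level differs; it does not say which one, so specific time points would still need their own contrasts.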
-
I have a similar experimental set-up as above and therefore face the same decision. Essentially the question is, assuming > 2 samples (comparisons) should the variance estimation (estimateDispersions) be performed using ALL of the samples before performing the pairwise DE test, or should the variance estimation be restricted to the pair of samples that one is testing for DE?
Cheers,
-
I was just going to make a thread in a similar vein, so I may as well ask my question in this one.
I am also dealing with a subset of pairwise comparisons in an analysis, and the correct way to run it with DESeq.
Say you have a time course analysis with 3 biological replicates collected at each of 6 different time points. The comparisons we are interested in looking at are how each of the later time points differs from time 1.
So 5 different pairwise tests: t1 vs t2, t1 vs t3, t1 vs t4, t1 vs t5, and t1 vs t6.
So Simon, would the appropriate way to run this analysis using your DESeq be to have them all in one count data set and then just run 5 different nbinomTests with (cds, "t1", "t2"), (cds, "t1", "t3")... etc? Or to take the raw counts for "t1" and "t2", put them in their own table, and create / test a count data set for each pair?
In addition, since we are doing multiple tests on the same data set, is there a need to re-do the False Discovery Rate calculation by combining the raw p-values from all 5 pairwise tests into a single list, and re-running p.adjust on the full set of results? Or is keeping the FDR values for each individual test acceptable?
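The two options in this question can be written out mechanically in base R. This is only a sketch of the bookkeeping, not a recommendation for either option: the p-values are uniform random draws standing in for the `pval` column of five `nbinomTest` results, and the names `pvals`, `padj_each` and `padj_pooled` are hypothetical.

```r
set.seed(1)
n_genes     <- 1000
comparisons <- paste0("t1_vs_t", 2:6)

# Placeholder raw p-values, one column per pairwise comparison.
# In a real analysis these would come from the nbinomTest() results.
pvals <- sapply(comparisons, function(x) runif(n_genes))

# Option 1: Benjamini-Hochberg adjustment within each comparison separately.
padj_each <- apply(pvals, 2, p.adjust, method = "BH")

# Option 2: pool all raw p-values, adjust once, then reshape back
# to the same genes-by-comparisons layout.
padj_pooled <- matrix(p.adjust(as.vector(pvals), method = "BH"),
                      nrow = n_genes, dimnames = dimnames(pvals))

# Number of genes passing a 10% FDR cutoff under each option.
print(colSums(padj_each < 0.1))
print(colSums(padj_pooled < 0.1))
```

With real data the two options generally give different gene lists, because pooling changes the number of tests that each p-value is ranked against.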
-
Thanks, Simon. By subsetting, I assume you mean to simply run a number of nbinomTest commands, one for each comparison, using the same countDataSet (after dispersion estimation). For example:
Code:
    design <- data.frame(
      sample.names = sampleTable$V1,
      count.files = sampleTable$V2,
      condition = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
    )
    cds <- newCountDataSetFromHTSeqCount(design, directory="/data/dir")
    cds <- estimateSizeFactors( cds )
    cds <- estimateDispersions( cds )
    AvsB <- nbinomTest(cds, "A", "B")
    AvsC <- nbinomTest(cds, "A", "C")
    BvsC <- nbinomTest(cds, "B", "C")
-
For pair-wise comparisons, you have to subset your data set to only the samples involved. To be consistent with the ANOVA-style result for all levels, you should do the subsetting after the dispersion estimation.
-
Hi Simon,
Can you please clarify this for me? If I have more than one factor, e.g. treatment and timepoint, I use the GLM full model approach. If I have only one factor, but it has more than two levels (A, B, C), should I still use the GLM approach? Or is it better to use the simpler model and run nbinomTest once for each 2-way comparison (A vs B; A vs C; B vs C)? Is there a way to use the simpler model, but also test for differential expression in one step (e.g. ANOVA, especially for many-level factors)?
Many thanks for all your work on DESeq!
Matt
-
Sure, you can analyse a more complex design. See the section on GLMs in the vignette. How precisely to set up the test depends on what hypothesis you want to test.
-
DESeq: more than 2 levels per condition?
Hi,
Is it possible in DESeq to analyze a design with more than 2 levels per condition/factor?
I'm working with a design, that has 3 different treatments (untreated, treatment1, treatment2) at several time points (I also have replicates of all of them):
treatment:    time:
untreated     0h
untreated     24h
untreated     48h
treatment1    0h
treatment1    24h
treatment1    48h
treatment2    0h
treatment2    24h
treatment2    48h
Thanks in advance,
Elena
Last edited by edue; 11-18-2011, 05:23 AM.
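For reference, the layout described in this post can be written down as a two-factor design table in base R. This is a sketch under assumptions: it lists one row per treatment/time combination (replicates would add further rows), it uses an additive `~ treatment + time` formula purely as an example, and the object names `design` and `mm` are hypothetical. Which formula is appropriate depends on the hypothesis being tested.

```r
# One row per treatment/time combination; replicates are omitted here.
# expand.grid() returns factors whose levels follow the order given,
# so "untreated" and "0h" become the reference levels.
design <- expand.grid(
  time      = c("0h", "24h", "48h"),
  treatment = c("untreated", "treatment1", "treatment2")
)

# Model matrix for an additive model (example formula only).
# Each factor with 3 levels contributes 2 coefficients beyond the intercept.
mm <- model.matrix(~ treatment + time, data = design)

print(design)
print(colnames(mm))
```

Both factors have three levels, and the model matrix encodes each of them with two indicator columns relative to its reference level.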