Originally posted by Simon Anders
Let's say you have four different conditions, each with two replicates: A1, A2, B1, B2, C1, C2, D1, D2.
Then, you can use DESeq as it is to do contrasts such as 'A vs B', 'A vs C', 'B vs D', etc.
What does not work yet are interaction contrasts. Say you have the conditions wt, A, B, AB, where wt is the wild type, A and B are some treatments or mutations, and AB is the combination of both. You might be interested in simple contrasts, as above, which is fine. Or you might want to check for interactions: you say that the effect of A is given by the difference dA between A and wt (dA = A - wt), and similarly for B (dB = B - wt) and AB (dAB = AB - wt), and you now wonder whether dAB = dA + dB. To check this, you need an interaction contrast. This is something that I am currently working on, but I might still need a few weeks.
For now, you can variance-stabilize the data (function 'getVarianceStabilizedData') and then use an ordinary linear model or the 'limma' package to check for interaction. That works fine, but the power is reduced.
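As a rough illustration of the ordinary-linear-model route, here is a minimal sketch in base R for a single gene. The values in vsd_gene are made up and merely stand in for one row of a variance-stabilized matrix, and the 0/1 coding of A and B is an assumption of this example, not something DESeq produces. In such a two-by-two design, the A:B coefficient estimates dAB - (dA + dB), i.e., AB - A - B + wt.

```r
# Hypothetical variance-stabilized values for ONE gene, two replicates per
# condition (wt, A, B, AB). The numbers are made up for illustration.
vsd_gene <- c(wt1 = 5.0, wt2 = 5.2,
              A1  = 6.1, A2  = 5.9,
              B1  = 5.6, B2  = 5.4,
              AB1 = 8.0, AB2 = 8.2)

# Encode the design as two 0/1 factors: is treatment A present? Is B present?
design <- data.frame(
  A = factor(c(0, 0, 1, 1, 0, 0, 1, 1)),
  B = factor(c(0, 0, 0, 0, 1, 1, 1, 1))
)

# vsd_gene ~ A * B expands to A + B + A:B. The A:B coefficient estimates
# dAB - (dA + dB), so testing it against zero tests for an interaction.
fit <- lm(vsd_gene ~ A * B, data = design)
interaction_est <- coef(fit)[["A1:B1"]]
interaction_p   <- summary(fit)$coefficients["A1:B1", "Pr(>|t|)"]
```

With only two replicates per condition, the residual variance is estimated from four degrees of freedom per gene, which is one reason why a package such as 'limma', which shares information across genes, may be preferable to a plain per-gene lm.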
Just using the raw count data, scaled by the library sizes, is not a good idea if you intend to feed it to a linear model. Ordinary least squares regression requires homoscedasticity, and count data are heteroscedastic. The point of a variance-stabilizing transformation is precisely to remedy that.
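To make the heteroscedasticity point concrete, here is a small simulation sketch in base R. It uses pure Poisson counts and the square-root transform, which is the classical variance-stabilizing transformation for Poisson data. This is a deliberately simplified stand-in: it is not the transformation that 'getVarianceStabilizedData' computes, and real count data are typically overdispersed relative to Poisson. The means 10 and 1000 and the seed are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated Poisson counts for a weakly and a strongly expressed gene.
low  <- rpois(10000, lambda = 10)
high <- rpois(10000, lambda = 1000)

# On the raw scale the variance grows with the mean (heteroscedastic).
raw_var <- c(low = var(low), high = var(high))

# After a square-root transform, both variances are roughly 1/4,
# i.e., they no longer depend strongly on the mean.
vst_var <- c(low = var(sqrt(low)), high = var(sqrt(high)))

print(raw_var)
print(vst_var)
```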
-
Multi-sample comparison
Simon,
Let's say I want to compare 4 samples, each with 2 replicates (an ANOVA, for example). As of now, DESeq doesn't have this functionality, correct?
Is there a way I can export the entire matrix of normalized values rather than making the one-to-one comparison? For example, would it be correct to multiply all of the values in T1b by 0.5587394, or T2 by 1.5823096, etc.? (These represent the column names and sizeFactors from the DESeq vignette.)
-
If you followed the example in the vignette, you have come across the line
resSig <- res[ res$padj < .1, ]
This is where we select the genes that we want to consider significant. res$padj is the Benjamini-Hochberg-adjusted p value, and selecting all genes with padj below 0.1 controls the FDR at 10%. To get 1%, just put .01 here.
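For readers who do not program in R, the change can be sketched as follows. The data frame below is a made-up stand-in for the result table, containing only the two columns used here. The use of which() is an addition of this sketch: it guards against rows whose padj is missing (NA), which plain logical indexing would return as all-NA rows.

```r
# Hypothetical stand-in for the result table; only the columns used here
# are included, and the values are made up.
res <- data.frame(
  id   = c("g1", "g2", "g3", "g4", "g5"),
  padj = c(0.004, 0.03, 0.2, NA, 0.009)
)

# which() keeps only rows where the comparison is TRUE and drops NA rows.
resSig10 <- res[ which(res$padj < .1),  ]  # FDR controlled at 10%
resSig01 <- res[ which(res$padj < .01), ]  # FDR controlled at 1%
```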
-
Hi Simon,
How can I set the FDR at 1% instead of 10%? Is there a way of doing this through the nbinomTest function? I am not an R programmer, so I do not know.
Thanks
Abhijit
-
Along the same lines: a lower FDR is more stringent.
A false discovery rate (FDR) of 10% means that (in expectation) 10% of your hits are false positives. To get FDR control at 10%, you adjust the raw p values for multiple testing with an FDR-controlling method, such as the one by Benjamini and Hochberg or the one by Storey, and then take all those genes with an adjusted p value below the desired FDR. DESeq does this for you, using the Benjamini-Hochberg method.
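The adjustment described above can be sketched in base R, whose p.adjust function implements the Benjamini-Hochberg method. The p values below are made up, and the by-hand version is included only to show what the adjustment does; it is not taken from DESeq's source.

```r
# Made-up raw p values for six genes.
pval <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# Base R's p.adjust implements the Benjamini-Hochberg adjustment.
padj <- p.adjust(pval, method = "BH")

# The same adjustment by hand: scale the i-th smallest p value by n / i,
# then enforce monotonicity from the largest p value downwards.
n <- length(pval)
o <- order(pval, decreasing = TRUE)
padj_manual <- numeric(n)
padj_manual[o] <- pmin(1, cummin(pval[o] * n / (n:1)))

# Genes called significant at FDR 10% and at FDR 1%.
hits10 <- which(padj < 0.10)
hits01 <- which(padj < 0.01)
```

In this toy example, four genes pass at 10% but only one passes at 1%, which illustrates that the lower cutoff is the more stringent one.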
Controlling at FDR 10% is a common, but, of course, arbitrary choice. It is up to you to decide how stringent you want to be.
The concept of FDR was introduced by Benjamini and Hochberg in 1995 (J Roy Stat Soc B 57 289). Since then, there has been a lot of research on how best to adjust p values to allow for FDR control, and on how well this works when the tests are correlated.
-
Hi Simon,
Thank you for your help. I got DESeq to work. However, I am trying to figure out a good value to use for the FDR cutoff. This may be a very stupid question, but I'll ask anyway: is an FDR of 10% more stringent than an FDR of 1%? I am thinking in terms of p-values, where smaller numbers mean higher stringency. Should I imagine FDR along the same lines, or the opposite?
Thanks
Abhijit
-
Dear Wolfgang,
Thank you for pointing out my typo, namely that "measurement" should have been "estimate".
-
Originally posted by lpachter:
I'd like to clarify one point here:
...for a measurement to be useful (whatever units it is in) it is important to know its variance. ....
The choice of ensemble matters: repeated measurements on the same sample using the same machine will have smaller variance than repeated measurements using different machines, or different biological replicates, e.g. different cell passages.
-
Hi Abhijit
have you followed the instructions on http://www-huber.embl.de/users/anders/DESeq/ ?
If so, please send me the output from the installation. The real error message should be hidden in there.
Cheers
Simon
-
Hi Simon,
I am trying to install DESeq on Mac OS X Leopard. I get the error message "In getDependencies(pkgs, dependencies, available, lib) : package ‘DESeq’ is not available". Any clue why this is happening? I found that DESeq needs Biobase to be installed. I have done that, and yet I get this error. Any ideas?
Thanks
Abhijit
-
Originally posted by lpachter:
There is no such thing as "FPKM data" (or for that matter "RPKM data"). The only data in an RNA-Seq experiment is the read counts.
-
Originally posted by Cole Trapnell:
One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.
Do you think it would be a good idea to use quantile normalization on tag counts before running cuffdiff, or is this unnecessary given that it doesn't actually use RPKM values (which I didn't know, but am glad to hear)?
-
One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.