I have been using DESeq to analyze gene expression from SAGE samples. To decide how to compare samples, we have been using ECDF (empirical cumulative distribution function) plots to judge the quality of samples. I was wondering whether I could turn these plots into a single quantitative number by taking the integral of the ECDF. I haven't yet found a way to do this in DESeq. Is there a better program for this kind of analysis?
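For illustration only, the integral asked about here can be computed directly from the values behind an ECDF plot. The sketch below is plain Python, not a DESeq function; the names `ecdf` and `ecdf_integral` and the default interval [0, 1] are assumptions made for this example. It uses the identity that the integral of an ECDF over [lo, hi] equals the average of hi - x over the values x, with each x clamped to the interval.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F, where F(t) is the fraction of values that are <= t."""
    ordered = sorted(values)
    n = len(ordered)

    def F(t):
        return bisect_right(ordered, t) / n

    return F


def ecdf_integral(values, lo=0.0, hi=1.0):
    """Integrate the ECDF of `values` over [lo, hi].

    Each value x contributes a step of height 1/n that starts at x,
    so its area inside the interval is (hi - x) / n, with x clamped
    to [lo, hi].
    """
    n = len(values)
    return sum(hi - min(max(x, lo), hi) for x in values) / n


if __name__ == "__main__":
    sample = [0.25, 0.5, 0.75]  # made-up values, for illustration only
    print(ecdf(sample)(0.5))
    print(ecdf_integral(sample))
```

A single number like this discards the shape of the curve, so two quite different ECDFs can give the same integral.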
-
A few weeks ago, we completely rewrote the DESeq vignette (manual). One of our changes was to remove everything about the ECDF plot of the variance residuals, as people kept misunderstanding its purpose (which was maybe never that clear anyway). It is not meant for checking the quality of samples.
The point of the variance residual ECDF plots was to check how well the assumption holds that genes of similar expression strength have similar variance, because the old DESeq version did not deal well with "variance outliers", i.e., genes with variance much larger than that of similar genes. See the new vignette to learn how we now simply take the maximum of the fitted value and the per-gene estimate to avoid making an error here.
To judge the reproducibility of a protocol, i.e., the similarity of replicate samples, I now recommend the following two possibilities:
(i) use the new 'estimateDispersions' function that now, by default, no longer does a local fit but a parametric fit, fitting a curve alpha = alpha_0 + alpha_1/mu on the dispersion alpha, or equivalently, a curve v = ( 1 + alpha_1 ) * mu + alpha_0 * mu^2 on the variance v. The value alpha_0 is a good measure of the overall (intensity-independent) variation between replicates, the value alpha_1 is a measure of the additional variance for weak genes. See vignette for details.
(ii) use the variance-stabilizing transformation to make a sample-clustering heatmap, as described in the vignette, to see whether your replicates are more similar than samples from different treatment groups.
Note that the new DESeq is available in the devel branch of Bioconductor, not yet in the release branch.
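To make the two equivalent forms of the parametric curve in (i) concrete, here is a small numerical check in plain Python. It is only a sketch: the function names are invented for this example, and the values of alpha_0 and alpha_1 are made up rather than taken from any real fit. It relies on the negative binomial variance v = mu + alpha * mu^2, into which alpha = alpha_0 + alpha_1/mu is substituted.

```python
import math


def fitted_dispersion(mu, alpha_0, alpha_1):
    """Dispersion curve: alpha = alpha_0 + alpha_1 / mu."""
    return alpha_0 + alpha_1 / mu


def fitted_variance(mu, alpha_0, alpha_1):
    """Variance curve: v = (1 + alpha_1) * mu + alpha_0 * mu^2."""
    return (1.0 + alpha_1) * mu + alpha_0 * mu ** 2


def nb_variance(mu, alpha):
    """Negative binomial variance for mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


# Hypothetical coefficients, chosen only for illustration.
ALPHA_0 = 0.05
ALPHA_1 = 2.0

if __name__ == "__main__":
    for mu in (1.0, 10.0, 1000.0):
        alpha = fitted_dispersion(mu, ALPHA_0, ALPHA_1)
        v = fitted_variance(mu, ALPHA_0, ALPHA_1)
        # Both routes to the variance should agree.
        assert math.isclose(nb_variance(mu, alpha), v)
        print(mu, alpha, v)
```

For large mu the dispersion approaches alpha_0, while for small mu the alpha_1/mu term dominates, which matches the reading of alpha_0 as the overall variation and alpha_1 as the extra variance of weak genes.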
-
Hello Simon,
the "Package Downloads" links on the Bioconductor homepage (http://www.bioconductor.org/packages...tml/DESeq.html) are wrong. They still link to version 1.5.18 but should link to 1.5.19. I don't know whether you have any control over that.
Best,
Mark Onyango
-
Hello Simon,
could you please elaborate on why you switched from the local fit to a parametric fit as a default setting? I always found your idea for a more data-driven fit very sound.
@KellerMac:
It depends on which operating system you are using. If you use Windows, you can safely install the development version alongside the release version, as it will create its own library folder, so the two do not interfere.
If you are using Linux (e.g. Ubuntu), you simply download the development sources of R into a folder of your choosing and compile them there. R won't be installed system-wide and can be started from that folder. All packages you download will be kept in that folder as well.
So, all in all, there is no need to delete the current version of DESeq from your PC.
-
Error: could not find function "estimateDispersion"
I'm getting:
Error: could not find function "estimateDispersion"
What have I done wrong?
I'm running R on OS X. I've had no trouble using DESeq before, just with this new function.
As far as I can tell, my DESeq is up to date.
-
Ok, last one: it seems something is wrong with the files linked on the Bioconductor site.
Am I wrong?
-
I am sorry to awaken this thread, but I seem to have a problem with the latest release version of DESeq (1.6.1):
Whenever I try to execute the estimateDispersions function I receive the following error:
Parametric dispersion fit failed. Try a local fit and/or a pooled estimation. (See '?estimateDispersions')
Now this can only happen if the coefficients (or at least some of them) become negative during the fitting process. Using the local fit more or less cures this, but I still see some negative dispersion coefficients. My question therefore is: how can the coefficients become negative during fitting, and how do I properly handle or interpret them?
-
The problem with the fit has little to do with the negative values, because DESeq "lifts" all negative dispersion values to something slightly above zero. Rather, our new parametric fit routine still has some weaknesses that we are not yet fully sure how to straighten out. This is why the package recommends reverting to the old method if the new one fails. In practice, the difference between the two methods turned out to be not that large, anyway.
To nevertheless explain the negative values: A random variable that is distributed according to a negative binomial with mean µ and dispersion a has variance v = µ + a µ². DESeq estimates a from the data with a method-of-moments estimator, i.e., it estimates µ and v and then calculates a = (v - µ) / µ². (I'm skipping over a few subtleties here, which are explained in the supplement to our paper.) Especially for low µ, it may happen that the estimate for v is smaller than that for µ, and then the estimate for the dispersion a becomes negative. We know that a should be positive, and hence we need to replace all negative values with small positive ones before the test. However, I prefer to do this only after the fit, because the replacement introduces a positive bias.
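As a rough illustration of the estimator a = (v - µ) / µ², the following plain-Python sketch computes it for a single gene from made-up replicate counts. It is not DESeq's code: it skips normalization and the subtleties mentioned above, the function names are invented for this example, and the floor `MIN_DISPERSION` is a hypothetical stand-in for "slightly above zero". A negative raw estimate appears whenever the estimated variance falls below the estimated mean.

```python
from statistics import mean, variance

# Hypothetical floor, standing in for "slightly above zero".
MIN_DISPERSION = 1e-8


def moment_dispersion(counts):
    """Method-of-moments estimate a = (v - mu) / mu^2 for one gene.

    Assumes at least two replicates and a mean above zero.
    """
    values = [float(c) for c in counts]
    mu = mean(values)
    v = variance(values)  # sample variance, n - 1 in the denominator
    return (v - mu) / mu ** 2


def lifted_dispersion(counts, floor=MIN_DISPERSION):
    """Replace a negative (or tiny) raw estimate with a small positive one."""
    return max(moment_dispersion(counts), floor)


if __name__ == "__main__":
    weak_gene = [3, 4, 3, 4]        # made-up counts, variance below the mean
    strong_gene = [10, 30, 20, 40]  # made-up counts, variance above the mean
    for counts in (weak_gene, strong_gene):
        print(counts, moment_dispersion(counts), lifted_dispersion(counts))
```

Fitting on the raw estimates and lifting only afterwards corresponds to calling `moment_dispersion` for the fit and `lifted_dispersion` just before testing.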