DESeq v1.12.0 estimateDispersions function: pooled-CR vs pooled

adumitri

Member

Join Date: Jan 2010

Posts: 27
#1

05-03-2013, 12:34 PM

Hi,

I am using DESeq v1.12.0 to compare count data for ~6,000 exons in ~200 genes; these genes are the ones found to be significantly differentially expressed (FDR level) in a prior gene-centric DESeq analysis between 20 diseased and 20 normal human RNA-Seq samples. To get the exon counts that DESeq needs as input, I used two Python scripts written by Simon Anders: dexseq_prepare_annotation.py and dexseq_count.py.

In DESeq, everything went smoothly until the estimateDispersions function. Initially, I tried using the "pooled-CR" method for this function, but got the following error:

Code:

> cds <- estimateDispersions(cds, method = "pooled-CR")
Error in parametricDispersionFit(means, disps) :
  Parametric dispersion fit failed. Try a local fit and/or a pooled estimation. (See '?estimateDispersions')

Given this message, I tried using the estimateDispersions function in two different ways:

Code:

cds.pooledCR.local <- estimateDispersions(cds, method = "pooled-CR", fitType = "local")
cds.pooled <- estimateDispersions(cds, method = "pooled")

When using the plotDispEsts function for the cds.pooledCR.local and cds.pooled objects to plot the mean of normalized counts vs the dispersion for all included exons, I obtain the attached plots. In the case of the cds.pooled object, this message was displayed:

Code:

Warning message:
In xy.coords(x, y, xlabel, ylabel, log) :
  1472 y values <= 0 omitted from logarithmic plot

The 1,472 omitted dispersion values are negative. The same exons form the cluster at the bottom of the plot obtained with cds.pooledCR.local, where their dispersion values are very small but > 0.

Long introduction for my two questions:
1) I thought DESeq makes sure the dispersion values are all above 0. How should I interpret the negative dispersion values obtained when the "method" option for estimateDispersions is "pooled"?
2) Which options for the estimateDispersions function are safest to use with exon data?

Thank you for your help!
Alexandra
Attached Files

DESeq.dispersionEstimates.pooled.plot.png (38.1 KB, 54 views)

DESeq.dispersionEstimates.pooledCR-local.plot.png (35.9 KB, 53 views)
Simon Anders

Senior Member

Join Date: Feb 2010

Posts: 995
#2

05-03-2013, 03:23 PM

1. The left plot looks fine, the right one is completely off (fit line misses the main data cloud).

2. The negative dispersion values are an artifact of the way dispersions are estimated in all but the "CR" mode, namely by the method of moments (see our paper). For the actual computation, they get replaced by a small positive number.

3. I'm rather puzzled why you would want to use DESeq rather than DEXSeq for your exons, but assuming that you know what you are doing: as long as the diagnostic plot is fine (red line goes through the upper cloud), you can go ahead.

4. Overall, we are no longer that happy with DESeq's manner of estimating dispersions, which is why we made a fresh attempt with DESeq2. We are not yet finished, however, with writing up our paper to explain exactly what we now do differently.
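
To illustrate point 2, here is a minimal sketch of how a method-of-moments dispersion estimate can come out negative. It is written in plain Python, not taken from the DESeq source, and rests on several assumptions: the counts are already normalized (all size factors equal to 1), the variance model is the usual negative binomial form variance = mean + dispersion * mean^2, and the function names and the 1e-8 floor are hypothetical placeholders rather than DESeq's actual values.

```python
def mom_dispersion(counts):
    """Method-of-moments dispersion estimate for one exon.

    Solves variance = mean + alpha * mean^2 for alpha, using the sample
    mean and the unbiased sample variance. Assumes the counts are already
    normalized (size factors of 1), which is a simplification.
    """
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return (v - m) / (m * m)


def clamp_dispersion(alpha, floor=1e-8):
    """Replace non-positive estimates by a small positive number.

    The floor value is a hypothetical placeholder for illustration.
    """
    return max(alpha, floor)


# Replicates that vary less than Poisson noise would predict:
# the sample variance is below the mean, so the estimate is negative.
tight = [10, 11, 9, 10]

# Replicates with clear overdispersion: the estimate is positive.
noisy = [5, 20, 8, 30]

alpha_tight = mom_dispersion(tight)
alpha_noisy = mom_dispersion(noisy)

print(alpha_tight, clamp_dispersion(alpha_tight))
print(alpha_noisy, clamp_dispersion(alpha_noisy))
```

Whenever the sample variance across replicates happens to fall below the mean, the numerator is negative and so is the estimate. Such values cannot be drawn on a logarithmic axis, which is consistent with the xy.coords warning quoted in the question.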