**Simon Anders** · 06-27-2012, 11:56 PM

Hi Danielle

so, only 3 genes are affected? Anything different about the counts for these three genes compared to all the others? Maybe post an excerpt from the count table and from fData(cds).

Simon

**kristenbeck527** · 07-02-2012, 11:42 AM

Hi Simon,
I am the grad student working on this project. There are actually between 18 and 91 entries that return a p-value of NA, depending on which data set we are working with. I have assembled a tarball with the script we are using and a subset of mixed entries, some of which give a p-value of NA and some of which are normal. Session information and input and output files are included as well. Take a look and let me know what you think, or if you need any more info from me.

Thanks,
Kristen

Attached Files

DEseqTroubleshooting.tar.gz (12.8 KB, 30 views)

**Simon Anders** · 07-12-2012, 05:38 AM

Hi Kristen

I've had a look at the data now. (Sorry that it took a while.)

This does not look like RNA-Seq data. You only have 94 genes; this is only barely enough to fit a variance-mean relation.

Furthermore, your data is sub-Poissonian: the variance is smaller than the mean for more than half of the genes. This is impossible even with technical replicates. You must have done something wrong when obtaining your counts.

I'd say DESeq was justified in giving strange results here. You will need to tell me more about how the data was obtained before I can give further advice.

**dglemay** · 07-12-2012, 05:47 AM

Dear Simon,

Thank you for looking at our data. Apologies, we thought it would be easier for you if we gave you a piece of the data set showing the problem, rather than a full data set. Kristen can send a full data set.

Your detective work is spot-on, because these data were obtained by "simulating" technical replicates based on a Poisson distribution, so some of the genes will have a sample variance smaller than the mean.
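
To illustrate the point about simulated replicates, here is a minimal Python sketch (not the script from the attached tarball). It draws Poisson counts for a number of hypothetical genes and reports the fraction whose sample variance falls below the sample mean. The gene count, the three replicates, the mean of 50, and the function names are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for modest lam; exp(-lam) underflows for very large means.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fraction_underdispersed(n_genes=2000, n_reps=3, lam=50.0, seed=1):
    """Fraction of simulated genes with sample variance < sample mean.

    All parameters are illustrative assumptions, not values from the thread.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(n_genes):
        counts = [poisson_sample(rng, lam) for _ in range(n_reps)]
        # statistics.variance is the unbiased sample variance (n - 1 denominator)
        if statistics.variance(counts) < statistics.mean(counts):
            below += 1
    return below / n_genes


frac = fraction_underdispersed()
print(f"fraction of genes with variance < mean: {frac:.2f}")
```

With only a handful of replicates the sample variance is itself very noisy, so even data that are exactly Poisson will show many genes whose estimated variance is below the mean.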

The NA values are troubling because these are genes with VERY different abundances, so it should be a slam-dunk for DESeq to detect them as differentially expressed.

Thanks,
Danielle

**Simon Anders** · 07-12-2012, 06:12 AM

The p values were not "NA" but "NaN", which usually results from a division of zero by zero. Here, it happened because the floating-point machine precision turns out to be insufficient in the case of extremely low dispersions combined with large means. I guess this can be fixed, but it's maybe not worth the effort because essentially zero dispersions do not happen in practice. (Even between technical replicates, you will usually find dispersion values above 10^-6.)
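
For readers wondering how limited precision can produce a 0/0, here is a toy Python sketch. It is not DESeq's actual computation, only an assumed illustration: with the negative binomial variance mu + alpha * mu^2, a tiny dispersion alpha adds less than the floating-point spacing around a large mean, so the "excess variance" rounds to exactly zero, and a ratio of two such quantities becomes NaN. The helper `ieee_divide` is hypothetical and only mimics R's behaviour, since plain Python raises an error on division by zero.

```python
import math


def ieee_divide(a, b):
    """Divide like IEEE 754 (and R): 0/0 gives NaN, x/0 gives +/-inf."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)


def nb_variance(mu, alpha):
    """Negative binomial variance for mean mu and dispersion alpha."""
    return mu + alpha * mu * mu


mu = 1e6  # large mean (illustrative value)

# Moderate dispersions: the excess variance survives rounding.
ratio_ok = ieee_divide(nb_variance(mu, 1e-3) - mu,
                       nb_variance(mu, 2e-3) - mu)

# Extremely low dispersions: alpha * mu^2 is about 1e-18, far below the
# spacing of doubles near 1e6, so both differences are exactly 0.0.
excess_a = nb_variance(mu, 1e-30) - mu
excess_b = nb_variance(mu, 2e-30) - mu
ratio_tiny = ieee_divide(excess_a, excess_b)

print(ratio_ok)               # close to 0.5
print(excess_a, excess_b)     # both 0.0
print(ratio_tiny)             # nan, although the exact answer is 0.5
```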

DESeq: pvalues are NA