
**sdriscoll** · 04-26-2012, 11:49 AM

Originally posted by coutellec:

Second, as I only have this factor with two levels and no replicate, I did 2 separate nbinomTest analyses, i.e., on the whole dataset, and on the filtered one, respectively, and compared the corresponding adjusted p values. I was surprised to get different results.

The answer to this one is pretty straightforward. P-value correction for multiple testing scales roughly with the number of tests (in this case, the number of genes in the DE test). So if you used two lists with different numbers of genes, you should expect to see different adjusted p-values.
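To illustrate the point about list length, here is a minimal Python sketch of the Benjamini-Hochberg (BH) adjustment applied to two gene lists that share the same three small raw p-values. The function name `bh_adjust` and all p-values are made up for illustration; this is not DESeq code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from largest to smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values: three small ones plus uninformative genes.
small = [0.001, 0.004, 0.02]
full_list = small + [0.5] * 17      # 20 genes tested
filtered_list = small + [0.5] * 7   # 10 genes tested after filtering

padj_full = bh_adjust(full_list)
padj_filtered = bh_adjust(filtered_list)

print("full:    ", [round(p, 4) for p in padj_full[:3]])
print("filtered:", [round(p, 4) for p in padj_filtered[:3]])
```

The raw p-values of the three genes are identical in both runs, yet their adjusted values are smaller in the shorter list, so more of them can fall below a cutoff such as padj < 0.1.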

With regard to p-values for a 1-sample versus 1-sample test, these are a bit of a gift. It's not a simple task (and possibly not even a logical request) to produce a p-value for an N-of-1 test. When I run 1 vs 1 DE tests I take those p-values pretty lightly. What's best is to look at the log2 fold changes and see what sort of genes come up to the top of the list. It's a lot of work, but to find those with biological significance you're going to have to spend time going through that list anyway.
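A rough Python sketch of ranking genes by log2 fold change in a 1 vs 1 comparison follows. The gene names, counts, and the pseudocount of 1 (used only to avoid dividing by zero) are assumptions for illustration, and the size factors default to 1.

```python
import math


def log2_fold_changes(counts_a, counts_b, size_a=1.0, size_b=1.0, pseudocount=1.0):
    """Per-gene log2(B / A) on size-factor-normalized counts.

    The pseudocount is an assumption of this sketch; it keeps genes with
    zero counts in one sample from producing infinite fold changes.
    """
    result = {}
    for gene in counts_a:
        norm_a = counts_a[gene] / size_a + pseudocount
        norm_b = counts_b[gene] / size_b + pseudocount
        result[gene] = math.log2(norm_b / norm_a)
    return result


# Hypothetical counts, one sample per condition.
cond_a = {"gene1": 7, "gene2": 100, "gene3": 0, "gene4": 31}
cond_b = {"gene1": 63, "gene2": 95, "gene3": 3, "gene4": 3}

lfc = log2_fold_changes(cond_a, cond_b)

# Rank by absolute log2 fold change, largest first.
ranked = sorted(lfc, key=lambda g: abs(lfc[g]), reverse=True)
for gene in ranked:
    print(gene, round(lfc[gene], 2))
```

Note that low-count genes such as gene3 can reach large fold changes from a handful of reads, which is one reason the top of the list still needs manual inspection.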

**coutellec** · 04-27-2012, 12:05 AM

Thank you very much for your answers. I will look at the highest fold-change values and try to get something out of them.

Actually, the difference that worried me was between the two procedures:
1. model comparison using "nbinomGLMTest" (the two models compared are based on full and filtered data, respectively)

vs

2. two independent tests using "nbinomTest", on full and filtered data, respectively.

Anyway, it seems related to the size factors, and this could be easily checked.
Thanks again,
Agnès
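As a way to check the size-factor idea, here is a small Python sketch of a median-of-ratios size factor estimate, computed once on a full gene list and once on a filtered one. It reimplements the general approach in plain Python and is not the package's own code; the counts and the filter cutoff (total count above 10) are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of counts, one entry per sample.
    Genes with a zero in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) == 0:
            continue
        geo_mean = math.exp(sum(math.log(v) for v in values) / n_samples)
        for j, v in enumerate(values):
            ratios[j].append(v / geo_mean)
    return [median(r) for r in ratios]


# Hypothetical two-sample dataset dominated by low-count genes.
full = {
    "g1": [2, 2],
    "g2": [3, 3],
    "g3": [5, 5],
    "g4": [100, 200],
    "g5": [400, 800],
}
# Hypothetical filter: keep genes whose total count exceeds 10.
filtered = {g: v for g, v in full.items() if sum(v) > 10}

sf_full = size_factors(full)
sf_filtered = size_factors(filtered)
print("full:    ", [round(s, 3) for s in sf_full])
print("filtered:", [round(s, 3) for s in sf_filtered])
```

In this toy case the size factors change once the low-count genes are removed, so the normalized counts (and anything computed from them) differ between the full and filtered runs.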

**Wolfgang Huber** · 04-27-2012, 12:19 AM

Originally posted by coutellec:

1. I think I do not understand clearly how to decide on the proportion of genes to be filtered (0.4 in the example). What is it based on exactly ?

Dear Agnes

Independent filtering is intended to remove those genes whose counts overall (throughout the dataset, irrespective of class label) are so small that they would have a negligible chance of being detected as differentially expressed. What this fraction is depends on your data. Have a look at the article by Bourgon et al. for explanation: http://www.pnas.org/content/107/21/9546.long
You might also find the diagnostic plots shown in the "Independent filtering" section of the DESeq vignette useful.
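The filtering step itself can be sketched in a few lines of Python. This assumes the filter statistic is each gene's total count over all samples, computed without using the condition labels; the function name `independent_filter` and the counts are hypothetical, and 0.4 is simply the fraction from the example under discussion.

```python
def independent_filter(total_counts, theta):
    """Drop the fraction `theta` of genes with the lowest overall counts.

    total_counts: dict mapping gene -> sum of counts over all samples,
    computed irrespective of condition labels.
    Returns the set of genes that are kept for testing.
    """
    ranked = sorted(total_counts, key=total_counts.get)  # lowest counts first
    n_drop = int(theta * len(ranked))
    return set(ranked[n_drop:])


# Hypothetical per-gene totals.
totals = {"g1": 0, "g2": 1, "g3": 2, "g4": 3, "g5": 15,
          "g6": 40, "g7": 120, "g8": 300, "g9": 950, "g10": 2100}

kept = independent_filter(totals, theta=0.4)
print(sorted(kept, key=totals.get))
```

Only the kept genes would then go through testing and multiple-testing adjustment. Because the statistic ignores the condition labels, the choice of genes does not depend on which ones look differentially expressed.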

Originally posted by coutellec:

2. I used two ways to do what I think is equivalent in terms of analysis, yet obtained (slightly) different results. First, I applied model comparison nbinomGLMTest (counts~factor condition vs counts~1) to get adjusted p-values corresponding to the difference between conditions. I did that for the whole dataset and for the filtered one, and compared results (in terms of padj<.1).

It is not equivalent. See sdriscoll's reply. The whole point of independent filtering is to alleviate the multiple testing problem and to increase experiment-wide power.

OTOH, as sdriscoll also points out, with no replicates, the p-values are necessarily based on model assumptions which need not have much to do with reality. You really need replication so that DESeq has a chance to estimate the real biological variability in your data.

Originally posted by coutellec:

3. As I got only 3 extra DE genes thanks to filtering, I am wondering if dispersion was estimated correctly with:
cdstot<-estimateDispersions(cdstot, method="blind", sharingMode="fit-only", fitType="local")
P-values before BH adjustment were already much higher than with a Fisher exact test, and this surprised me a little.

Please explain what you mean by 'correctly'. There is no way any method can estimate the true dispersion from data with no replicates. You really need replication before going into any serious discussion of whether dispersion estimates are good enough for the task at hand.
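One way to see why negative binomial p-values tend to come out larger than those from a Fisher exact test: under the usual negative binomial parameterization, the variance is the Poisson-like sampling variance plus a dispersion term that grows with the square of the mean. The Python sketch below uses that standard formula with made-up dispersion values; it is not a description of how DESeq estimates dispersion.

```python
def nb_variance(mean, dispersion):
    """Negative binomial variance: mean + dispersion * mean**2.

    With dispersion = 0 this reduces to the Poisson variance (= mean).
    """
    return mean + dispersion * mean ** 2


mean_count = 100.0
# Hypothetical dispersion values; 0.0 corresponds to pure sampling noise.
variances = {d: nb_variance(mean_count, d) for d in (0.0, 0.01, 0.1)}
for dispersion, var in variances.items():
    print(f"dispersion={dispersion}: variance={var:.1f}")
```

Any nonzero dispersion widens the expected spread of counts beyond sampling noise alone, so the same observed difference between two samples is less surprising and the p-value is larger.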

**coutellec** · 04-27-2012, 12:33 AM

Thanks for the explanations. The dataset is indeed very rich in low-count genes (454 pyrosequencing), and I might have to filter a lot of them.
Regarding dispersion estimation without replicates, I understand we have to treat samples as if they were replicates. By "correctly", I meant that I simply followed the DESeq procedure for data with no replicates, and I thought there might be alternatives to the blind and local fit options (something like a hope that it could improve the outcome). Anyway, I fully agree on the critical need for replication (and don't believe in miracles...).
Thanks again,
Agnès


DESeq and independant filtering
