Hi everybody,

I know this problem has been discussed quite a lot. I have read the posts here and here (and also the papers mentioned in them).

I have two questions concerning my experiment. One is about the experimental design, the second about how to set the filtering. I think they are both somehow connected, so I would like to place them in one post.

In my experiment we have three conditions (ctrl, KO1 and KO2) and three separate cell types (I, P and NP).
I would like to understand better how to analyse all of the data in one go.

The aim of the experiment is not only to compare the ctrl vs. KO1 and/or KO2, but also to analyse the efficiency of cellular processes by comparing NP vs. P in ctrl and/or KO1 and KO2.

I first ran the analysis with all genes (without any filtering at all) and compared ctrl vs. KO1 and ctrl vs. KO2. Interestingly, in all the ctrl vs. KO1 comparisons I get a long list of significantly deregulated genes (FDR=0.1%), but in the ctrl vs. KO2 comparisons I get only 2-5 genes.
So I thought that filtering out the low-count genes might help here. In search of a good cutoff I tried the genefilter package and got the following rank plot:

Q1: I was wondering if cutting the data set at 0.57 is a good decision.
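
To make Q1 concrete, this is roughly what a cut at 0.57 would do. It is only a sketch on simulated counts, not my real data, and it rests on two assumptions: that the filter statistic is the sum of counts per gene across all samples, and that 0.57 is the quantile fraction (theta) read off the x-axis of the rank plot. The names rs, theta and use are just illustrative.

```r
# Simulated count table (hypothetical data, not from the experiment):
# 1000 genes x 6 samples, negative-binomial counts with varying means.
set.seed(1)
nGenes <- 1000
mu <- rexp(nGenes, rate = 1 / 50)                 # per-gene mean expression
counts <- matrix(rnbinom(nGenes * 6, mu = rep(mu, 6), size = 2),
                 nrow = nGenes,
                 dimnames = list(paste0("gene", seq_len(nGenes)), NULL))

# Filter statistic (assumption): total count per gene across all samples.
rs <- rowSums(counts)

# theta is the fraction of genes to discard; 0.57 is the value in question.
theta <- 0.57
cutoff <- quantile(rs, probs = theta)

# Keep only the genes whose row sum lies above the theta-quantile.
use <- rs > cutoff
countsFiltered <- counts[use, , drop = FALSE]

cat("cutoff on row sums:", cutoff, "\n")
cat("genes kept:", nrow(countsFiltered), "of", nGenes, "\n")
```

On the real data, featureCountTable (or counts( cds )) would take the place of the simulated matrix, and the filtered table would go into newCountDataSet before the tests are run.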

Then I looked for an FDR value and made the rejection plot, to see how many genes I am left with at each of the different FDR values.
It was interesting to see that from 0% to 50% the curves all overlap each other.

Q2: Does that mean that there is no difference between ϑ=0.5 and ϑ=0.1?

Pair-wise vs. multi-factor design:

I read the DESeq manual and ran the analysis as described here:
pd <- read.delim2("../phenoData.txt", sep="\t",quote="", row.names=1)

featureCountTable = read.table( "countTable.txt", header=TRUE, row.names=1, quote="")

conditions = factor(pd$comparison) # I have nine conditions: ctrl_I, ctrl_NP, ctrl_P, KO1_I, KO1_NP, KO1_P, KO2_I, KO2_NP and KO2_P

cds = newCountDataSet( featureCountTable, conditions )

cds = estimateSizeFactors( cds )
normResults <- counts( cds, normalized=TRUE ) 

#Variance estimation
cds = estimateDispersions(cds)

# I then ran a negative binomial test for each comparison
res_I_ctrl_KO1 = nbinomTest( cds, "ctrl_I", "KO1_I" )
res_P_ctrl_KO1 = nbinomTest( cds, "ctrl_P", "KO1_P" )
I was wondering whether DESeq can be used this way, or whether I need to run a multi-factor design such as:

fit1 = fitNbinomGLMs( cdsFullDataSet, count ~ libType + condition )
fit0 = fitNbinomGLMs( cdsFullDataSet, count ~ libType )
where libType would be ctrl, KO1 and KO2, and condition would be I, NP and P.
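
For the multi-factor route, the nine combined labels would first have to be split into two separate factors. Below is a minimal base-R sketch of that step, assuming every label follows the pattern libType_condition as in my list of nine conditions. The design table and its row names are only illustrative; with replicates there would be one row per sample, named after the sample. The DESeq calls in the trailing comments follow the multi-factor example in the manual and are not run here.

```r
# The nine combined labels (one per group; real data has one entry per sample).
comparison <- c("ctrl_I", "ctrl_NP", "ctrl_P",
                "KO1_I",  "KO1_NP",  "KO1_P",
                "KO2_I",  "KO2_NP",  "KO2_P")

# Split each label at the underscore into its two parts.
parts <- do.call(rbind, strsplit(comparison, "_", fixed = TRUE))

# Design table with one column per factor; ctrl and I are the reference levels.
design <- data.frame(
  libType   = factor(parts[, 1], levels = c("ctrl", "KO1", "KO2")),
  condition = factor(parts[, 2], levels = c("I", "NP", "P")),
  row.names = comparison   # hypothetical; real row names would be sample names
)
print(design)

# With DESeq loaded, this table would replace the single `conditions` factor:
#   cdsFullDataSet = newCountDataSet( featureCountTable, design )
#   cdsFullDataSet = estimateSizeFactors( cdsFullDataSet )
#   cdsFullDataSet = estimateDispersions( cdsFullDataSet )
#   fit1 = fitNbinomGLMs( cdsFullDataSet, count ~ libType + condition )
#   fit0 = fitNbinomGLMs( cdsFullDataSet, count ~ libType )
#   pvalsGLM = nbinomGLMTest( fit1, fit0 )
```

As far as I understand it, comparing fit1 against fit0 in this form tests for an effect of condition (the cell type), because that is the term dropped from the reduced model. To test ctrl vs. KO instead, the reduced model would presumably have to be count ~ condition.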

It would be great if I could get some help.

Thanks a lot