Hi everybody,
I know this problem has been discussed quite a lot. I read the posts here and here (and also the papers mentioned in them).
I have two questions concerning my experiment. One is about the experimental design, the second about how to set the filtering. I think they are both somehow connected, so I would like to place them in one post.
In my experiment we have three conditions (ctrl, KO1 and KO2) and three separate cell types (I, P, and NP).
I would like to understand better how to analyse the data in one go.
The aim of the experiment is not only to compare the ctrl vs. KO1 and/or KO2, but also to analyse the efficiency of cellular processes by comparing NP vs. P in ctrl and/or KO1 and KO2.
I first ran the analysis with all genes (without any filtering at all). I compared ctrl vs. KO1 and ctrl vs. KO2. Interestingly, in all the comparisons of ctrl vs. KO1 I get a long list of significantly deregulated genes (FDR=0.1%), but in the comparison ctrl vs. KO2 I get only 2-5 genes.
So I thought that filtering out the low-count genes might help here. In search of a good cutoff, I tried the genefilter package and got the following rank plot:

Q1: I was wondering if cutting the data set at 0.57 is a good decision.
Then I looked for an FDR value and made the rejection plot, to see how many genes I am left with at each of the different FDR values.

It was interesting to see that from 0% to 50% they all overlap each other.
Q2: Does that mean that there is no difference between ϑ=0.5 and ϑ=0.1?
pair-wise vs. multifactor design:
I read the DESeq manual and ran the analysis as described here:
Code:
library(DESeq)
pd <- read.delim2("../phenoData.txt", sep="\t", quote="", row.names=1)
featureCountTable = read.table( "countTable.txt", header=TRUE, row.names=1, quote="" )
conditions = factor(pd$comparison)
# I have nine conditions: ctrl_I, ctrl_NP, ctrl_P, KO1_I, KO1_NP, KO1_P, KO2_I, KO2_NP and KO2_P
cds = newCountDataSet( featureCountTable, conditions )
cds = estimateSizeFactors( cds )
normResults <- counts( cds, normalized=TRUE )
# Variance estimation
cds = estimateDispersions(cds)
# I then ran a negative binomial test for each comparison
res_I_ctrl_KO1 = nbinomTest( cds, "ctrl_I", "KO1_I" )
res_P_ctrl_KO1 = nbinomTest( cds, "ctrl_P", "KO1_P" )
...
I was wondering if DESeq can work this way, or if I need to run a multi-factor design such as
Code:
fit1 = fitNbinomGLMs( cdsFullDataSet, count ~ libType + condition )
fit0 = fitNbinomGLMs( cdsFullDataSet, count ~ libType )
where libType would be ctrl, KO1 or KO2, and condition would be I, NP or P.
It would be great if I could get some help.
Thanks a lot,
Assa