Independent filtering and experimental design in DESeq

    Hi everybody,

    I know this problem has been discussed quite a lot. I have read the posts here and here (and also the papers mentioned in them).

    I have two questions concerning my experiment. The first is about the experimental design, the second about how to set the filtering. I think they are connected, so I am putting them in one post.

    In my experiment we have three conditions (ctrl, KO1 and KO2) and three separate cell types (I, P and NP).
    I would like to better understand how to analyse the data in one go.

    The aim of the experiment is not only to compare the ctrl vs. KO1 and/or KO2, but also to analyse the efficiency of cellular processes by comparing NP vs. P in ctrl and/or KO1 and KO2.

    I first ran the analysis once with all genes, without any filtering at all, and compared ctrl vs. KO1 and ctrl vs. KO2. Interestingly, in all the comparisons of ctrl vs. KO1 I get a long list of significantly deregulated genes (FDR=0.1%), but in the comparison of ctrl vs. KO2 I get only 2-5 genes.
    So I thought that filtering out the low-count genes might help here. In search of a good cutoff I tried the genefilter package and got the following rank plot:

    Q1: I was wondering if cutting the data set at 0.57 is a good decision.

    Then I looked for an FDR value and made the rejection plot, to see how many genes I am left with at each of the different FDR values.
    Interestingly, from 0% to 50% the curves all overlap each other.

    Q2: Does that mean, that there is no difference between ϑ=0.5 and ϑ=0.1?
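To make Q1 and Q2 concrete, this is a minimal base-R sketch of the filtering step as I understand it: rank genes by their overall count sum, drop the lowest fraction ϑ, and apply the Benjamini-Hochberg adjustment only to the genes that pass. The count matrix and p-values are simulated toy data (not my real results), and the function name `rejections` is my own placeholder, so treat it as an illustration of the idea rather than what genefilter does internally.

```r
# Toy data only: 1000 hypothetical genes x 6 samples, NOT my real counts.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 2), ncol = 6)
pvals  <- runif(nrow(counts))  # placeholder for the p-values from nbinomTest()

# Filter statistic: overall count sum per gene, computed without using
# the condition labels, so it does not depend on the comparison tested.
rs <- rowSums(counts)

# For a filter quantile theta: drop the lowest-theta fraction of genes,
# then BH-adjust only the p-values of the genes that pass the filter.
rejections <- function(theta, alpha = 0.1) {
  keep <- rs > quantile(rs, probs = theta)
  padj <- p.adjust(pvals[keep], method = "BH")
  c(theta = theta, kept = sum(keep), rejected = sum(padj < alpha))
}

# Compare several cutoffs, including the 0.57 from Q1.
thetas <- c(0, 0.1, 0.3, 0.5, 0.57)
tab <- t(sapply(thetas, rejections))
print(tab)
```

If the `rejected` column barely changes between two values of ϑ on the real data, that would be my reading of "no difference" in Q2, but I am not sure this interpretation is correct.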

    pair-wise vs. multifactor design:

    I read the DESeq manual and ran the analysis as described here:
    library(DESeq)

    # Sample annotation and raw count table
    pd <- read.delim2("../phenoData.txt", sep="\t", quote="", row.names=1)
    featureCountTable <- read.table("countTable.txt", header=TRUE, row.names=1, quote="")

    # Nine conditions: ctrl_I, ctrl_NP, ctrl_P, KO1_I, KO1_NP, KO1_P, KO2_I, KO2_NP and KO2_P
    conditions <- factor(pd$comparison)
    cds <- newCountDataSet(featureCountTable, conditions)
    cds <- estimateSizeFactors(cds)
    normResults <- counts(cds, normalized=TRUE)

    # Variance estimation
    cds <- estimateDispersions(cds)

    # I then ran a negative binomial test for each comparison
    res_I_ctrl_KO1 <- nbinomTest(cds, "ctrl_I", "KO1_I")
    res_P_ctrl_KO1 <- nbinomTest(cds, "ctrl_P", "KO1_P")

    I was wondering if DESeq can be used this way, or if I need to run a multi-factor design such as

    fit1 = fitNbinomGLMs( cdsFullDataSet, count ~ libType + condition )
    fit0 = fitNbinomGLMs( cdsFullDataSet, count ~ libType )
    where libType would be the genotype (ctrl, KO1 and KO2) and condition would be the cell type (I, NP and P).
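To show what I mean by the multi-factor setup, here is a minimal base-R sketch that splits the nine combined labels into two separate factors and builds the full and reduced design matrices. The column names `genotype` and `cellType` are my own placeholders, and the sample order is hypothetical (one sample per combination, replicates left out). I am assuming a table like `design` could be passed to `newCountDataSet` in place of the single `conditions` factor, but I have not verified this on my data.

```r
# The nine combined labels from the pairwise analysis
# (hypothetical order, one sample per combination for brevity).
comparison <- c("ctrl_I", "ctrl_NP", "ctrl_P",
                "KO1_I",  "KO1_NP",  "KO1_P",
                "KO2_I",  "KO2_NP",  "KO2_P")

# Split "genotype_cellType" into its two parts.
parts <- do.call(rbind, strsplit(comparison, "_", fixed = TRUE))

# Two-factor design table; ctrl and I are set as the reference levels.
design <- data.frame(
  row.names = comparison,
  genotype  = factor(parts[, 1], levels = c("ctrl", "KO1", "KO2")),
  cellType  = factor(parts[, 2], levels = c("I", "NP", "P"))
)

# Full model (cell type + genotype) vs. reduced model (cell type only):
# comparing the two asks whether genotype explains the counts
# beyond what cell type already explains.
full    <- model.matrix(~ cellType + genotype, data = design)
reduced <- model.matrix(~ cellType, data = design)
print(colnames(full))
print(colnames(reduced))
```

What I cannot tell from this is whether an additive model like the one above is enough for the NP vs. P comparison within each genotype, or whether that needs a different formula.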

    It would be great if I could get some help.

    Thanks a lot