Announcement

Collapse
No announcement yet.

DESeq2 Independent Filtering set very high

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 Independent Filtering set very high

    Hello,

    When running DESeq2, I have some data and some genes with VERY low-pvalues (e.g. 3.12077x10^-8) were being filtered out with independent filtering, I looked more into it and found something that seems unusual. I looked at the filtering cutoffs and they mostly low with one exception:

    (1) 5.837401
    ->>> (2) 165.6933
    (3) 5.837401
    5.837401
    5.837401
    15.20571
    41.06299
    5.837401
    2.722993
    41.06299

    There is one cutoff 165.69 which seems extremely high. How can I figure out why that is???

    I have explored the results based on the documentation, and I see the following for two datasets ( (1) and (2) ):

    (1) T1 Versus Control

    Less than filtering threshold
    FALSE TRUE
    26177 17452

    RejectionsVsTheta: http://www.uvm.edu/~rbarrant/deseq2Q...trolFilter.jpg
    Pvalue Histogram: http://www.uvm.edu/~rbarrant/deseq2Q...Histograms.jpg
    Pvalue vs Baseman: http://www.uvm.edu/~rbarrant/deseq2Q...lVsBaseman.jpg


    T1VsT4Histograms.jpg
    rarrow.jpg* T1VsControlpvalVsBaseman.jpg* T1vsT4pvalVsBasemean.jpg*
    T1VsControlFilter.jpg T1VsT4Filter.jpg TrivsTri_T4histograms.jpg



    Less than .1

    (2) T1 vs T4

    Less than filtering threshold
    FALSE TRUE
    34903 8726

    RejectionsVsTheta: http://www.uvm.edu/~rbarrant/deseq2Question/T1VsControlFilter.jpg
    Pvalue Histogram: http://www.uvm.edu/~rbarrant/deseq2Q...Histograms.jpg
    Pvalue vs Baseman: http://www.uvm.edu/~rbarrant/deseq2Q...lVsBaseman.jpg

    I don't get why the threshold is set so high for T1VsT4. What am I not seeing??

    Thanks,
    Ramiro

  • #2
    hi,

    The threshold depends on the number of non-null DE genes which can be recovered by increasing the threshold. The IF procedure comes from the genefilter package with this companion paper:

    http://www.pnas.org/content/107/21/9546.long

    However, when there are few non-null genes, we noticed that the filter can go higher than we want, based on jitter in the plot of #rejections ~ quantile of filter. In the upcoming release (v1.10 in October), I have changed the procedure slightly, so that the filter will not jump so high in these cases, but take the smallest quantile of filter such that the #rejection is within a window of the maximal value.

    On the other hand, if you want consistent filtering across different comparisons, you could turn independentFiltering=FALSE and just performing your own filtering beforehand:

    dds <- estimateSizeFactors(dds)
    mnc <- rowMeans(counts(dds, normalized=TRUE))
    dds <- dds[mnc > 5,]

    Comment


    • #3
      Thank you very much. This helps. I do have a question about how the threshold is set, I was reading the documentation and it says: "the results function maximizes the number of rejections (adjusted pvalue less than a significance level), over theta, the quantiles of a filtering statistic (in this case, the mean of normalized counts)"

      Does this mean that the filtering threshold is just the maximum of

      #rejections / quantile

      And the variation caused by having fewer non-nulls causes this to be set higher sometimes??

      Thanks very much for your help.

      Ramiro

      Comment


      • #4
        no, by maximize y over x, I mean that we find the value of x that gives the largest value of y.

        See the figure in the Independent Filtering section of the vignette.

        Or we also discuss this in the paper:
        http://www.genomebiology.com/2014/15/12/550

        Or you can also read about independent filtering in the original paper on the method which we use (DESeq2 calls the genefilter package for independent filtering as discussed in ?results):
        http://www.pnas.org/content/107/21/9546.long

        Comment


        • #5
          How to normalize RNAseq raw read counts with out control?

          Hi,

          I have few RNAseq data (raw read counts, single end) from mouse adenocarcinoma tumor model (no control, all tumor model).
          Im trying to show that these mouse adenocarcinoma tumor models show more correlation with human adenocarcinoma model when compared to other human tumor model.
          Different human tumor model (raw read counts, paired end) were downloaded from TCGA dataset.

          Could any one suggest a way by which i do the normalisation. What about median normalisation on mouse and human tumors ?

          Comment

          Working...
          X