  • DESeq2 Independent Filtering set very high

    Hello,

    When running DESeq2 on my data, some genes with VERY low p-values (e.g. 3.12077x10^-8) were being filtered out by independent filtering. I looked into it and found something that seems unusual: the filtering cutoffs are mostly low, with one exception:

    (1) 5.837401
    ->>> (2) 165.6933
    (3) 5.837401
    5.837401
    5.837401
    15.20571
    41.06299
    5.837401
    2.722993
    41.06299

    There is one cutoff, 165.69, which seems extremely high. How can I figure out why that is?

    I have explored the results based on the documentation, and I see the following for two datasets ( (1) and (2) ):

    (1) T1 Versus Control

    Less than filtering threshold
    FALSE TRUE
    26177 17452

    RejectionsVsTheta: http://www.uvm.edu/~rbarrant/deseq2Q...trolFilter.jpg
    Pvalue Histogram: http://www.uvm.edu/~rbarrant/deseq2Q...Histograms.jpg
    Pvalue vs baseMean: http://www.uvm.edu/~rbarrant/deseq2Q...lVsBaseman.jpg



    Less than .1

    (2) T1 vs T4

    Less than filtering threshold
    FALSE TRUE
    34903 8726

    RejectionsVsTheta: http://www.uvm.edu/~rbarrant/deseq2Question/T1VsControlFilter.jpg
    Pvalue Histogram: http://www.uvm.edu/~rbarrant/deseq2Q...Histograms.jpg
    Pvalue vs baseMean: http://www.uvm.edu/~rbarrant/deseq2Q...lVsBaseman.jpg

    I don't get why the threshold is set so high for T1VsT4. What am I not seeing??

    Thanks,
    Ramiro

  • #2
    hi,

    The threshold depends on the number of non-null DE genes which can be recovered by increasing the threshold. The independent filtering (IF) procedure comes from the genefilter package, which has this companion paper:



    However, when there are few non-null genes, we noticed that the filter can go higher than we want, because of jitter in the plot of the number of rejections against the filter quantile. In the upcoming release (v1.10 in October), I have changed the procedure slightly so that the filter will not jump so high in these cases: it takes the smallest filter quantile for which the number of rejections is within a window of the maximal value.
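As a rough illustration of the difference described above, the following base-R sketch compares plain maximization with a window rule on a made-up rejection curve. The numbers, the variable names, and the `window` value are all assumptions for illustration; this is a simplified stand-in and not the code used inside DESeq2 or genefilter.

```r
# Toy illustration only: the values are invented, and the window rule is a
# simplified stand-in for the idea, not the package implementation.
theta   <- (0:9) / 10                                   # quantiles of the filter statistic
num_rej <- c(10, 12, 14, 15, 15, 14, 15, 16, 15, 13)    # rejections at each theta, with jitter

# Plain maximization: the theta with the most rejections.
theta_max <- theta[which.max(num_rej)]

# Window rule: the smallest theta whose rejection count is within `window`
# of the maximum (`window` is a hypothetical tuning value).
window <- 1
theta_window <- theta[min(which(num_rej >= max(num_rej) - window))]

print(c(plain = theta_max, windowed = theta_window))
```

With this toy curve the plain maximum sits at theta = 0.7 only because of a one-rejection bump, while the window rule settles at theta = 0.3, so far fewer genes would be filtered for almost the same number of rejections.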

    On the other hand, if you want consistent filtering across different comparisons, you could set independentFiltering=FALSE when calling results() and perform your own filtering beforehand:

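    # estimate size factors so that counts can be normalized
    dds <- estimateSizeFactors(dds)
    # mean of normalized counts for each gene
    mnc <- rowMeans(counts(dds, normalized=TRUE))
    # keep only genes with a mean normalized count above 5
    dds <- dds[mnc > 5,]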



    • #3
      Thank you very much. This helps. I do have a question about how the threshold is set. The documentation says: "the results function maximizes the number of rejections (adjusted pvalue less than a significance level), over theta, the quantiles of a filtering statistic (in this case, the mean of normalized counts)"

      Does this mean that the filtering threshold is just the maximum of

      #rejections / quantile

      And the variation caused by having fewer non-nulls causes this to be set higher sometimes??

      Thanks very much for your help.

      Ramiro



      • #4
        No. By "maximize y over x", I mean that we find the value of x that gives the largest value of y.

        See the figure in the Independent Filtering section of the vignette.
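To make the distinction concrete, here is a minimal base-R sketch with made-up numbers (the vectors `x` and `y` are hypothetical, standing in for the filter quantile and the number of rejections). It shows that "the x at which y is largest" is generally not the same as the x that maximizes the ratio y / x.

```r
# Made-up curve: x is the filter quantile (theta), y is the number of rejections.
x <- c(0.1, 0.2, 0.3, 0.4, 0.5)
y <- c(100, 130, 150, 155, 140)

# "Maximize y over x": pick the x at which y is largest.
best_x <- x[which.max(y)]

# The ratio y / x is a different quantity and peaks at a different x.
ratio_x <- x[which.max(y / x)]

print(c(argmax_y = best_x, argmax_ratio = ratio_x))
```

Here y peaks at x = 0.4, whereas the ratio y / x is largest at x = 0.1, so the two readings of the documentation lead to different thresholds.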

        Or we also discuss this in the paper:
        In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .


        Or you can also read about independent filtering in the original paper on the method which we use (DESeq2 calls the genefilter package for independent filtering as discussed in ?results):



        • #5
          How to normalize RNA-seq raw read counts without a control?

          Hi,

          I have some RNA-seq data (raw read counts, single end) from a mouse adenocarcinoma tumor model (no control, all tumor model).
          I'm trying to show that these mouse adenocarcinoma tumor models correlate more strongly with the human adenocarcinoma model than with other human tumor models.
          The different human tumor models (raw read counts, paired end) were downloaded from the TCGA dataset.

          Could anyone suggest a way to do the normalisation? What about median normalisation on the mouse and human tumors?
