Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2 Independent Filtering set very high

    Hello,

    When running DESeq2, I have some data and some genes with VERY low-pvalues (e.g. 3.12077x10^-8) were being filtered out with independent filtering, I looked more into it and found something that seems unusual. I looked at the filtering cutoffs and they mostly low with one exception:

    (1) 5.837401
    ->>> (2) 165.6933
    (3) 5.837401
    5.837401
    5.837401
    15.20571
    41.06299
    5.837401
    2.722993
    41.06299

    There is one cutoff 165.69 which seems extremely high. How can I figure out why that is???

    I have explored the results based on the documentation, and I see the following for two datasets ( (1) and (2) ):

    (1) T1 Versus Control

    Less than filtering threshold
    FALSE TRUE
    26177 17452

    RejectionsVsTheta: http://www.uvm.edu/~rbarrant/deseq2Q...trolFilter.jpg
    Pvalue Histogram: http://www.uvm.edu/~rbarrant/deseq2Q...Histograms.jpg
    Pvalue vs Baseman: http://www.uvm.edu/~rbarrant/deseq2Q...lVsBaseman.jpg


    T1VsT4Histograms.jpg
    rarrow.jpg* T1VsControlpvalVsBaseman.jpg* T1vsT4pvalVsBasemean.jpg*
    T1VsControlFilter.jpg T1VsT4Filter.jpg TrivsTri_T4histograms.jpg



    Less than .1

    (2) T1 vs T4

    Less than filtering threshold
    FALSE TRUE
    34903 8726

    RejectionsVsTheta: http://www.uvm.edu/~rbarrant/deseq2Question/T1VsControlFilter.jpg
    Pvalue Histogram: http://www.uvm.edu/~rbarrant/deseq2Q...Histograms.jpg
    Pvalue vs Baseman: http://www.uvm.edu/~rbarrant/deseq2Q...lVsBaseman.jpg

    I don't get why the threshold is set so high for T1VsT4. What am I not seeing??

    Thanks,
    Ramiro

  • #2
    hi,

    The threshold depends on the number of non-null DE genes which can be recovered by increasing the threshold. The IF procedure comes from the genefilter package with this companion paper:



    However, when there are few non-null genes, we noticed that the filter can go higher than we want, based on jitter in the plot of #rejections ~ quantile of filter. In the upcoming release (v1.10 in October), I have changed the procedure slightly, so that the filter will not jump so high in these cases, but take the smallest quantile of filter such that the #rejection is within a window of the maximal value.

    On the other hand, if you want consistent filtering across different comparisons, you could turn independentFiltering=FALSE and just performing your own filtering beforehand:

    dds <- estimateSizeFactors(dds)
    mnc <- rowMeans(counts(dds, normalized=TRUE))
    dds <- dds[mnc > 5,]

    Comment


    • #3
      Thank you very much. This helps. I do have a question about how the threshold is set, I was reading the documentation and it says: "the results function maximizes the number of rejections (adjusted pvalue less than a significance level), over theta, the quantiles of a filtering statistic (in this case, the mean of normalized counts)"

      Does this mean that the filtering threshold is just the maximum of

      #rejections / quantile

      And the variation caused by having fewer non-nulls causes this to be set higher sometimes??

      Thanks very much for your help.

      Ramiro

      Comment


      • #4
        no, by maximize y over x, I mean that we find the value of x that gives the largest value of y.

        See the figure in the Independent Filtering section of the vignette.

        Or we also discuss this in the paper:
        In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .


        Or you can also read about independent filtering in the original paper on the method which we use (DESeq2 calls the genefilter package for independent filtering as discussed in ?results):

        Comment


        • #5
          How to normalize RNAseq raw read counts with out control?

          Hi,

          I have few RNAseq data (raw read counts, single end) from mouse adenocarcinoma tumor model (no control, all tumor model).
          Im trying to show that these mouse adenocarcinoma tumor models show more correlation with human adenocarcinoma model when compared to other human tumor model.
          Different human tumor model (raw read counts, paired end) were downloaded from TCGA dataset.

          Could any one suggest a way by which i do the normalisation. What about median normalisation on mouse and human tumors ?

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Non-Coding RNA Research and Technologies
            by seqadmin




            Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

            Nobel Prize for MicroRNA Discovery
            This week,...
            Yesterday, 08:07 AM
          • seqadmin
            Recent Developments in Metagenomics
            by seqadmin





            Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
            09-23-2024, 06:35 AM
          • seqadmin
            Understanding Genetic Influence on Infectious Disease
            by seqadmin




            During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

            Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
            09-09-2024, 10:59 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 10-02-2024, 04:51 AM
          0 responses
          72 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 10-01-2024, 07:10 AM
          0 responses
          84 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 09-30-2024, 08:33 AM
          1 response
          86 views
          0 likes
          Last Post EmiTom
          by EmiTom
           
          Started by seqadmin, 09-26-2024, 12:57 PM
          0 responses
          20 views
          0 likes
          Last Post seqadmin  
          Working...
          X