  • DESeq 1.5.30 - estimateDispersions

    Dear all,

    I have a dataset consisting of a matrix of gene counts for two different conditions with 3 biological replicates each and I want to call diff-expressed genes in response to the treatment.
    However, I am wondering which parameters are most appropriate for me to use in the estimateDispersions function:
    1. method: per-condition or pooled
    2. fitType: parametric, local

    1. I don't fully understand how the two methods differ computationally; in general, per-condition seems more logical to me. I counted the diffex genes with both methods and, contrary to my expectation, per-condition gave more significant diffex genes.
    Can anyone give me some advice on which to choose and WHY it might be more realistic?

    2. Here I also don't know which setting makes more sense and why. I'd be thankful for suggestions!

    Thanks a lot!

  • #2
    Sometimes the two conditions have unequal variance (for example, knock-down samples might differ more strongly from each other than untreated control samples do, because knock-down efficiency is so hard to keep constant), and then "per-condition" can give more power. This is why it was the default. However, I realized recently that our way of avoiding outliers (see the discussion of 'sharingMode="maximum"' in the vignette) does not work as reliably as I had hoped when using "per-condition" estimation. This is why I changed the default to "pooled" and added a note about this to the help page. I have some ideas on how to improve this, but pending that, I recommend "pooled".
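A toy sketch of the pooled vs. per-condition idea, in base R on simulated counts. This is not DESeq's internal code: it assumes all size factors are 1, uses a plain method-of-moments estimate, and the helper names (`mom_disp`, `row_var`) are made up for illustration.

```r
# Toy illustration only: simulated data, no DESeq calls.
set.seed(1)
n_genes <- 200
mu <- 100   # same true mean for every gene in both conditions

# Condition A is tight (dispersion 0.01), condition B is noisy (dispersion 0.2)
A <- matrix(rnbinom(n_genes * 3, mu = mu, size = 1 / 0.01), ncol = 3)
B <- matrix(rnbinom(n_genes * 3, mu = mu, size = 1 / 0.20), ncol = 3)

# Negative binomial: Var = mu + alpha * mu^2  =>  alpha = (Var - mu) / mu^2
# (hypothetical helper; negative estimates are truncated at zero)
mom_disp <- function(m, v) pmax((v - m) / m^2, 0)
row_var <- function(x) apply(x, 1, var)

# "per-condition"-like: one estimate per gene and per condition
disp_A <- mom_disp(rowMeans(A), row_var(A))
disp_B <- mom_disp(rowMeans(B), row_var(B))

# "pooled"-like: average the within-condition variances
# (a plain average is enough here because both conditions have 3 replicates)
pooled_var <- (row_var(A) + row_var(B)) / 2
disp_pooled <- mom_disp(rowMeans(cbind(A, B)), pooled_var)

# Average estimate over all genes for each approach
round(c(A = mean(disp_A), B = mean(disp_B), pooled = mean(disp_pooled)), 3)
```

In this toy setup the pooled estimate lands between the two per-condition estimates, so the tight condition is treated as noisier than it really is and the noisy one as tighter.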

    For fitType, both options should give good results, and so far the choice does not seem to make much of a difference. If you plot the dispersions against the means, as shown in the vignette, you can see which of the two fit types gives a fit that seems to follow the data more closely.
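A rough sketch of the "plot dispersions against means and compare the fits" check, again in base R on simulated counts. Everything here is an assumption for illustration: the simulated trend, the least-squares fit of `a0 + a1 / mean` standing in for a parametric fit, and `loess` on the log-log scale standing in for a local fit. DESeq's own fitting procedures differ in detail.

```r
# Toy illustration only: simulated counts and simplified fits, no DESeq calls.
set.seed(42)
n_genes <- 1000
true_mean <- 10^runif(n_genes, 1, 4)      # means from 10 to 10,000
true_disp <- 0.05 + 2 / true_mean         # assumed trend: a0 + a1 / mean
counts <- matrix(rnbinom(n_genes * 6, mu = true_mean, size = 1 / true_disp),
                 ncol = 6)                # row g is drawn with true_mean[g]

m <- rowMeans(counts)
v <- apply(counts, 1, var)
disp <- (v - m) / m^2                     # method-of-moments estimate per gene
keep <- is.finite(disp) & disp > 0        # the log scale needs positive values
d <- data.frame(mean = m[keep], disp = disp[keep])

# "parametric"-like fit: disp = a0 + a1 / mean (ordinary least squares here)
fit_par <- lm(disp ~ I(1 / mean), data = d)
# "local"-like fit: smooth curve on the log-log scale
fit_loc <- loess(log(disp) ~ log(mean), data = d)

d$par <- predict(fit_par, newdata = d)
d$loc <- exp(predict(fit_loc, newdata = d))

# Visual check: which curve follows the cloud of points more closely?
if (interactive()) {
  o <- order(d$mean)
  plot(d$mean, d$disp, log = "xy", pch = ".",
       xlab = "mean of counts", ylab = "dispersion")
  lines(d$mean[o], d$par[o], col = "red")    # parametric-like
  lines(d$mean[o], d$loc[o], col = "blue")   # local-like
}
```

With real data you would draw the same kind of plot from the per-gene estimates and fitted values that DESeq reports, as the vignette shows.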


    • #3
      Hi Simon,
      thanks a lot for your answer.

      However, I still don't understand how a single pooled empirical dispersion value ("pooled"), as opposed to a separate empirical dispersion value for each condition with biol. replicates ("per-condition"), is used in the subsequent calculation. Knowing this would help me understand in which cases I should expect more or fewer diffex genes.
      In my case, using two different examples (each with 3 biol. repl. per condition), the pooled option reduced the number of diffex genes. Is this what you would have expected?

      I'm sorry if this is already answered in the thread you mentioned (see the discussion of 'sharingMode="maximum"' in the vignette), which I unfortunately couldn't find (it would be great if you could post the link).

      Thanks a lot!


      • #4
        Additionally, I have added the "funnel" plots of the diffex results, with the respective number of identified genes, for two treatments (each with 3 biol. repl.) using two different parameter settings for the estimateDispersions function:

        cds.1 <- estimateDispersions( cds.1, sharingMode="maximum", method="per-condition", fitType="local" ); s="max"; m="per-cond"; f="local"
        --> # diffex genes: 415


        cds.1 <- estimateDispersions( cds.1, sharingMode="maximum", method="pool", fitType="local" ); s="max"; m="pool"; f="local"
        --> # diffex genes: 214

        Is it to be expected that a lot of genes with "high" log2FCs and "high" mean expression are not identified as significant?!
        Do these plots look "normal" to you?!

        Thanks a lot!

        • #5
          "I'm sorry if this is already answered in the thread you mentioned (see the discussion of 'sharingMode="maximum"' in the vignette), which I unfortunately couldn't find (would be great if you could post the link)."
          I mean the vignette, not a thread. See pages 4 to 6 here.

          Originally posted by horizon
          Is it to be expected that a lot of genes with "high" log2FCs and "high" mean expression are not identified as significant?!
          Do these plots look "normal" to you?!
          Use the 'identify' function of R to get the gene IDs of some of those black points with high mean and high log FC, and then look at the individual normalized counts. I expect you will find that they vary a lot from replicate to replicate, and this is why DESeq (at least the new version) does not call them as differentially expressed.
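A small sketch of that check in base R. The counts and size factors below are invented for illustration. With a real CountDataSet you would take the normalized counts from DESeq (the vignette uses `counts( cds, normalized=TRUE )`) for the IDs returned by `identify`.

```r
# Hypothetical raw counts: 2 genes x 6 samples (3 untreated "u", 3 treated "t")
counts <- rbind(
  geneA = c(100, 115,  88,  470, 410, 325),   # replicates agree
  geneB = c( 90,  10, 200, 1500,  30, 270)    # replicates scatter widely
)
colnames(counts) <- c("u1", "u2", "u3", "t1", "t2", "t3")
is_t <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Hypothetical size factors, one per sample
size_factors <- c(u1 = 1.0, u2 = 1.1, u3 = 0.9, t1 = 1.2, t2 = 1.0, t3 = 0.8)

# Normalized counts: divide each column by its size factor
norm <- sweep(counts, 2, size_factors, "/")

# Per-condition means and the resulting log2 fold change
mean_u <- rowMeans(norm[, !is_t])
mean_t <- rowMeans(norm[, is_t])
log2fc <- log2(mean_t / mean_u)

# Coefficient of variation within each condition (sd / mean)
cv <- function(x) apply(x, 1, sd) / rowMeans(x)
cv_u <- cv(norm[, !is_t])
cv_t <- cv(norm[, is_t])

round(cbind(log2FC = log2fc, CV_u = cv_u, CV_t = cv_t), 2)
```

Both toy genes show a large log2 fold change, but only geneA has replicates that agree with each other. A gene like geneB is the kind of case described above.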


          • #6
            Simon,

            I have a similar question. I have miRNA data and am looking for differentially expressed miRNAs, and I get many more D.E. miRNAs with the previous version of DESeq than with the newer version.

            I would like to understand whether the newer version could be more conservative than the older version for data with a low number of reads.

            FYI, the size factors for our datasets are:

            u_1       u_2       s_1       s_2
            1.4265463 1.0675662 0.6081645 1.1458061


            where u and s are the conditions and 1 and 2 are the replicates.

            Thanks,
            Praful
