  • DESeq - High Count Variability across Samples


    I am using DESeq to compare gene expression between two groups, with ten samples (biological replicates) in each group. Unfortunately, most of the control samples have substantially fewer reads than the experimental samples, and the sizeFactors range from 0.54 to 1.47 across all samples. When I apply a variance stabilizing transformation (VST) to the normalized data and cluster the samples with a distance matrix (heatmap), the samples largely group by total read count instead of by treatment. I am unsure whether the normalization method employed by DESeq can handle such wide variation in read counts across samples and across groups. Does anyone have suggestions for handling the normalization in this situation, or for assessing the overall effect of treatment? Thanks for any suggestions; I'd be happy to provide more details.

  • #2
    Size factors in your range are quite common, and DESeq's main functionality, i.e., testing for differential expression, copes well with them. Hence, just go ahead and run your tests.

    The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
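For readers unfamiliar with the size factors discussed in this thread: DESeq estimates them with a median-of-ratios scheme, in which each sample's factor is the median, over genes, of that sample's count divided by the gene's geometric mean across all samples. The sketch below is a minimal pure-Python illustration of that idea, not DESeq's own code; the function name `size_factors` and the toy counts are invented for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq itself).

    counts: list of genes, each a list of raw counts, one entry per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Skip genes with a zero count: their geometric mean is zero.
        if all(c > 0 for c in gene):
            log_geo_mean = sum(math.log(c) for c in gene) / n_samples
            for j, c in enumerate(gene):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median makes the estimate robust to a few strongly changed genes.
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical toy data: the second sample is sequenced twice as deeply.
toy = [[10, 20], [100, 200], [50, 100], [0, 3], [7, 14]]
factors = size_factors(toy)

# Dividing each count by its sample's factor puts samples on a common scale.
normalized = [[c / f for c, f in zip(gene, factors)] for gene in toy]
```

In this toy case the two factors come out near 0.71 and 1.41, a spread comparable to the 0.54 to 1.47 range reported in the first post, and the normalized counts of the unchanged genes agree between the two samples.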


    • #3
      Thanks for the info. Do you know a convenient way to assess global changes in gene expression across samples, so that the samples can be grouped in this case? In the vignette, for example, the blinded dispersion estimates followed by the VST and a distance matrix allowed an unbiased grouping of similar samples (given similar sizeFactors). What if one were to measure the covariance of each sample against every other sample, using normalized ratios of individual gene counts to the average gene counts across all samples? Would this allow some sort of grouping of samples by positive vs. negative covariance? Or would one run into the same problem of high-variance genes skewing the comparison? If so, could one group the genes according to expression or variance and try this? Thanks again. I am currently trying to generate a list of differentially expressed genes that I am confident are related to the treatment and not to high inter-animal variability. I have checked some with qPCR, with mixed results so far...
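The comparison proposed in post #3 could be sketched roughly as follows. This is only one reading of the idea, under stated assumptions: the counts are already normalized, each count is expressed as a log2 ratio to the gene's mean across samples, a pseudocount of 1 is added to avoid taking the log of zero, and Pearson correlation is used in place of raw covariance so that values are comparable between sample pairs. The function names and the toy data are hypothetical.

```python
import math

def log_ratio_profiles(norm_counts, pseudocount=1.0):
    """One profile per sample: log2((count + pc) / (gene mean + pc))."""
    n_samples = len(norm_counts[0])
    profiles = [[] for _ in range(n_samples)]
    for gene in norm_counts:
        gene_mean = sum(gene) / n_samples
        for j, c in enumerate(gene):
            ratio = (c + pseudocount) / (gene_mean + pseudocount)
            profiles[j].append(math.log2(ratio))
    return profiles

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical normalized counts: columns are control1, control2, treated1, treated2.
toy = [
    [100, 110, 200, 210],
    [50, 45, 20, 25],
    [30, 32, 31, 29],
    [80, 85, 160, 150],
    [10, 12, 40, 38],
]
profiles = log_ratio_profiles(toy)
r_within = pearson(profiles[0], profiles[1])   # control1 vs control2
r_between = pearson(profiles[0], profiles[2])  # control1 vs treated1
```

With this toy data the within-group correlation is strongly positive and the between-group correlation is negative. Taking logs reduces, but does not remove, the influence of high-variance genes raised in the post, so restricting the calculation to a subset of genes may still be worth trying.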


      • #4
        I am not quite sure I understand your problem. You want to know which genes changed due to treatment, and you want to guard against within-group variability. This is the default use case for DESeq, and you will get a statistically sound result if you follow the standard workflow (which does not use the VST).

        Hence, why again do you want to use the VST? You will need to explain your setup in more detail.

        BTW, checking by qPCR is only very rarely useful. It helps to guard against technical noise (if you think that qPCR is more precise than RNA-Seq), but as your main worry is sample-to-sample variation due to biological causes (i.e., actual expression differences rather than measurement errors), measuring the same samples with another technique will not tell you anything new.


        • #5
          Dear Simon,
          I am using DESeq to analyse RNA-seq data, but I am still experimenting with the package to learn how to use it properly for my particular kind of data. In this analysis I have two 'control' (replicate) samples and only one 'test' sample (and unfortunately I will not have replicates for this condition). My goal for now is just to see whether I can use the two control samples as replicates, since the 'controlled' conditions under which the plant material was collected were slightly different.

          I am not sure I understood your previous post correctly.

          Originally posted by Simon Anders
          The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
          So does this mean that if there is (high) variation between size factors, we should not trust the results obtained after the VST?
          I am seeing results "similar" to those reported in the DESeq vignette, although in my case there are fewer replicates.
          Specifically, if I build heatmaps (for count data and for sample-to-sample distances) using VST data, my two replicates for the 'control' condition cluster together. But when I use untransformed counts, one of the 'control' samples clusters with the 'test' sample.

          What intrigues me now is the fact that the size factors are

          control1: 0.8258893 (control1 is the one that clusters differently)

          So my question is this: can I trust these results and accept my two controls as replicates, or is this a case where "heatmaps might become misleading"?

          Thank you in advance.

          Last edited by pbarros; 12-06-2012, 09:26 AM.
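One general reason why clustering on untransformed counts can differ from clustering on transformed data, as described in post #5, is that Euclidean distances on raw counts are dominated by sequencing depth and by the most highly expressed genes. The sketch below illustrates this with invented numbers; it is not a diagnosis of the poster's data. The counts and size factors are hypothetical, and `log2(normalized count + 1)` is used as a simple stand-in for DESeq's VST, which works differently.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def log_normalized(counts, size_factor):
    """log2 of depth-normalized counts plus 1 (a stand-in, not the VST)."""
    return [math.log2(c / size_factor + 1.0) for c in counts]

# Hypothetical counts for four genes. control1 has half the depth of the
# other two samples; only the third gene differs in the test sample.
control1 = [100, 50, 10, 200]
control2 = [200, 100, 20, 400]
test = [200, 100, 80, 400]

# Assumed size factors, roughly what a median-of-ratios estimate would give.
sf_control1, sf_control2, sf_test = 0.63, 1.26, 1.26

# Distances on raw counts: depth dominates.
raw_c1_c2 = euclidean(control1, control2)
raw_c2_test = euclidean(control2, test)

# Distances on normalized, log-transformed counts: treatment dominates.
log_c1_c2 = euclidean(log_normalized(control1, sf_control1),
                      log_normalized(control2, sf_control2))
log_c2_test = euclidean(log_normalized(control2, sf_control2),
                        log_normalized(test, sf_test))
```

On raw counts, control2 sits far closer to the test sample than to control1 simply because the two share a sequencing depth. After normalization and a log transform the two controls coincide, and only the changed gene separates them from the test sample.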

