  • dav1dmartin
    Junior Member
    • Aug 2010
    • 2

    DESeq - High Count Variability across Samples

    Hello,

    I am comparing gene expression between two groups with DESeq, with ten samples (biological replicates) in each group. Unfortunately, for most samples the control group has significantly fewer reads than the experimental group, and the sizeFactors range from 0.54 to 1.47 across all samples. When I apply a variance stabilizing transformation (VST) to the normalized data and group the samples in a distance matrix (heatmap), the samples cluster largely by total read count rather than by treatment. I am unsure whether the normalization used by DESeq can handle such wide variation in read depth across samples and across groups. Does anyone have suggestions for handling the normalization in this situation, or for assessing the overall effect of treatment? Thanks for any suggestions; I'd be happy to provide more details.
    -David
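For reference, DESeq's size factors (`estimateSizeFactors` in R) use the median-of-ratios idea: for each sample, take the median over genes of that sample's count divided by the gene's geometric mean across all samples, skipping genes with a zero count in any sample. The pure-Python sketch below illustrates only that idea; it is not DESeq's code, and the function names and toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq's code).

    counts: list of rows, one row per gene, one column per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of 0, so skip them.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / geometric mean), then back-transform.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Hypothetical toy data: sample 2 is sequenced exactly twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # has a zero, so it is ignored when estimating size factors
]
sf = size_factors(counts)
print([round(s, 3) for s in sf])  # [0.707, 1.414]
```

Because the factors are a per-sample scaling, a range like 0.54 to 1.47 is handled by the normalization itself; the question raised in the replies is about how the VST behaves when the factors differ.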
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Size factors in your range are quite common, and DESeq's main functionality, i.e., testing for differential expression, copes well with them. Hence, just go ahead and run your tests.

    The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.


    • dav1dmartin
      Junior Member
      • Aug 2010
      • 2

      #3
      Thanks for the info. Do you know a convenient way to assess global changes in gene expression across samples, so that I can group the samples in this case? In the vignette, for example, the blinded dispersion estimates followed by the VST and a distance matrix allowed an unbiased grouping of similar samples (given similar sizeFactors). What if one were to measure the covariance of each sample against every other, using the normalized ratio of each gene's count to its average count across all samples? Would this allow samples to be grouped by positive versus negative covariance? Or would one run into the same problem of high-variance genes skewing the comparison? If so, could one group the genes by expression level or variance and try this within each group? Thanks again. I am currently trying to generate a list of differentially expressed genes that I am confident are related to the treatment and not to high inter-animal variability. I have checked some with qPCR, with mixed results so far...
      -David
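The covariance idea above could be sketched as follows. This is David's proposal, not a DESeq function, and several choices are assumptions: the input is taken to be already size-factor-normalized, each gene is expressed as log2 of its count relative to its mean across samples with a hypothetical pseudocount of 1, and the toy counts are invented. The function names are hypothetical.

```python
import math


def log_ratios(norm_counts, pseudocount=1.0):
    """Per gene: log2 of each sample's count relative to the gene's mean count."""
    out = []
    for row in norm_counts:
        mean = sum(row) / len(row)
        out.append([math.log2((c + pseudocount) / (mean + pseudocount)) for c in row])
    return out


def sample_covariance(ratios):
    """Covariance between every pair of samples, computed across genes."""
    n_genes, n_samples = len(ratios), len(ratios[0])
    # One column (list over genes) per sample.
    cols = [[row[j] for row in ratios] for j in range(n_samples)]
    means = [sum(col) / n_genes for col in cols]
    return [
        [
            sum((x - means[a]) * (y - means[b]) for x, y in zip(cols[a], cols[b]))
            / (n_genes - 1)
            for b in range(n_samples)
        ]
        for a in range(n_samples)
    ]


# Hypothetical normalized counts: genes in rows, samples A1, A2, B1, B2 in columns.
norm_counts = [
    [100, 110, 10, 12],
    [10, 12, 100, 90],
    [50, 50, 50, 50],
]
cov = sample_covariance(log_ratios(norm_counts))
for row in cov:
    print(" ".join(f"{v:7.2f}" for v in row))
```

As suspected in the post, a few high-variance genes can dominate these sums. Dividing each covariance by the two samples' standard deviations (a Pearson correlation) or restricting the calculation to genes within an expression or variance band are two possible mitigations; either way this is exploratory and separate from DESeq's test.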


      • Simon Anders
        Senior Member
        • Feb 2010
        • 995

        #4
        I am not quite sure I understand your problem. You want to know which genes changed due to treatment and want to guard against within-group variability. This is the default use case for DESeq, and you will get a statistically sound result if you follow the standard workflow (which does not use the VST).

        Hence, why again do you want to use the VST? You will need to explain your setup in more detail.

        BTW, checking by qPCR is only very rarely useful. It helps to rule out technical noise (if you think that qPCR is more precise than RNA-Seq), but as your main worry is sample-to-sample variation due to biological causes (i.e., actual expression differences rather than measurement errors), measuring the same samples with another technique will not tell you anything new.


        • pbarros
          Junior Member
          • Jul 2012
          • 7

          #5

          Dear Simon,
          I am using DESeq to analyze RNA-seq data, but I'm still experimenting with the package to learn how to use it properly for my particular data. In this analysis I have two 'control' (replicate) samples and only one 'test' sample (unfortunately, I will not have replicates for this condition). My goal for now is just to see whether I can use the two control samples as replicates, since the 'controlled' conditions under which the plant material was collected were slightly different.

          Regarding your previous post, I'm not sure I understood it correctly.

          Originally posted by Simon Anders
          The VST needs to resort to a certain approximation (details on request) and hence the heatmap might become misleading if the size factors are different. This does not affect the actual test functions because they do not use the VST.
          So does this mean that if the size factors vary a lot, we cannot trust the results obtained after the VST?
          I am seeing results "similar" to those reported in the DESeq vignette, although in my case there are fewer replicates.
          Specifically, if I build heatmaps (of count data and of sample-to-sample distances) using VST data, my two 'control' replicates cluster together. But when I use untransformed counts, one of the 'control' samples clusters with the 'test' sample.

          What intrigues me now are the size factors:

          test:1.8420157
          control1:0.8258893 (control1 is the one that clusters differently)
          control2:0.6850067

          So my question is this: can I just trust these results and accept my two controls as replicates, or is this a case where "heatmaps might become misleading"?

          Thank you in advance,

          Pedro
          Last edited by pbarros; 12-06-2012, 09:26 AM.
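One way to see how much a sample clustering is driven by sequencing depth is to compute sample-to-sample distances on size-factor-normalized, log-transformed counts and compare them with the unnormalized version (all factors set to 1). The sketch below makes several assumptions: log2(count / sizeFactor + 1) is only a simple stand-in and is not DESeq's VST (which is derived from the fitted dispersion-mean relationship); the counts are invented for illustration and only the size factors are the ones quoted above; `nearest` is a hypothetical helper that reads off each sample's closest neighbour instead of running a full hierarchical clustering.

```python
import math


def log_normalized(counts, factors, pseudocount=1.0):
    """log2(count / size factor + pseudocount): a simple stand-in, not DESeq's VST."""
    return [[math.log2(c / s + pseudocount) for c, s in zip(row, factors)] for row in counts]


def distance_matrix(matrix):
    """Euclidean distance between every pair of samples (columns) across genes."""
    n = len(matrix[0])
    return [
        [math.sqrt(sum((row[a] - row[b]) ** 2 for row in matrix)) for b in range(n)]
        for a in range(n)
    ]


def nearest(dist, names):
    """Hypothetical helper: for each sample, the closest other sample."""
    result = {}
    for a, name in enumerate(names):
        others = [(dist[a][b], names[b]) for b in range(len(names)) if b != a]
        result[name] = min(others)[1]
    return result


names = ["test", "control1", "control2"]
factors = [1.8420157, 0.8258893, 0.6850067]  # size factors quoted in the post
# Invented counts (genes in rows), for illustration only.
counts = [
    [184, 83, 69],
    [920, 410, 345],
    [1840, 165, 137],
    [18, 83, 69],
]
for label, f in [("normalized", factors), ("unnormalized", [1.0, 1.0, 1.0])]:
    dist = distance_matrix(log_normalized(counts, f))
    print(label, nearest(dist, names))
```

With only three samples, a nearest-neighbour reading is fragile either way: it shows which distances drive the heatmap, not whether the two controls are valid replicates.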
