  • DESeq Variance Stabilizing Transformation

    Hello,

    I am looking for some feedback regarding the use of the variance-stabilizing transformation (VST) methods found in the DESeq2 package. Hopefully one of the authors will respond and the comments will be of help to others.

    For me, the purpose for applying this transformation is to be able to generate moderated fold changes for clustering of genes (not samples as in the vignette).

    My data consists of a time series, where for each time point there is a "treated" sample and a "control" sample. Each sample (timepoint) consists of 4 biological replicates.

    I performed the VST on the entire data set and plotted the per-gene standard deviation against the rank of the mean* for the shifted logarithm log2(n + 1) (left) and the variance stabilizing transformation (right). The VST does not appear to have a pronounced effect.



    However, if I set up a count data set that consists of the samples corresponding to one time point only (the first time point in the example below), perform the VST, and plot the standard deviation against the rank of the mean, the transformed values have a much better stabilized standard deviation.



    So my questions are: Is there any way to obtain better variance-stabilized data when considering the entire time series? Or should I just perform the VST on a per-time-point basis? After all, I will only be computing fold changes between treatment and control samples at the same time point.

    *The procedure was performed as per the DESeq2 manual:

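    library(DESeq2)  # the plots below also need the genefilter package for rowVars()
    dds <- estimateSizeFactors(dds)
    dds <- estimateDispersions(dds)
    vsd <- varianceStabilizingTransformation(dds)
    par(mfrow=c(1,2))
    # left panel: shifted log; right panel: VST
    # (rowVars() returns the per-gene variance, plotted against the rank of the mean)
    plot(rank(rowMeans(counts(dds))), genefilter::rowVars(log2(counts(dds)+1)), main="log2(x+1) transform")
    plot(rank(rowMeans(assay(vsd))), genefilter::rowVars(assay(vsd)), main="VST")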
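
For anyone who wants to try the per-time-point variant described above, here is a minimal sketch on simulated data. It is not the poster's actual analysis: the data come from `makeExampleDESeqDataSet()`, and the column names `timepoint` and `group` and their levels are assumptions made for illustration.

```r
library(DESeq2)

# Simulated stand-in for the real experiment: 1000 genes, 16 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)

# Hypothetical layout: 2 time points x 2 groups x 4 replicates.
dds$timepoint <- factor(rep(c("t1", "t2"), each = 8))
dds$group     <- factor(rep(rep(c("control", "treated"), each = 4), times = 2))

# Fit the VST on the samples of a single time point only.
dds_t1 <- dds[, dds$timepoint == "t1"]
dds_t1 <- estimateSizeFactors(dds_t1)
vsd_t1 <- varianceStabilizingTransformation(dds_t1, blind = TRUE)

# Per-gene difference of group means on the VST scale
# (roughly a log2 scale for large counts).
mat <- assay(vsd_t1)
vst_diff_t1 <- rowMeans(mat[, dds_t1$group == "treated"]) -
               rowMeans(mat[, dds_t1$group == "control"])
head(vst_diff_t1)
```

Values transformed separately per time point come from separately fitted dispersion trends, so they are best compared within a time point, which matches the treated-versus-control use case in the question.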

  • #2
    As far as I know, you have to tell DESeq to treat all expression values as if they came from a single condition by specifying method="blind" when estimating the dispersions.
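
A note for DESeq2 users: `method="blind"` belongs to `estimateDispersions()` in the original DESeq package. In DESeq2 the corresponding switch is the `blind` argument of `varianceStabilizingTransformation()`, which is `TRUE` by default. The sketch below compares the two settings on simulated data; the data set and the object names are illustrative only.

```r
library(DESeq2)

set.seed(2)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)  # design is ~ condition
dds <- estimateSizeFactors(dds)

# blind = TRUE: the design is replaced by ~ 1 and dispersions are
# re-estimated, i.e. all samples are treated as one condition.
vsd_blind <- varianceStabilizingTransformation(dds, blind = TRUE)

# blind = FALSE: the transformation uses dispersions estimated
# under the experimental design.
dds <- estimateDispersions(dds)
vsd_design <- varianceStabilizingTransformation(dds, blind = FALSE)

# How much the two transformed matrices differ, gene by sample.
summary(as.vector(assay(vsd_blind) - assay(vsd_design)))
```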


    • #3
      I have a slightly unrelated question about the plot.
      Why is the variance low for a low mean? Shouldn't it start high and decrease as the mean increases?
      I have a similar data set, and even if I filter by requiring a higher cpm, the trend still persists.
      Does anyone know why this is the case?


      • #4
        DESeq2 variance

        I guess it all depends on the type of data. For my NGS bacterial 16S rRNA data, the SD increases as the mean increases.


        • #5
          hi John,

          The VST helps to stabilize the variance over the mean, insofar as this can be captured by the parametric curve of dispersion over mean. You might also try the rlog transformation, which sometimes performs qualitatively better than the VST (for example, if the size factors vary a lot across samples).
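
A minimal sketch of that comparison, assuming DESeq2 is installed and using simulated data from `makeExampleDESeqDataSet()` rather than the data discussed in this thread. The object names are illustrative.

```r
library(DESeq2)

set.seed(3)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)
dds <- estimateSizeFactors(dds)

# A wide spread of size factors is the situation in which
# trying rlog was suggested above.
range(sizeFactors(dds))

rld <- rlog(dds, blind = TRUE)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Per-gene standard deviation across samples on each scale.
sd_rlog <- apply(assay(rld), 1, sd)
sd_vst  <- apply(assay(vsd), 1, sd)
summary(sd_rlog)
summary(sd_vst)
```

Plotting `sd_rlog` and `sd_vst` against the rank of the per-gene mean, as in the first post, then shows which transformation gives the flatter trend for a given data set.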


          • #6
            Hi guys,
            Is the VST functionality of DESeq still working? Most of the VST functions, including getVarianceStabilizedData(), seem to fail in R version 3.0.1. Please help.


            • #7
              hi Ayana,

              Can you post the code which you think is not working? Please include the full code, the R output, and sessionInfo().

              The VST and rlog are both implemented in DESeq2, which we suggest you use over DESeq.


              • #8
                Originally posted by moritzhess:
                As far as I know, you have to tell DESeq to treat all expression values as if they came from a single condition by specifying method="blind" when estimating the dispersions.
                Yes. And depending on the data, there may not always be a variance stabilising transformation. In particular, the error model on which the transformation is based assumes that for most genes the variance is dominated by technical noise and natural biological variation between replicates, and that the effects of true differential expression affect only a minority of genes. If that is not the case, then the whole concept does not really work.

                As Mike Love says, the variance stabilising transformation tends to be misled in cases when the size factors strongly vary between samples, and (at least) in these cases the rlog transformation is preferable.
                Last edited by Wolfgang Huber; 04-30-2014, 11:16 AM.
                Wolfgang Huber
                EMBL


                • #9
                  @Him26: Note that in John's plots the y-axis is on a log-scale.
                  If you do the same kind of plot with sd computed on the original scale of the counts, then you will indeed expect them to increase with the mean.
                  Wolfgang Huber
                  EMBL
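
The point about scales can be checked with a small base-R simulation. This is a toy model, not any poster's data: counts are drawn from a negative binomial distribution with an assumed fixed dispersion of 0.1, and all variable names are illustrative.

```r
set.seed(4)
n_genes   <- 2000
n_samples <- 8

# Assumed toy model: per-gene means spread over several orders of
# magnitude, negative binomial noise with size = 10 (dispersion 0.1).
mu <- 2^runif(n_genes, min = 0, max = 12)
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
                 nrow = n_genes)  # row g uses mean mu[g]

gene_mean <- rowMeans(counts)
sd_raw <- apply(counts, 1, sd)             # SD on the original count scale
sd_log <- apply(log2(counts + 1), 1, sd)   # SD after the shifted log

# Rank correlation between the per-gene mean and the SD on each scale.
cor_raw <- cor(gene_mean, sd_raw, method = "spearman")
cor_log <- cor(gene_mean, sd_log, method = "spearman")
c(raw = cor_raw, log = cor_log)
```

On the original scale the SD rises steeply with the mean, while on the log scale that increase largely disappears, so the shape of the plot depends on which scale the SD was computed on.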
