I need to normalize some RNA-seq data (read counts) to obtain library-size normalized inputs for eQTL calling. The easiest option would be to simply calculate TPMs or FPKMs (i.e. divide by the total number of reads per library) but I thought it might actually be better to use DESeq2's variance stabilizing transformation for this. If I am reading the vignette correctly, this should not only scale the read counts by a better estimate of the library size, but also log them and result in a more stable variance across mean expression levels.
Has anybody tried this before? Is there any reason why this would not be a good idea?
An additional complication is that I have read counts at multiple conditions. Right now I am normalizing them all at the same time, using design = ~condition to tell DESeq which condition each sample is from. Is this okay or would it be better to normalize each condition individually?
Here is my code:
Has anybody tried this before? Is there any reason why this would not be a good idea?
An additional complication is that I have read counts at multiple conditions. Right now I am normalizing them all at the same time, using design = ~condition to tell DESeq which condition each sample is from. Is this okay or would it be better to normalize each condition individually?
Here is my code:
Code:
library(DESeq2) cds = DESeqDataSetFromMatrix(data.in, colData = colData, design = ~condition) dds = DESeq(cds) vsd = varianceStabilizingTransformation(dds, blind = FALSE) data.out = assay(vsd) colnames(data.out) = colnames(data.in)