
**Simon Anders** · 01-02-2012, 11:33 PM

I start by answering two of your questions:

4. Do both edgeR and DESeq offer different built-in methods of data normalization applicable for time-course data (NOT pair-wise comparisons)?

Normalization is independent of the experimental design. The built-in normalisations of DESeq and edgeR simply determine a scaling factor (or size factor) for each sample, such that all samples' counts, when multiplied by their factor, are on a scale that allows for comparisons. What you want to compare with what is unimportant for this step.

5. Will normalization have to be performed with respect to a reference data point, let's say time point zero (which makes intuitive and biological sense to me),
OR
are there variants of normalization that can normalize data across time without explicitly choosing a reference (such a method, if it exists, does not make intuitive or biological sense to me)?

DESeq chooses the size factors such that their product is one, in order to put the common scale somewhere in the middle of all the library sizes. If you multiplied all the factors by a constant, the analysis result would not change. Hence, one could just as well declare an arbitrary sample as reference and choose the factors such that this sample gets assigned a one.
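To illustrate the rescaling argument, here is a minimal Python sketch. It is not DESeq code: the factor values, the counts, and the function names are made up for illustration, and it assumes the convention in which counts are divided by the size factor (the same argument holds with the opposite convention).

```python
import math

def rescale_to_unit_product(factors):
    """Divide all factors by their geometric mean, so that their product is 1."""
    log_mean = sum(math.log(f) for f in factors) / len(factors)
    geo_mean = math.exp(log_mean)
    return [f / geo_mean for f in factors]

def rescale_to_reference(factors, ref_index):
    """Divide all factors by the reference sample's factor, so that it becomes 1."""
    return [f / factors[ref_index] for f in factors]

# Hypothetical size factors for four samples and one gene's raw counts.
raw_factors = [0.5, 1.0, 2.0, 4.0]
gene_counts = [50, 120, 180, 400]

unit_product = rescale_to_unit_product(raw_factors)
ref_first = rescale_to_reference(raw_factors, ref_index=0)

norm_a = [c / f for c, f in zip(gene_counts, unit_product)]
norm_b = [c / f for c, f in zip(gene_counts, ref_first)]

# The two choices differ only by a constant, so between-sample ratios agree.
ratios_a = [x / norm_a[0] for x in norm_a]
ratios_b = [x / norm_b[0] for x in norm_b]
print(ratios_a)
print(ratios_b)
```

The absolute normalized values differ between the two conventions, but every comparison between samples comes out the same, which is why the choice of reference is arbitrary.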

3. Strictly speaking, should the choice of normalization method be justified through some measure or test, or is it the norm to try out different methods?

If the normalization does not work well, replicates will appear less similar than they are. This drives up the variance estimate and reduces the number of hits. Hence, in theory, a bad normalization should only reduce power, i.e., it is conservative. I'm not sure, though, whether it would be a good idea to use the number of hits in the downstream test for differential expression as a figure of merit for the quality of the normalization; one might easily fall for outliers that way.
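A toy Python sketch of the first point, with made-up numbers (two replicates of one condition, where the second library is assumed to be sequenced twice as deeply, and counts are divided by the factors):

```python
from statistics import variance

# Hypothetical counts for one gene in two replicates of the same condition.
replicate_counts = [100, 200]
good_factors = [1.0, 2.0]  # reflect the assumed two-fold depth difference
bad_factors = [1.0, 1.0]   # ignore the depth difference

def normalize(counts, factors):
    """Put counts on a common scale by dividing by per-sample factors."""
    return [c / f for c, f in zip(counts, factors)]

var_good = variance(normalize(replicate_counts, good_factors))
var_bad = variance(normalize(replicate_counts, bad_factors))
print(var_good, var_bad)
```

With the poor factors the replicates look different although they are not, so the estimated variance goes up. A real analysis estimates dispersion across many genes, so this only shows the direction of the effect.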

**steven** · 01-04-2012, 08:31 AM

Looking for co-expressed genes throughout time points? I haven't seen much of this in NGS papers yet. What about a clustering approach? Maybe this thread could help.

**anandksrao** · 01-05-2012, 12:58 PM

Originally posted by Simon Anders

DESeq chooses the size factors such that their product is one, in order to put the common scale somewhere in the middle of all the library sizes. If you multiplied all the factors by a constant, the analysis result would not change. Hence, one could just as well declare an arbitrary sample as reference and choose the factors such that this sample gets assigned a one.

I have some questions regarding the calculation of the geometric mean to normalize individual libraries as implemented by estimateSizeFactors in DESeq.
I checked out the DESeq package documentation for estimateSizeFactorsForMatrix:

Description:
Given a matrix or data frame of count data, this function
estimates the size factors as follows: Each column is divided by
the geometric means of the rows. The median (or, if requested,
another location estimator) of these ratios (skipping the genes
with a geometric mean of zero) is used as the size factor for this
column.
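
The description above can be sketched in plain Python as follows. This is an illustration of the median-of-ratios idea as described, not the DESeq implementation; the function name `estimate_size_factors` and the toy count matrix are assumptions made for the example.

```python
import math
from statistics import median

def estimate_size_factors(count_rows):
    """count_rows: one list per gene, holding that gene's count in each sample."""
    n_samples = len(count_rows[0])
    # Geometric mean per gene; any zero count makes it zero.
    geo_means = []
    for row in count_rows:
        if all(c > 0 for c in row):
            geo_means.append(math.exp(sum(math.log(c) for c in row) / len(row)))
        else:
            geo_means.append(0.0)
    factors = []
    for j in range(n_samples):
        # Ratios of this sample's counts to the gene-wise geometric means,
        # skipping genes whose geometric mean is zero.
        ratios = [row[j] / gm for row, gm in zip(count_rows, geo_means) if gm > 0]
        factors.append(median(ratios))
    return factors

# Toy data: the second sample is sequenced twice as deeply; the last gene has a zero.
toy_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 8],
]
size_factors = estimate_size_factors(toy_counts)
print(size_factors)
```

In this toy case the last gene does not contribute to the factors, yet it is still present in the count matrix. If every gene had a zero somewhere, `median` would raise an error because no ratios would be left.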

My question to the forum / Simon is very specifically about "skipping the genes with a geometric mean of zero"

Skipping genes with a geometric mean of zero seems likely to miss quite a few genes, especially in my time course study: across many time points there is probably a higher chance than in a pairwise comparison of 2 time points that even a gene highly expressed at time t1 has zero expression at time t2. Such a gene would have a geometric mean of 0 and would consequently be discarded. I would not want to discard such a gene from my analysis; quite the contrary, actually.

So for the purpose of not missing genes I am trying 2 things:
a. pseudo-replace: change every raw count of 0 to a raw count of 1, then perform the analysis,
OR
b. pseudo-add: add 1 to all raw counts, then perform the analysis

Does option a. or option b. violate the nBinom model or introduce any intrinsic error that precludes correct conclusions?
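
For concreteness, the two options can be written as the following Python sketch. The function names and the example counts are made up, and the sketch only shows what each option does to the data; it says nothing about whether either one is statistically appropriate.

```python
def pseudo_replace(counts):
    """Option a: turn every raw count of 0 into 1, leave other counts alone."""
    return [1 if c == 0 else c for c in counts]

def pseudo_add(counts):
    """Option b: add 1 to every raw count."""
    return [c + 1 for c in counts]

# Hypothetical counts for one gene across four time points.
time_course = [0, 1, 100, 0]

replaced = pseudo_replace(time_course)  # [1, 1, 100, 1]
added = pseudo_add(time_course)         # [1, 2, 101, 1]
print(replaced, added)
```

One visible difference: under option a. a count of 0 and a count of 1 become indistinguishable, while option b. shifts every count and keeps them distinct.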

I intend to use my slightly modified data from options a and b, to
1. normalize using RLE (nomenclature from edgeR),
2. perform VST if library variances are heteroskedastic, and
3. finally perform fuzzy-K clustering to obtain dominant temporal patterns of expression.

Looking forward to your opinions / comments / criticisms

**Simon Anders** · 01-05-2012, 01:25 PM

Don't worry. The genes with zero counts are just not used in the calculation of the size factors. They are, of course, not discarded and not excluded from the test for differential expression.
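
A small Python sketch of that distinction, with made-up gene names, counts, and size factors (the factors are simply given here, and counts are assumed to be divided by them):

```python
import math

# Hypothetical counts for three genes in three samples.
counts = {
    "geneA": [10, 20, 40],
    "geneB": [0, 30, 60],   # zero count in the first sample
    "geneC": [7, 14, 28],
}

def geometric_mean(row):
    """Geometric mean of a row; zero as soon as any count is zero."""
    if any(c == 0 for c in row):
        return 0.0
    return math.exp(sum(math.log(c) for c in row) / len(row))

# Only genes with a nonzero geometric mean enter the size factor calculation.
used_for_size_factors = [g for g, row in counts.items() if geometric_mean(row) > 0]

# Assumed size factors for the three samples.
size_factors = [0.5, 1.0, 2.0]

# Every gene, including geneB, stays in the table and gets normalized.
normalized = {g: [c / f for c, f in zip(row, size_factors)] for g, row in counts.items()}
print(used_for_size_factors)
print(normalized["geneB"])
```

Here geneB is left out of `used_for_size_factors` but is still present in `normalized` and would go on to the downstream analysis.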

**anandksrao** · 01-05-2012, 04:20 PM

Originally posted by Simon Anders

Don't worry. The genes with zero counts are just not used in the calculation of the size factors. They are, of course, not discarded and not excluded from the test for differential expression.

Thanks Simon!

**anandksrao** · 10-20-2012, 10:50 AM

For my time-series-based clustering problem, finding co-expressed genes with identical temporal expression profiles (which is NOT the same as DE gene identification), I assume there is still the problem of over-dispersion across our multiple biological replicates. So can DESeq perform the variance stabilizing transformation, so that I can then use the transformed data for time series clustering?

choosing & validating RNA-Seq time course data normalization method(s)
