**eslondon** · 08-04-2011, 11:23 PM

Clearly it is important to follow the assumptions and models within each of the tools you mention.

If you want to compile a simple "table of expression", you can produce RPKMs, fold-changes, etc. If, however, you use a specific tool such as edgeR, which has its own methodology for normalizing and estimating differences in expression (bearing in mind that edgeR has a variety of models implemented, as explained in its manual), then you should provide it with what it expects, i.e. raw read counts.
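As a rough illustration of such a "table of expression", here is a minimal Python sketch that computes RPKM and a log2 fold-change from raw counts. The gene names, lengths, counts and library sizes are made-up toy values, and the pseudocount of 1 is just one common convention for avoiding division by zero, not something prescribed by any particular tool.

```python
import math

def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_fold_change(a, b, pseudocount=1.0):
    """log2(b / a), with a pseudocount (an assumption) to avoid division by zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))

# Hypothetical toy data: gene -> (length in bp, count in sample A, count in sample B)
genes = {
    "geneX": (2000, 100, 400),
    "geneY": (500, 50, 50),
}
total_a = 1_000_000  # total mapped reads in sample A (made up)
total_b = 2_000_000  # total mapped reads in sample B (made up)

table = {}
for name, (length, count_a, count_b) in genes.items():
    rpkm_a = rpkm(count_a, length, total_a)
    rpkm_b = rpkm(count_b, length, total_b)
    table[name] = (rpkm_a, rpkm_b, log2_fold_change(rpkm_a, rpkm_b))

for name, (rpkm_a, rpkm_b, lfc) in table.items():
    print(f"{name}\t{rpkm_a:.1f}\t{rpkm_b:.1f}\t{lfc:+.2f}")
```

Note that geneY has identical raw counts in both samples but a different RPKM, because the library sizes differ. This kind of table is fine for browsing, but it is not what count-based tools such as edgeR expect as input.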

Since we are still in the early days, lab validation of the results is clearly the key to understanding which tools give you the best answers in the end.

**sphil** · 08-05-2011, 03:32 AM

Hey,

you are more or less asking for the 'holy grail': how to normalize my data.

In my opinion the most crucial step is to know where your data comes from. DE detection between technical replicates needs to be handled differently from DE detection between biological replicates (Poisson vs. negative binomial; see Marioni et al.). In addition, as mentioned above, every method assumes a different distribution of reads.

RPKM 'just' normalizes for gene length and the total number of reads. It does not correct for biases coming from transcript abundance in the library. Thus your RPKM values should follow a normal distribution, and they should not show a linear correlation between gene length and transcript abundance. However, since housekeeping genes contribute a large share of the transcripts, you should also consider an additional normalization step, quantile normalization for instance.

DESeq (and similar tools) want the raw counts so they can estimate dispersion and distribution and fit their assumptions to the given data as well as possible. So I would run different analyses (i.e. DESeq as well as an RPKM/FC analysis) and compare the results. From that comparison you can get at least some idea of which distribution fits your data best.
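To make the Poisson vs. negative binomial point a bit more concrete: under a Poisson model the variance of a gene's counts across replicates is roughly equal to its mean, whereas biological replicates usually show extra variance on top of that. The following is only a crude sketch with made-up counts for a single gene; real tools estimate dispersion far more carefully, and with so few replicates this ratio is very noisy.

```python
from statistics import mean, variance

def variance_to_mean(counts):
    """Sample variance divided by the mean; close to 1 is what a Poisson model predicts."""
    m = mean(counts)
    if m == 0:
        return float("nan")
    return variance(counts) / m

# Hypothetical counts for one gene across four replicates each (toy values)
technical_reps = [90, 110, 100, 100]
biological_reps = [60, 140, 100, 100]

ratio_tech = variance_to_mean(technical_reps)
ratio_bio = variance_to_mean(biological_reps)

print(f"technical replicates:  variance/mean = {ratio_tech:.2f}")
print(f"biological replicates: variance/mean = {ratio_bio:.2f}")
```

A ratio well above 1 across many genes hints that a Poisson model is too optimistic and that an overdispersed model (e.g. negative binomial) is the safer assumption.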

**DZhang** · 08-06-2011, 06:07 PM

Hi frymor,

You may try different methods, but ultimately you must rely on follow-up experiment(s) to validate the results. Say you try 2-3 analysis methods/models: some DE genes will be identified by all methods and others only by some. You need to validate them by independent methods - e.g., qPCR. The field needs sufficient validation results to see which method is better suited for a certain application.
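One simple way to organize that comparison before picking validation candidates is to split the DE calls into genes found by every method and genes found by only some. This is just a sketch; the method labels and gene IDs below are placeholders, not real results.

```python
def split_by_agreement(de_calls):
    """de_calls: dict mapping method name -> set of gene IDs called DE.

    Returns (genes called by all methods, genes called by only some methods).
    """
    sets = list(de_calls.values())
    if not sets:
        return set(), set()
    in_all = set.intersection(*sets)
    in_any = set.union(*sets)
    return in_all, in_any - in_all

# Hypothetical DE calls from three analyses (placeholder gene IDs)
de_calls = {
    "edgeR": {"g1", "g2", "g3"},
    "DESeq": {"g2", "g3", "g4"},
    "RPKM_FC": {"g3", "g5"},
}

by_all, by_some = split_by_agreement(de_calls)
print("called by all methods:", sorted(by_all))
print("called by only some:  ", sorted(by_some))
```

Validating a few genes from each group by qPCR would show whether agreement between methods actually tracks true positives in your data.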

clarification of rna-seq normalization