**eslondon** · 08-04-2011, 11:23 PM

Clearly it is important to follow the assumptions and models within each of the tools you mention.

If you want to compile a simple "table of expression", you can produce RPKMs, fold-changes, etc. If, however, you use a specific tool such as edgeR, which has its own methodology for normalizing and estimating differences in expression (bearing in mind that edgeR implements a variety of models, as explained in its manual), then you should provide it with what it expects, i.e. raw read counts.

Since we are still in the early days, lab validation of results is clearly the key to understanding which tools give you the best answers in the end.
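For anyone who wants to see what the simple "table of expression" route amounts to, here is a minimal sketch of computing RPKM and a log2 fold-change from raw counts. The gene length, counts, library sizes, and function names are all made up for illustration, and this is not what edgeR does internally; edgeR should still be given the raw counts.

```python
import math

def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 ratio of B over A; the pseudocount guards against zeros."""
    return math.log2((value_b + pseudocount) / (value_a + pseudocount))

# Hypothetical toy data: raw counts for one gene in two samples.
gene_length_bp = 2000
counts = {"control": 500, "treated": 1500}
library_sizes = {"control": 10_000_000, "treated": 15_000_000}

rpkm_values = {
    sample: rpkm(counts[sample], gene_length_bp, library_sizes[sample])
    for sample in counts
}

# Pseudocount set to 0 here only because neither value is zero.
lfc = log2_fold_change(rpkm_values["control"], rpkm_values["treated"], pseudocount=0.0)

print(rpkm_values)  # {'control': 25.0, 'treated': 50.0}
print(lfc)          # 1.0
```

Note that the raw count tripled while the RPKM only doubled, because the treated library is 1.5 times deeper.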

**sphil** · 08-05-2011, 03:32 AM

Hey,

You are asking for something of a 'holy grail': how to normalize your data.
In my opinion the most crucial step is to know where your data comes from. DE detection between technical replicates needs to be handled differently from DE detection between biological replicates (Poisson vs. negative binomial; see Marioni et al.). In addition, as mentioned above, every method assumes a different distribution of reads.
RPKM 'just' normalizes for gene length and the total number of reads. It does not correct biases coming from transcript abundance in the library. Thus your RPKM values should follow a normal distribution, and they should not show a linear correlation between gene length and transcript abundance. However, since housekeeping genes contribute a large share of the transcripts, one should also consider an additional normalization, for instance quantile normalization. DESeq (and similar tools) want the raw counts so they can estimate dispersion and fit their distributional assumptions to the given data. So I would run different analyses (i.e. DESeq as well as an RPKM/FC analysis) and compare the results. From that comparison you can figure out, at least roughly, which distribution fits your data best.
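To make the quantile normalization idea concrete, here is a rough sketch in plain Python. The function name and the toy values are hypothetical, and ties are simply broken by original order, which proper implementations handle more carefully. It forces every sample onto the same distribution by replacing each value with the mean of the values at the same rank across samples.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Simplification: ties are broken by original order rather than averaged.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Mean of the k-th smallest value across samples, for every rank k.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical expression values for four genes in two samples.
sample_a = [5.0, 2.0, 3.0, 4.0]
sample_b = [8.0, 2.0, 6.0, 4.0]
normalized = quantile_normalize([sample_a, sample_b])

print(normalized[0])  # [6.5, 2.0, 3.5, 5.0]
print(normalized[1])  # [6.5, 2.0, 5.0, 3.5]
```

After normalization both samples contain exactly the same set of values, only assigned to different genes, so a few very abundant transcripts can no longer dominate one sample's scale.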

**DZhang** · 08-06-2011, 06:07 PM

Hi frymor,

You may try different methods, but ultimately you must rely on follow-up experiments to validate the results. Say you try 2-3 analysis methods/models: some DE genes will be identified by all methods and others by only some. You need to validate them by independent methods, e.g. qPCR. The field needs sufficient validation results to see which method is better suited for a certain application.
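As a small illustration of sorting out which genes were called by all methods and which by only some, here is a sketch using plain Python sets. The gene names and the three result lists are invented; in practice they would come from each tool's output.

```python
from collections import Counter

# Hypothetical DE gene lists from three analysis methods.
de_calls = {
    "edgeR": {"geneA", "geneB", "geneC", "geneD"},
    "DESeq": {"geneA", "geneB", "geneD", "geneE"},
    "rpkm_fold_change": {"geneA", "geneC", "geneD", "geneF"},
}

# Number of methods that called each gene.
support = Counter(gene for genes in de_calls.values() for gene in genes)

called_by_all = sorted(g for g, n in support.items() if n == len(de_calls))
called_by_one = sorted(g for g, n in support.items() if n == 1)

print(called_by_all)  # ['geneA', 'geneD']
print(called_by_one)  # ['geneE', 'geneF']
```

Picking qPCR candidates from both groups, rather than only from the consensus set, is one way to learn something about where the methods disagree.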


clarification of rna-seq normalization