
**Simon Anders** · 11-11-2010, 12:27 AM

All of these suffer from the issue that a few strongly and differentially expressed genes can skew them. See the discussion in our paper and especially in Oshlack and Robinson's paper.

Our DESeq package offers (via its function 'estimateSizeFactors') a simple way to get a robust number for the denominator, which is explained, e.g., here.

Simon
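For readers who want to see the idea behind a robust denominator, here is a minimal sketch of the median-of-ratios approach described in the DESeq paper. It is not the package's code: the function name `size_factors` and the toy counts are made up for illustration, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq itself).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue  # geometric mean would be zero, so skip this gene
        # Geometric mean of this gene's counts across all samples
        geo_mean = math.exp(sum(math.log(c) for c in gene) / n_samples)
        for j, c in enumerate(gene):
            ratios[j].append(c / geo_mean)
    # Each sample's factor is the median of its per-gene ratios
    return [median(r) for r in ratios]

# Hypothetical toy data: sample 2 is sequenced twice as deeply as sample 1,
# and the last gene is strongly differentially expressed.
toy = [
    [100, 200],
    [50, 100],
    [30, 60],
    [10, 2000],
]
factors = size_factors(toy)
totals = [sum(gene[j] for gene in toy) for j in range(2)]
print("size factor ratio:", factors[1] / factors[0])
print("total count ratio:", totals[1] / totals[0])
```

In this toy case the size factors differ by the twofold depth difference, while the ratio of total counts is dominated by the single outlier gene.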

**shurjo** · 11-11-2010, 05:32 PM

> Originally posted by Simon Anders
>
> All of these suffer from the issue that a few strongly and differentially expressed genes can skew them. See the discussion in our paper and especially in Oshlack and Robinson's paper.
>
> Our DESeq package offers (via its function 'estimateSizeFactors') a simple way to get a robust number for the denominator, which is explained, e.g., here.
>
> Simon

Hi Simon,

Many thanks for your reply. I read both your paper and Oshlack and Robinson's, and I agree with all of the points made in them. However, in the context of my data, the following points suggest to me that a simpler normalization strategy may be adequate:

- The sixteen libraries I referred to all come from the same tissue source (lymphoblastoid cell lines).
- This is a clinical study where the cells were not "induced" or "perturbed" with an external agent, so there is no expectation that a large number of genes will be differentially expressed between the two groups of 8.
- A priori, the chances are low that an appreciable number of transcripts are present in one or a few of these libraries but absent in the others.

I understand that using TMM will be better in the vast majority of data sets. However, my objective here is simply to answer a question from my collaborating statisticians (who will not be using either edgeR or DESeq, but their own tests) as to what makes the best denominator for normalizing libraries for differences in coverage. Given this scenario, do you have any suggestions?

Once again, thanks for your help and congratulations on your paper.

Shurjo
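As a point of comparison, the simpler strategy discussed here, using each library's total count as the denominator, can be sketched as follows. The function names and toy counts are hypothetical; the sketch only shows that when libraries differ in depth but not in composition, total-count scaling makes them directly comparable.

```python
import math

def total_count_factors(counts):
    """Scaling factors from library totals, centred so their geometric mean is 1.

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    totals = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    geo_mean = math.exp(sum(math.log(t) for t in totals) / n_samples)
    return [t / geo_mean for t in totals]

def normalize(counts, factors):
    """Divide every count by its sample's scaling factor."""
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]

# Hypothetical libraries with identical composition; sample 2 has twice the depth.
same_composition = [
    [100, 200],
    [50, 100],
    [30, 60],
    [20, 40],
]
tc_factors = total_count_factors(same_composition)
normalized = normalize(same_composition, tc_factors)
for gene in normalized:
    print([round(v, 2) for v in gene])
```

The assumption doing the work is the one listed above: no small set of genes accounts for a large share of the reads in only some libraries.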

**carmeyeii** · 09-03-2012, 03:35 PM

Hi Shurjo,

I have been having a tough time working this one out as well, and I would appreciate any insight you gained in solving this problem. I too am torn between using the htseq-count total, the uniquely mapped reads from TopHat, or all the alignments generated by TopHat.

Thanks for your help,

Carmen


denominator for normalization
