**Wolfgang Huber** · 11-23-2012, 01:20 AM

Honey,

these authors make a lot of the fact that the two tools' resulting gene lists do not fully overlap (DESeq, with their settings, produces 11317 and edgeR 11995, with 10535 in common). But there is no contradiction: like any method, both tools make type I errors (false positives) and type II errors (false negatives).

Type I: As far as I could figure out from the paper, they put thresholds at FDR=0.05, which means that up to 5% (ca. 600) of hits are expected to be false positives. This is a fundamental property of statistical testing from noisy measurements, and no amount of gymnastics (incl. method 'overlapping') can get you around that.
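The numbers quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that "10.535" in the first paragraph means 10535 shared genes and that the FDR threshold of 0.05 was applied to each list separately. The variable names are illustrative.

```python
# Counts as quoted in the post (assumption: "10.535" is 10535 genes).
deseq_hits = 11317
edger_hits = 11995
shared = 10535
fdr = 0.05  # threshold as read from the paper

# Genes reported by only one of the two tools, and by at least one.
deseq_only = deseq_hits - shared
edger_only = edger_hits - shared
union = deseq_hits + edger_hits - shared

# With FDR controlled at 5%, up to about 5% of each list
# is expected to consist of false positives.
expected_fp_deseq = fdr * deseq_hits
expected_fp_edger = fdr * edger_hits

print(f"DESeq only: {deseq_only}, edgeR only: {edger_only}, union: {union}")
print(f"expected false positives: DESeq ~{expected_fp_deseq:.0f}, "
      f"edgeR ~{expected_fp_edger:.0f}")
```

The tool-specific counts (782 and 1460) are of the same order as the roughly 570 to 600 false positives that each list may contain at this threshold.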

Type II: Neither method provides a good estimate of type II errors. Consider the extreme situation that all genes are differentially expressed if you just look hard enough (not completely unrealistic for a comparison that affects a systemic phenotype, growth), and that the detection is only limited by the sequencing depth and number of replicates. In this case, both result lists are fully correct.
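The extreme case can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the two "methods" are hypothetical stand-ins (not DESeq or edgeR), the effect sizes, noise levels and the cut-off are arbitrary, and every gene is given a non-zero true effect.

```python
import random

random.seed(1)  # fixed seed so the toy run is reproducible

N_GENES = 2000
THRESHOLD = 2.0  # arbitrary cut-off on the toy test statistic

truly_de = set(range(N_GENES))  # extreme case: every gene is truly DE
hits_a, hits_b = set(), set()

for gene in range(N_GENES):
    effect = random.uniform(0.2, 1.5)      # true effect, never zero
    shared_noise = random.gauss(0.0, 1.0)  # measurement noise seen by both
    # Each hypothetical method adds its own small modelling difference.
    stat_a = 2.0 * effect + shared_noise + random.gauss(0.0, 0.3)
    stat_b = 2.0 * effect + shared_noise + random.gauss(0.0, 0.3)
    if stat_a > THRESHOLD:
        hits_a.add(gene)
    if stat_b > THRESHOLD:
        hits_b.add(gene)

false_pos_a = hits_a - truly_de
false_pos_b = hits_b - truly_de

print("hits A:", len(hits_a), "hits B:", len(hits_b),
      "shared:", len(hits_a & hits_b))
print("false positives A:", len(false_pos_a), "B:", len(false_pos_b))
```

By construction neither list contains a false positive, yet the two lists differ: the disagreement comes entirely from which true effects each method happens to have the power to detect.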

To move forward from this, one needs to further sharpen the question (for instance by considering confidence ranges of the fold change, asking for minimal effect size, or adding a prior on the fraction of true H0), and I suppose that is the subject of active method development in various places.
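One way to read "asking for minimal effect size" is to call a gene only if the whole confidence interval of its log2 fold change lies beyond a chosen threshold. The sketch below assumes a normal approximation and per-gene standard errors; the function name, the threshold of 1.0 and the example values are hypothetical, and the interval is not adjusted for multiple testing. It is not a description of what either package does.

```python
from statistics import NormalDist


def exceeds_min_effect(log2_fc, std_err, min_log2_fc=1.0, alpha=0.05):
    """Return True if the whole (1 - alpha) confidence interval of the
    log2 fold change lies beyond +/- min_log2_fc (normal approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Bound of the interval that is nearest to zero, in absolute terms.
    nearest_bound = abs(log2_fc) - z * std_err
    return nearest_bound > min_log2_fc


# Hypothetical (log2 fold change, standard error) pairs.
examples = {"geneA": (2.0, 0.3), "geneB": (1.2, 0.3), "geneC": (-2.0, 0.3)}
for name, (lfc, se) in examples.items():
    print(name, exceeds_min_effect(lfc, se))
```

A gene such as the hypothetical geneB, whose estimate is above the threshold but whose interval reaches below it, is not called under this rule.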

The results reported in the quoted paper are based on older versions of R (2.13) and the packages. To see how much of them is relevant for current work one would need to re-do the analysis with current versions. (This is not a critique of the paper, but still of note to its users/readers.)

Best wishes
Wolfgang

**rboettcher** · 11-23-2012, 03:51 AM

I would like to add that this comparison was already performed when DESeq was published and has been discussed on Seqanswers as well, so I am missing the point of the mentioned paper with regard to novelty and overall information content.

It would be much more informative if a general comparison of different tools with different strategies (such as cufflinks, eXpress and EBSeq) were included, using a known population of DE genes, i.e. performing a real benchmark to assess the performance of each tool.
I am well aware that each tool might perform better in certain situations, but due to the sheer number of bioinformatics tools being published / released, it becomes increasingly difficult to keep track and give reasonable suggestions on which to use. Thus, the program offering the best compromise is usually what I aim for.
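A benchmark against a known population of DE genes comes down to comparing each tool's calls with the truth set. Here is a minimal sketch; the gene IDs, tool names and the `benchmark` helper are made up for illustration and do not come from any real data set or package.

```python
def benchmark(called, truth):
    """Compare a tool's called genes with the known DE genes."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # true positives
    fp = len(called - truth)  # false positives (type I errors)
    fn = len(truth - called)  # false negatives (type II errors)
    return {
        "sensitivity": tp / len(truth) if truth else float("nan"),
        "observed_fdr": fp / len(called) if called else 0.0,
        "false_negatives": fn,
    }


# Hypothetical truth set and calls from two hypothetical tools.
truth = {"g1", "g2", "g3", "g4"}
calls = {
    "toolX": {"g1", "g2", "g5"},
    "toolY": {"g1", "g2", "g3", "g4", "g6", "g7"},
}
for tool, called in calls.items():
    print(tool, benchmark(called, truth))
```

With a truth set available, both error types can be measured directly instead of being inferred from how much the lists overlap.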

Nevertheless, it should be quite obvious that different tools will produce different results, as the same can be seen when investigating different data sets from the same platform.
Still, regarding my own experiences with cufflinks, CLC Bio and edgeR results on the same data as well as discussions going on in the forums, I see that cufflinks will not be my first choice for DE analysis in the future, regardless of genes being found exclusively by edgeR.

Best regards

comparision of edgR and DEseq
