**GenoMax** · 10-20-2017, 02:02 AM

You could simulate them yourself to have precise control over the "truth".
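As a rough illustration of what "precise control over the truth" could look like, here is a minimal Python sketch that draws negative binomial counts for two groups with a known set of differentially expressed genes, then scores any set of calls against that truth. The function names, the fold change, the dispersion value, and the lognormal baseline are all assumptions made up for this example; they do not come from any particular package or paper.

```python
import numpy as np

def simulate_counts(n_genes=1000, n_reps=3, frac_de=0.1, fold=4.0,
                    dispersion=0.1, seed=0):
    """Simulate two-group counts with a known set of true DE genes.

    All parameter values are illustrative assumptions, not recommendations.
    """
    rng = np.random.default_rng(seed)
    # Baseline mean expression per gene (assumed lognormal).
    base_mean = rng.lognormal(mean=4.0, sigma=1.0, size=n_genes)
    # The first frac_de of genes are truly DE (up-regulated in group B).
    is_de = np.zeros(n_genes, dtype=bool)
    is_de[: int(frac_de * n_genes)] = True
    mean_b = np.where(is_de, base_mean * fold, base_mean)

    # Negative binomial with mean mu and variance mu + dispersion * mu^2:
    # size n = 1 / dispersion, success probability p = n / (n + mu).
    n = 1.0 / dispersion

    def draw(mu):
        p = n / (n + mu)
        return rng.negative_binomial(n, p[:, None], size=(n_genes, n_reps))

    return draw(base_mean), draw(mean_b), is_de

def false_positive_rate(called, is_de):
    """Fraction of truly non-DE genes that were called DE."""
    called = np.asarray(called, dtype=bool)
    is_de = np.asarray(is_de, dtype=bool)
    n_null = (~is_de).sum()
    return (called & ~is_de).sum() / n_null if n_null else 0.0

def false_discovery_proportion(called, is_de):
    """Fraction of DE calls that are actually non-DE genes."""
    called = np.asarray(called, dtype=bool)
    is_de = np.asarray(is_de, dtype=bool)
    n_called = called.sum()
    return (called & ~is_de).sum() / n_called if n_called else 0.0

counts_a, counts_b, is_de = simulate_counts()
```

The two count matrices can then be run through whichever DE method you like, and the resulting calls compared with `is_de` using the two scoring functions.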

**SuzuBell** · 10-20-2017, 05:10 AM

Thank you, I am trying to use real (not simulated) RNA-seq data.

**GenoMax** · 10-20-2017, 06:53 AM

You will find plenty of real datasets that (claim to) have low false positive rates, since everyone wants to achieve that, but it may be hard to find a real dataset with a high false positive rate, since no reviewer would accept that.

**SuzuBell** · 10-21-2017, 04:39 AM

Thanks GenoMax.

1) I agree it might be hard to find a high false-positive rate example on its own. However, if that is the case, I am hoping to find an easily reproducible example of a dataset that, say, has a high false-positive rate when analyzed one way, but a low false-positive rate when analyzed another way. This might be available in studies promoting a certain methodology. I am very interested in seeing what DEGs look like (by counts) when they come from an analysis with an established high false-positive rate.

2) I do have one dataset that returns a suspiciously large number of DEGs (through edgeR, DESeq, and limma-voom). However, when I look at the DEGs (view their counts), I do not see the much larger variation between treatment groups than between replicates that I would expect. This makes me *suspect* many of these DEGs are false-positive calls. However, I am looking for a dataset which has been compared to some *standard* that shows it indeed has a high false-positive rate, and unfortunately, I do not know of a way to do that with my data. Hence, I am trying to find a public dataset.

**Dario1984** · 11-12-2017, 10:00 PM

RNA-seq differential expression methods are known to be affected by outliers. You have used edgeR to analyse the dataset. Which dispersion estimation method did you use? If you have patient replicates, you should use the robust variety of dispersion estimation. The default method is only useful if you are analysing replicates of cell lines (e.g. 3 replicates of PrEC and 3 replicates of LNCaP), which aren't representative of biological tissue and its heterogeneity. There's also a robust style of limma analysis you could be using.


Example RNA-seq datasets with low and high false-positive rates
