It might be better to use normalized raw reads. That way you can see how reliable the fold difference actually is, lower read counts are more prone to noise induced errors. For example if sample A has 5 reads and sample B has 0 reads then it may not be so reliable. Whereas if sample A has 3000 reads and sample B has 0 it would be more reliable. FPKM can change the numbers making low read count small fragments look more reliable than they actually are (not sure if that has changed, it's been a while since I tried FPKM).
Unconfigured Ad
Collapse
X
-
I don't like FPKM/RPKM because it can mask how many reads there actually are, even inflating values when there are actually very few reads. That being said, I disagree with filtering out genes that have a 0 FPKM or read count in one condition. I have seen clear examples where one conditions will have dozens, even hundreds of reads, while the other has zero. Clearly you can observe expression in one condition and call this as differentially expressed. Where things become murky is when one condition has maybe 2-4 reads and the other 0. A better filter then would take into account this possibility.Last edited by chadn737; 01-24-2013, 06:54 AM.
Comment
-
-
The problem with low count data is that it results in a very high false positive error rate for differentially expressed genes. I suppose it then depends on how tolerant your study is to false positives and what your objective is with the data. However, I will make a final comment that in the published RNA-Seq DGE papers of the last 2 or 3 years, there seems to be a clear and growing consensus that low read count data should be excluded from DGE analyses to avoid the bias of much higher false positive errors especially in low expressors. Published papers seem to vary when using FPKM/RPKM normalization, with cutoffs varying from 0.1 to 0.5 (one paper I seem to recall even using higher, but I remember the read depth was quite low in that work as well). However, I too am becoming a non-fan of RPKM or similar methods, as they can be very misleading for some genes.Originally posted by bvb1909 View PostI think that a low read count is in many cases a pretty good indication of low expression. Simply excluding those on the basis that I cannot statistically be sure means you are missing out on a lot of differentially expressed genes. Just as an example, a well known differentially expressed gene under our condition would be ruled out by you because the statistics tells you so (because FPKM = 0 under control condition), biology tells us it is the one of the most important genes.... Guess it is also a matter of what you want to get from the data
In my own work, we have settled on excluding raw counts less than 11 (so I actually filter on count > 10), and then normalize what remains. Even then, It's simply though to plot and show that genes with raw counts between 11 and about 150 or so, have very high variance in their transcript abundance estimates, while for those with counts > about 150, the variance tightens up dramatically. We also always run 5 biological replicates for all treatments and controls.
Working in toxicology and particularly with risk assessment type studies, we do not have the option of dismissing statistical significance, and in fact almost always base our DGE assessments on simultaneously filtering results for statistical significance and minimal fold change difference (although for initial exploratory analyses, we may relax those criteria - as you say, it depends on what one's goals for the data are).
Just as an aside, in the limited qPCR validations series that I've run, we get very poor correspondence with RNA-Seq results base solely on statistical significance or solely on fold change. Correspondence (using ABI TaqMan rtPCR assays) improves dramatically when comparing genes that were both statistically significant and met minimum fold change differences (I usually filter for genes with FDR < 0.05 and FC > +/- 1.5). Nothing novel in that result, and of course, the same applies for microarray data for that matter: combining statistical significance and some minimum magnitude of relative change proves a far more robust estimator of differential gene expression than either cutoff alone. The problem with the original post that started this thread, is that you cannot compute statistical significance in the absence or replicates, so you are left with just raw differences in magnitude based on single measures of abundance.Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.
Comment
-
Latest Articles
Collapse
-
by SEQadmin2
Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.
The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
...-
Channel: Articles
06-02-2026, 10:05 AM -
-
by SEQadmin2
With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.
Introduction
Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...-
Channel: Articles
05-22-2026, 06:42 AM -
ad_right_rmr
Collapse
News
Collapse
| Topics | Statistics | Last Post | ||
|---|---|---|---|---|
|
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism
by SEQadmin2
Started by SEQadmin2, 06-09-2026, 11:58 AM
|
0 responses
17 views
0 reactions
|
Last Post
by SEQadmin2
06-09-2026, 11:58 AM
|
||
|
Started by SEQadmin2, 06-05-2026, 10:09 AM
|
0 responses
27 views
0 reactions
|
Last Post
by SEQadmin2
06-05-2026, 10:09 AM
|
||
|
Started by SEQadmin2, 06-04-2026, 08:59 AM
|
0 responses
38 views
0 reactions
|
Last Post
by SEQadmin2
06-04-2026, 08:59 AM
|
||
|
Started by SEQadmin2, 06-02-2026, 12:03 PM
|
0 responses
61 views
0 reactions
|
Last Post
by SEQadmin2
06-02-2026, 12:03 PM
|
Comment