**ffinkernagel** · 07-15-2014, 01:53 AM

A t-test depends on the effect size, and that obviously changes if you take log2.
The general rule is to test on the data you measure - in this case, this would be the un-logged reads per million.
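To make the point about log2 concrete, here is a minimal sketch that computes Welch's t statistic on the same two groups before and after a log2 transform. The numbers are made up for illustration and are not from anyone's data; `welch_t` is a hypothetical helper, not a library function.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical expression values for two groups (illustration only).
group_a = [1.0, 2.0, 4.0, 8.0, 16.0]
group_b = [4.0, 8.0, 16.0, 32.0, 256.0]

t_raw = welch_t(group_a, group_b)
t_log = welch_t([math.log2(x) for x in group_a],
                [math.log2(x) for x in group_b])

print(f"t on raw values:  {t_raw:.3f}")
print(f"t on log2 values: {t_log:.3f}")
```

With these values the single large observation inflates the raw-scale variance, so the two scales give noticeably different statistics for the same data.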

Either way, you should not be testing on the FPKM values: you lose the information about the number of reads actually behind each value, and more reads mean a better estimate.
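A rough sketch of what gets lost: two transcripts can have the same FPKM while being backed by very different numbers of fragments. The formula below is the usual FPKM definition (fragments per kilobase per million mapped fragments); the counts, lengths, and library size are invented, and the precision estimate assumes simple Poisson counting noise, which is an idealization.

```python
import math


def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def relative_sd(fragments):
    """Relative standard deviation of a count, assuming Poisson noise."""
    return 1 / math.sqrt(fragments)


total = 1_000_000  # hypothetical library size (mapped fragments)

# Two hypothetical transcripts: short with few fragments, long with many.
short_tx = {"fragments": 10, "length_bp": 1_000}
long_tx = {"fragments": 100, "length_bp": 10_000}

for name, tx in (("short", short_tx), ("long", long_tx)):
    value = fpkm(tx["fragments"], tx["length_bp"], total)
    noise = relative_sd(tx["fragments"])
    print(f"{name}: FPKM = {value:.1f}, relative SD of count = {noise:.2f}")
```

Both transcripts come out at the same FPKM, but the one backed by 100 fragments is estimated about three times more precisely than the one backed by 10, and a test on FPKM alone cannot see that.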

Consider using a testing method specifically for RNAseq data such as DESeq.

**jwfoley** · 07-15-2014, 04:10 AM

FPKM is just an intuitive transformation of fragment counts and is not suitable for use in statistical tests.

Fortunately, the software package that probably gave you the FPKM values, Cufflinks, also includes a program called cuffdiff that will do the test you want in a statistically rigorous way, based on modeling the actual fragment counts. Use that instead; don't try to use statistical tests that are unsuited for your data type on data that are unsuited for statistics.

**int11ap1** · 07-17-2014, 11:09 AM

I do not need RNA-seq-specific normalization for what I want here. Both sets of genes (actually, I have transcripts) come from the same RNA-seq dataset (the same fasta). One set is made up of coding transcripts and the other is made up of putative lncRNAs. I just want to know which set of transcripts is more highly expressed.

What is your final conclusion?

**jwfoley** · 07-17-2014, 11:14 AM

My final conclusion is the same as before: you should use a valid hypothesis test on the count data, like cuffdiff, DESeq2, or edgeR, all of which are quite rigorous, commonly used, and well documented. Do not use an invalid hypothesis test on FPKMs. FPKM is a crude normalization and cannot be used in a meaningful statistical test. Asking us again is not going to change the way numbers work.

**int11ap1** · 07-17-2014, 11:17 AM

But the methods you mention (edgeR and DESeq) are for normalization between different samples or RNA-seq datasets...

**jwfoley** · 07-17-2014, 11:18 AM

No, you have it backwards: those methods are all for statistical hypothesis testing, and FPKM is a (crude, statistically inappropriate) normalization for comparing different samples.

**int11ap1** · 07-17-2014, 11:28 AM

I do not follow you, sorry for asking again.

For example, I have 1000 FPKM values (from 1 RNA-seq sample) for 1000 transcripts. If I want to compare the first 500 transcripts with the second 500 (to see which set is more highly expressed), do I need to use edgeR or DESeq? What for?

**jwfoley** · 07-17-2014, 11:32 AM

Ah, I see: you're comparing some genes with other genes in the same experiment, not the same gene across different experiments.

You can use FPKM values for this if you use a distribution-free test like Mann-Whitney-Wilcoxon, but that won't be very powerful. Otherwise you could use a more effective normalization like the variance-stabilizing transformation or regularized log in DESeq2 and then use a regular t-test.
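For reference, the distribution-free option can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a statistics package: it uses the normal approximation with a tie correction and no continuity correction, which is rough for small groups, and it assumes the values are not all identical. The FPKM values are invented, and `average_ranks` and `mann_whitney_u` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: tied values share a rank, so count group sizes by rank.
    sizes = {}
    for r in ranks:
        sizes[r] = sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in sizes.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Hypothetical FPKM values for two transcript sets from one sample.
coding = [12.1, 30.5, 8.7, 55.0, 21.3, 17.8]
lncrna = [0.9, 2.4, 5.1, 1.7, 3.3, 0.4]

u, p = mann_whitney_u(coding, lncrna)
print(f"U = {u}, two-sided p = {p:.4f}")
```

Because the test only uses ranks, it gives the same answer on FPKM, log2(FPKM), or any other monotone transform of the values.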

**int11ap1** · 07-17-2014, 11:36 AM

That's it, thanks!
Why not apply the t-test directly? Where can I learn about it?

**jwfoley** · 07-17-2014, 11:39 AM

The t-test assumes the populations are normally distributed. FPKMs are not. http://en.wikipedia.org/wiki/Student's_t-test

A log transformation may seem to help but it is still inappropriate because it fails to account for the heteroskedastic mean-variance dependency of read counts. DOI: 10.1111/j.2041-210X.2010.00021.x

**int11ap1** · 07-17-2014, 11:52 AM

But the arithmetic mean of my FPKM values will be normally distributed according to the central limit theorem. In large samples such as mine, t.test for skewed distributions should be fine: http://stats.stackexchange.com/quest...ormal-when-n50
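The central limit theorem argument can be checked by simulation. The sketch below draws from a lognormal distribution as a stand-in for skewed expression values (an assumption; real FPKM distributions may behave differently) and compares the skewness of the raw draws with the skewness of means of 500 draws. It only illustrates the claim about sample means and says nothing about the mean-variance issue raised earlier in the thread.

```python
import random


def skewness(xs):
    """Sample skewness: the mean cubed z-score."""
    n = len(xs)
    m = sum(xs) / n
    s = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / s) ** 3 for x in xs) / n


rng = random.Random(0)  # fixed seed so the run is repeatable


def draw():
    """One skewed value; lognormal(0, 1) is an arbitrary stand-in."""
    return rng.lognormvariate(0.0, 1.0)


group_size = 500  # matches the 500 transcripts per set in the example above
raw = [draw() for _ in range(20_000)]
means = [sum(draw() for _ in range(group_size)) / group_size
         for _ in range(1_000)]

print(f"skewness of raw values:   {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
```

The sample means should come out far less skewed than the raw values, which is the behavior the central limit theorem predicts for large groups.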

**jwfoley** · 07-17-2014, 11:57 AM

Okay, you could do a normality test to verify that the t-test assumptions are met, but it would be more straightforward and rigorous to just use a better normalization.

t-test FPKM values
