
**honey** · 05-15-2012, 06:19 AM

Large FPKM with Cufflinks

This issue has been discussed several times on this forum, and I have also raised it here myself: Cufflinks gives very high FPKM values for some genes irrespective of their size (Cole suggested it affects short ones). I would be interested to know:
1. The range of FPKM values with and without the -N option.
2. What exactly do you think happened with the -N option to give high FPKM?
I have not seen a solution to this high FPKM problem, so if you have found one it would be interesting to learn more about it.
Thanks

**sdriscoll** · 05-15-2012, 03:02 PM

With cufflinks you can have three different normalizations: fragments mapped to genome (in millions), fragments mapped to transcriptome (in millions: --compatible-hits-norm) or upper quartile (-N). Regardless of the normalization, the same number of reads is quantified at each gene. I've looked into it myself. If you run cufflinks using all three of those normalizations and then look at each of the separate isoforms.fpkm_tracking files, you can confirm it. Check the coverage and FPKM columns. You should see different FPKMs but identical coverages across the three quantifications. Furthermore, if you divide the FPKMs from one run by those from another, you should see the same constant ratio at every gene.
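That check could be sketched roughly like this in Python. This is only an illustration: the two tables are made-up toy data (not real cufflinks output), the helper names `read_fpkm` and `fpkm_ratios` are hypothetical, and it assumes the tracking file is tab-delimited with `tracking_id`, `coverage` and `FPKM` columns in the header.

```python
import csv
import io


def read_fpkm(handle):
    """Map tracking_id -> (coverage, FPKM) from a tab-delimited tracking table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["tracking_id"]: (float(row["coverage"]), float(row["FPKM"]))
        for row in reader
    }


def fpkm_ratios(run_a, run_b):
    """Per-isoform FPKM ratio a/b, skipping isoforms with zero FPKM in run_b."""
    return {
        tid: run_a[tid][1] / run_b[tid][1]
        for tid in run_a
        if tid in run_b and run_b[tid][1] > 0
    }


# Toy stand-ins for two isoforms.fpkm_tracking files (values are invented).
run_default = io.StringIO(
    "tracking_id\tcoverage\tFPKM\n"
    "iso1\t10.0\t1.0\n"
    "iso2\t20.0\t2.0\n"
)
run_upper_quartile = io.StringIO(
    "tracking_id\tcoverage\tFPKM\n"
    "iso1\t10.0\t2.0\n"
    "iso2\t20.0\t4.0\n"
)

default = read_fpkm(run_default)
upper_q = read_fpkm(run_upper_quartile)

# Coverage should match between runs; only the FPKM scale should differ.
same_coverage = all(default[t][0] == upper_q[t][0] for t in default)
ratios = fpkm_ratios(upper_q, default)
print(same_coverage, ratios)
```

With real files you would pass `open(path)` handles instead of the `StringIO` objects, and compare the ratios with a small tolerance rather than expecting them to be exactly equal.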

If you calc FPKMs yourself you can see why the numbers shift around. To be honest the "FPKM" designation is misleading when you're using any normalization other than "mapped reads in millions". Right? Fragments per kilobase per million mapped reads is what you're used to.

So say we have a gene that's 2500 bases long. We've got 121 fragments that mapped to it and we've got 34.7 million fragments mapped to the genome. We can get the FPKM like so:

Code:

FPKM = 121/(34.7*2.5) = 1.394813

Say only 27.4 million fragments mapped to the transcriptome. So if you used --compatible-hits-norm then the calculation looks like this:

Code:

FPKM = 121/(27.4*2.5) = 1.766423

Those aren't that different from one another. Now if you use upper quartile, we're talking about the upper quartile value of fragments mapped to genes in the sample. That number might be something like 12,000. Divide this value by 1e6 to put it into "millions", like you do with mapped fragments, and it becomes 0.012. So now the calculation looks like this:

Code:

FPKM = 121/(0.012*2.5) = 4033.333
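The three calculations above can be put side by side in a few lines of Python. This is just the arithmetic from this post, not how cufflinks computes things internally; the function name `fpkm` and the variable names are made up for the example.

```python
def fpkm(fragments, length_bp, norm_count):
    """Fragments per kilobase of gene per million of `norm_count`.

    `norm_count` is whatever the normalization uses as its denominator:
    fragments mapped to the genome, fragments compatible with the
    transcriptome, or the upper quartile of per-gene fragment counts.
    """
    length_kb = length_bp / 1000.0
    norm_millions = norm_count / 1e6
    return fragments / (norm_millions * length_kb)


# Numbers from the worked example in this post.
fragments, length_bp = 121, 2500

by_norm = {
    "genome hits": fpkm(fragments, length_bp, 34.7e6),
    "compatible hits": fpkm(fragments, length_bp, 27.4e6),
    "upper quartile": fpkm(fragments, length_bp, 12_000),
}

for name, value in by_norm.items():
    print(f"{name}: {value:.6f}")
```

The fragment count and gene length never change between the three calls. Only the denominator does, which is why the upper quartile value lands on such a different scale.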

So maybe it makes sense to scale the upper quartile normalization value by 1000 so that the "FPKM" comes out as 4.033 instead of 4033. That's reasonable. But it really shouldn't be called an FPKM. If you think about it, it's like someone telling you there are 14 cars outside and you assume they mean 14, but they actually told you 14 in base 16, which would be 20 in base 10 (or maybe like expecting a measurement to be in cm but being given the measurement in inches with a cm designation). It's not fragments per kilobase per million mapped reads, it's fragments per kilobase per upper quartile of read counts @ genes. So FPKPUQRCG. That name sucks.

These different normalizations only matter when you're comparing samples to each other. So if your goal is to see if gene X is expressed higher in Sample A versus B, then regardless of the normalization used (as long as you use the same one on both samples) you'll find your answer. The upper quartile normalization has been shown to be more robust, so maybe it's better to use it for comparing samples to one another. Also, obviously, for the expression levels to make sense to other people we all need to be using the same normalization. We should probably all be using upper quartile normalization, but that puts the numbers on a different scale than we're used to seeing.

Hope that helped.

**MW007** · 09-20-2012, 08:59 AM

That was crazy helpful, thanks.

-MW


Cufflinks large FPKM with -g and -N options
