**magick** · 09-29-2011, 02:11 AM

It's good that I'm not alone.

Code:

79      447370
86      148939
100     50999.3
101     142356
103     101460
103     216072

I see the same thing in my Cufflinks reports. As transcript length increases, the FPKM values drop to a more reasonable level. Any help would be appreciated!

**cram** · 09-29-2011, 06:40 AM

I've been reading through the supplemental methods of the Cufflinks paper and I have a theory about why this is happening. Rather than use the actual transcript length in FPKM calculations, Cufflinks uses what they call an adjusted length. This is intended to account for the fact that the expected fragment length will affect the probability of selecting a fragment from a transcript of a given length.

If I'm following the math correctly then this formula does not really handle cases where the transcript length is significantly shorter than the expected fragment length. It will produce an extremely low value for the adjusted transcript length, which will then cause the high FPKMs.

I've sent an email to the cufflinks developers to ask them if this sounds reasonable. In the meantime I think I'll just exclude transcripts shorter than 200bp or at least ignore the FPKM values for intra-sample expression comparisons.
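To make the argument above concrete, here is a minimal Python sketch of an adjusted length of the form sum over i of F(i) * (l - i + 1), taken over fragment lengths i up to the transcript length l, where F is the fragment length distribution. This is one reading of the idea, not Cufflinks' actual code: the discretized normal fragment distribution (mean 200, standard deviation 80), the function names, and the chosen transcript lengths are all assumptions for illustration.

```python
import math


def fragment_pmf(mean=200.0, sd=80.0, max_len=1000):
    """Discretized normal over fragment lengths 1..max_len (assumed shape).

    Index 0 of the returned list corresponds to fragment length 1.
    """
    weights = [math.exp(-0.5 * ((i - mean) / sd) ** 2)
               for i in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def effective_length(transcript_len, pmf):
    """Adjusted length: sum of F(i) * (l - i + 1) for fragment lengths i <= l."""
    return sum(p * (transcript_len - i + 1)
               for i, p in enumerate(pmf[:transcript_len], start=1))


def inflation(transcript_len, pmf):
    """Factor by which FPKM grows when the adjusted length replaces the real one."""
    return transcript_len / effective_length(transcript_len, pmf)


if __name__ == "__main__":
    pmf = fragment_pmf()
    for length in (79, 100, 250, 2000):
        print(length,
              round(effective_length(length, pmf), 2),
              round(inflation(length, pmf), 2))
```

Under these assumptions the adjusted length of a transcript much shorter than the mean fragment length collapses to a small fraction of its real length, so the same fragment count yields a much larger FPKM, while a 2000 bp transcript is only mildly affected.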

**magick** · 09-29-2011, 09:23 PM

Thanks for the information and for taking the initiative. I did go through that material but never thought of it as a problem. I hope you get a good answer from the developers.

I am now using RSEM to compute read counts, which I then feed into DESeq for differential expression analysis. This works well and looks better to me. I'm afraid that removing transcripts shorter than 200 bp might throw away useful information.

**Hunny** · 09-30-2011, 12:37 AM

(In reply to cram's post above.)

Hi,

I have been reading the supplemental material of the Cufflinks paper, and I got lost in all the formulas.

Could you tell me why they use the adjusted length? What does this length mean mathematically or biologically?

Thanks,

**Neuromancer** · 05-29-2013, 08:38 AM

Hey cram,

Did you find a solution to this in the end? Is the behavior different in newer versions of Cufflinks?

I'm actually struggling with a related problem involving intra-sample comparisons:

Is it possible to compare transcript abundance within a certain group of transcripts (e.g. Gene_A, Gene_B, Gene_C) and rank them by expression (i.e. Gene_A is more highly expressed than Gene_C)?

I tried counting reads within exons and normalizing to the exon-summed transcript lengths, but there may still be some bias, since some exons overlap...

Any ideas?
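One way to handle the overlapping-exon concern above is to merge overlapping exon intervals before summing their lengths, so that shared bases are counted once. A minimal Python sketch, assuming half-open (start, end) coordinates; the function name and the example coordinates are made up for illustration:

```python
def union_exon_length(exons):
    """Total number of bases covered by a list of half-open (start, end) exons.

    Overlapping or nested exons are merged so shared bases count only once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # Gap before this exon: close the running block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the running block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    # Hypothetical exons from two isoforms of one gene; the first two overlap.
    exons = [(100, 200), (150, 300), (400, 500)]
    naive = sum(end - start for start, end in exons)
    print(naive, union_exon_length(exons))
```

This only fixes the length used for normalization. Reads falling in the shared region still cannot be assigned to one isoform by simple counting.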

**choy** · 05-30-2013, 04:22 AM

Thanks for pointing this phenomenon out.

I am using cufflinks extensively and noticed this behavior somewhere around the Cufflinks version 1.0.0 release. Older versions of Cufflinks did not seem to have this issue.

Currently I circumvent this by removing or ignoring transcripts shorter than 250bp. Plotting the distribution of FPKMs shows this to be a reasonable cutoff value. I agree that the abnormal increase in FPKM may be tied to the fragment length.
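A length cutoff like this can be applied after the fact to an existing table of results. A minimal Python sketch, assuming each transcript is a dict with "length" and "FPKM" keys; the record layout, function name, and example values are hypothetical:

```python
MIN_LENGTH = 250  # cutoff in bp, as used in this post


def drop_short_transcripts(records, min_length=MIN_LENGTH):
    """Keep only transcripts at least min_length bp long."""
    return [r for r in records if r["length"] >= min_length]


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    records = [
        {"id": "tx_short", "length": 79, "FPKM": 447370.0},
        {"id": "tx_mid", "length": 250, "FPKM": 12.5},
        {"id": "tx_long", "length": 1800, "FPKM": 3.1},
    ]
    for r in drop_short_transcripts(records):
        print(r["id"], r["length"], r["FPKM"])
```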

I agree that there is a problem here and hope the developers address it.

Best regards

**sdriscoll** · 05-30-2013, 03:44 PM

this is a common issue - it's in eXpress as well, and in fact in any tool that applies the "effective length correction" to read counts or expression values. apparently there isn't currently a logical way to fix it. additionally, it's only theoretical that this adjustment improves expression estimates. if you're counting hits in a more general way, like with htseq-count, this adjustment is not made. I don't like it because it says there are reads in my data that don't exist! it should be obvious that counts for features whose length is close to the expected fragment length may be unreliable or lower than they *should* be - that's good enough information for me.

if you want, you can disable this adjustment in cufflinks with the '--no-effective-length-correction' option. this fixes it. for example, i've compared read counts back-calculated from the FPKMs cufflinks reports with this option, and they are identical to the counts i get from a normal naive counting method (at the gene locus level).

by the way you can get those "raw" counts back from cufflinks by keeping the "Raw Map Mass" value it reports during its run and then using the following calculation on the FPKM values in isoforms.fpkm_tracking:

COUNTS = FPKM*transcript_length/1000000000 * MASS
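The calculation above can be sketched in Python as follows. This assumes the run used '--no-effective-length-correction' (so the plain transcript length applies) and that the tracking file is tab-delimited with "tracking_id", "length" and "FPKM" columns. The function names, the in-memory example table, and the map mass value are made up for illustration.

```python
import csv
import io
import math


def counts_from_fpkm(fpkm, transcript_length, map_mass):
    """Back-calculate fragment counts: FPKM * length / 1e9 * mass."""
    return fpkm * transcript_length / 1e9 * map_mass


def counts_from_tracking(handle, map_mass):
    """Map tracking_id -> estimated count from a tab-delimited tracking table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["tracking_id"]: counts_from_fpkm(
            float(row["FPKM"]), int(row["length"]), map_mass)
        for row in reader
    }


if __name__ == "__main__":
    # Made-up table and map mass, standing in for isoforms.fpkm_tracking
    # and the "Raw Map Mass" value printed during the run.
    example = "tracking_id\tlength\tFPKM\ntx1\t1000\t2.5\ntx2\t500\t10.0\n"
    counts = counts_from_tracking(io.StringIO(example), map_mass=2e7)
    for name, count in counts.items():
        print(name, round(count, 2))
    # Round trip: 50 fragments on a 1000 bp transcript at mass 2e7 is FPKM 2.5.
    assert math.isclose(counts["tx1"], 50.0)
```

With the effective length correction left on, the same formula would need the adjusted length rather than the real one, so these back-calculated counts would not match naive counts.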

**choy** · 05-31-2013, 07:31 AM

Thank you! This is extremely helpful!

Unfortunately I have run many samples without --no-effective-length-correction enabled, so I may have to deal with the bias problem for now. For future experiments I will definitely use this option.

**rufessor** · 11-19-2014, 10:19 AM

Sorry to bring an old thread back to life, but this has been bothering me a bit and I wondered the following.

Has anyone ever attempted to empirically check the length correction algorithm?

I doubt this would even be a good test, but I note that the ERCC standards have transcripts ranging from 1995 down to 274 bp in length, and further:

ERCC-77 is 275 bp in length and abundance is 3.66
ERCC-51 is 274 bp in length and abundance is 58.59

Has anyone actually run this through cufflinks with and without --no-effective-length-correction to compare how the ERCC standard curve looks in terms of RPKM (I am single-ended) in each case?

**jparsons** · 11-24-2014, 05:24 PM

Originally posted by rufessor:

Has anyone actually run this through cufflinks with and without --no-effective-length-correction to compare how the ERCC standard curve looks in terms of RPKM (I am single-ended) in each case?

I did a quick test of this today and saw no obvious difference (except for a uniformly higher FPKM in "standard") between standard cufflinks and cufflinks run with --no-effective-length-correction, as regards the reported ERCC FPKM. I'd attribute the flat increase to the fact that using ERCCs in the cufflinks pipeline is itself fraught, because spike-ins break the fundamental assumption of the FPKM calculation. (It's not insane to use ERCC FPKM values, but it's not ideal either, particularly when comparing or normalizing samples.)

cufflinks reports extremely high FPKMs for short transcripts