Originally posted by rufessor
Sorry to bring an old thread back to life, but this has been bothering me a bit and I wondered the following.
Has anyone ever attempted to empirically check the length correction algorithm?
I doubt this would even be a good test, but I note that the ERCC standards have transcripts ranging from 1995 to 274 bp in length, and further:
ERCC-77 is 275 bp in length and its abundance is 3.66
ERCC-51 is 274 bp in length and its abundance is 58.59
Has anyone actually run these through cufflinks with and without the --no-effective-length-correction flag to compare how the ERCC standard curve looks in terms of RPKM (I am single-ended) in each situation?
-
Thank you! This is extremely helpful!
Unfortunately I have run many samples without the --no-effective-length-correction option, so I may have to deal with the bias problem for now. For future experiments I will definitely use this option.
-
This is a common issue; it's in eXpress as well, and in fact in any of these tools that use the "effective length correction" for read counts or expression values. Apparently there isn't currently a logical way to fix it. Additionally, it's only theoretical that this adjustment improves expression estimates. If you're counting hits in a more general way, like with htseq-count, this adjustment is not made. I don't like it because it says that there are reads in my data that don't exist! It should be obvious that counts for features whose length is so close to the expected fragment length may be unreliable or lower than they *should* be. That's good enough information for me.
If you want, you can disable this adjustment in cufflinks with the '--no-effective-length-correction' option. This fixes it. For example, I've compared read counts back-calculated from the FPKMs cufflinks reports with this option, and they are identical to the counts I get through a normal naive counting method (at the gene locus level).
By the way, you can get those "raw" counts back from cufflinks by keeping the "Raw Map Mass" value it reports during its run and then applying the following calculation to the FPKM values in isoforms.fpkm_tracking:
COUNTS = FPKM * transcript_length / 1000000000 * MASS
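The back-calculation above can be sketched in Python. This is only a minimal sketch, assuming that the "Raw Map Mass" value is the total fragment count used as the FPKM denominator and that the run used --no-effective-length-correction. The function name and the example numbers are made up for illustration.

```python
def fpkm_to_counts(fpkm, transcript_length, raw_map_mass):
    """Back-calculate a fragment count from an FPKM value.

    FPKM = count * 1e9 / (transcript_length * raw_map_mass),
    so count = FPKM * transcript_length * raw_map_mass / 1e9.
    """
    return fpkm * transcript_length * raw_map_mass / 1e9


# Hypothetical values, not taken from a real cufflinks run.
print(fpkm_to_counts(fpkm=25.0, transcript_length=2000, raw_map_mass=20_000_000))
```

If the effective length correction was left on, the transcript length in this formula would no longer be the length cufflinks actually divided by, so the recovered counts would likely be off for short transcripts.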
-
Thanks for pointing this phenomenon out.
I am using cufflinks extensively and noticed this behavior somewhere around the Cufflinks version 1.0.0 release. Older versions of Cufflinks did not seem to have this issue.
Currently I work around this by removing or ignoring transcripts shorter than 250 bp. Plotting the distribution of FPKMs shows this to be a reasonable cutoff value. I agree that the abnormal increase in FPKM may be tied to the fragment length.
I agree that there is a problem here and hope the developers address it.
Best regards
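The length cutoff described in this post could be applied along these lines. This is a rough sketch, assuming a tab-separated table with "tracking_id", "length" and "FPKM" columns (the layout I would expect from isoforms.fpkm_tracking). The example rows and the function name are invented.

```python
import csv
import io

# Assumed layout: tab-separated with "tracking_id", "length" and "FPKM" columns.
# The rows below are invented for illustration only.
EXAMPLE = (
    "tracking_id\tlength\tFPKM\n"
    "tx_long\t1500\t20.0\n"
    "tx_mid\t400\t30.0\n"
    "tx_short\t90\t130000.0\n"
)


def filter_short_transcripts(handle, min_length=250):
    """Yield rows whose transcript length is at least min_length bp."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["length"]) >= min_length:
            yield row


kept = list(filter_short_transcripts(io.StringIO(EXAMPLE)))
print([row["tracking_id"] for row in kept])
```

To run it on a real file, the `io.StringIO(EXAMPLE)` handle would be replaced by an opened file object.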
-
Hey cram,
did you find a solution to this in the end? Is it different in newer versions of Cufflinks?
I'm actually struggling with a related problem concerning intra-sample comparisons:
Is it possible to compare transcript abundance within a certain group of transcripts (e.g. Gene_A, Gene_B, Gene_C) and actually rank them by expression (i.e. Gene_A is more highly expressed than Gene_C)?
I tried counting reads within exons and normalizing to the exon-summed transcript lengths, but there might still be some bias, since some exons also overlap...
Any ideas?
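One way to handle the overlapping exons mentioned above is to merge them before summing, so that no base is counted twice in the gene length. The following is a sketch only, assuming 1-based inclusive exon coordinates on a single chromosome and per-gene read counts obtained elsewhere. The gene names follow the post; the coordinates, counts and function names are hypothetical.

```python
def union_length(exons):
    """Total bases covered by a list of (start, end) exons.

    Coordinates are assumed 1-based and inclusive, on one chromosome.
    Overlapping or adjacent exons are merged so no base is counted twice.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(exons):
        if current_end is None or start > current_end + 1:
            # Gap before this exon: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered


def rank_genes(counts, exons_by_gene):
    """Rank genes by reads per kilobase of non-redundant exon sequence."""
    density = {
        gene: counts[gene] / (union_length(exons_by_gene[gene]) / 1000)
        for gene in counts
    }
    return sorted(density, key=density.get, reverse=True)


# Hypothetical genes, exon coordinates and read counts.
exons_by_gene = {
    "Gene_A": [(100, 299), (250, 499)],      # overlapping exons, 400 bp merged
    "Gene_B": [(1000, 1999)],                # 1000 bp
    "Gene_C": [(5000, 5199), (5300, 5399)],  # 300 bp
}
counts = {"Gene_A": 400, "Gene_B": 500, "Gene_C": 90}
print(rank_genes(counts, exons_by_gene))
```

This removes the double counting of shared exon bases, but it does not address the short-transcript bias discussed in this thread, and it treats each gene as a single unit rather than resolving individual isoforms.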
-
Originally posted by cram
I've been reading through the supplemental methods of the Cufflinks paper and I have a theory about why this is happening. Rather than use the actual transcript length in FPKM calculations, Cufflinks uses what they call an adjusted length. This is intended to account for the fact that the expected fragment length will affect the probability of selecting a fragment from a transcript of a given length.
If I'm following the math correctly then this formula does not really handle cases where the transcript length is significantly shorter than the expected fragment length. It will produce an extremely low value for the adjusted transcript length, which will then cause the high FPKMs.
I've sent an email to the cufflinks developers to ask them if this sounds reasonable. In the meantime I think I'll just exclude transcripts shorter than 200bp or at least ignore the FPKM values for intra-sample expression comparisons.
I have been reading the supplemental material of the Cufflinks paper,
and I have been struggling with the many formulas in it.
Could you tell me why they use the adjusted length? What does this length mean mathematically or biologically?
Thanks,
-
Thanks for the information and for taking action. I had gone through that section but never thought of it as a problem. I hope you get a good answer from the developers.
I am now using RSEM to calculate read counts, which I then use as input to DESeq for differential expression analysis. This approach performs well and looks better to me. I'm afraid that removing transcripts shorter than 200 bp might remove some useful information from the analysis.
-
I've been reading through the supplemental methods of the Cufflinks paper and I have a theory about why this is happening. Rather than use the actual transcript length in FPKM calculations, Cufflinks uses what they call an adjusted length. This is intended to account for the fact that the expected fragment length will affect the probability of selecting a fragment from a transcript of a given length.
If I'm following the math correctly then this formula does not really handle cases where the transcript length is significantly shorter than the expected fragment length. It will produce an extremely low value for the adjusted transcript length, which will then cause the high FPKMs.
I've sent an email to the cufflinks developers to ask them if this sounds reasonable. In the meantime I think I'll just exclude transcripts shorter than 200bp or at least ignore the FPKM values for intra-sample expression comparisons.
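To illustrate the theory in this post, here is a rough numerical sketch. It assumes, as my reading of the supplement rather than a confirmed detail, that the adjusted length is the sum over fragment lengths i of F(i) * (l - i + 1), where F is the fragment length distribution and l is the transcript length. The normal distribution with mean 200 and standard deviation 80 is an assumed stand-in for the real fragment distribution, and the function names are made up.

```python
import math


def fragment_pmf(mean=200.0, sd=80.0, max_len=1000):
    """Discretised normal fragment-length distribution on 1..max_len."""
    weights = [math.exp(-0.5 * ((i - mean) / sd) ** 2) for i in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # pmf[i - 1] is P(fragment length = i)


def effective_length(transcript_length, pmf):
    """Sum over fragment lengths i of P(i) * number of start positions for i."""
    upper = min(transcript_length, len(pmf))
    return sum(pmf[i - 1] * (transcript_length - i + 1) for i in range(1, upper + 1))


pmf = fragment_pmf()
for length in (2000, 500, 150, 80):
    eff = effective_length(length, pmf)
    print(f"length={length:5d}  effective={eff:8.1f}  inflation={length / eff:6.1f}x")
```

Under these assumptions, only fragments no longer than the transcript contribute to the sum, so a transcript much shorter than the typical fragment gets a very small adjusted length, and dividing by it inflates the FPKM accordingly.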
-
It's good that I'm not alone.
Code:
79   447370
86   148939
100  50999.3
101  142356
103  101460
103  216072
-
cufflinks reports extremely high FPKMs for short transcripts
I'm seeing some odd FPKM values reported by cufflinks and I'm wondering if anyone else has seen this or can suggest an explanation. Essentially, the shorter a transcript is the higher its FPKM. The shortest transcripts reach ridiculous levels. In a typical experiment, I see:
Code:
Tscript Length  avg. FPKM
--------------  ---------
>1000                  20
200 - 1000             30
100 - 200           2,500
< 100             130,000
I see this with cufflinks-1.1.0 and 1.0.3, with and without upper quartile normalization.