Unconfigured Ad

**apratap** · 10-12-2009, 08:25 PM

Hi Cole

Thanks for the update. I took a quick glance and just wondering if there are some methods available to visualize the transcript abundances reported in the GTF format. Also any way to simultaneously compare control and sample.

-Abhi

**Cole Trapnell** · 10-12-2009, 09:33 PM

Hi Abhi,

We recommend using cuffcompare to compare control and sample, though you may need to tailor significance tests, etc. to your analysis. You might want to check out this paper for more details on control and sample testing with RPKM.

As for visualization, we don't have anything out for that yet, as supporting a UI on various platforms, etc. is more engineering resources than we can really afford right now. However, we've tried to make the various output files as friendly as possible to environments like R, so that you can use your plotting environment of choice.

If anyone were to contribute scripts or packages that depended on standard plotting tools, we would happily include them in the Cufflinks distribution (properly credited, of course).

**krobison** · 10-13-2009, 07:18 AM

A tip of the cap to continuing the classy nomenclature theme!

**vkrishna** · 10-25-2009, 04:07 PM

Setting up GBrowse to view transcript abundance

Hi apratap,
I've been working on visualizing the transcript abundance data in GBrowse and have had some luck with it. Here is how I went about it,
1. Loading the Cufflinks identified pseudo-transcripts is very easy (GBrowse seems to accept the GTF as input and is able to display the transcripts)
2. To display the abundance data, you need to perform the following transformation on the GTF file and create a GFF file for use with GBrowse. As an example, here are a few lines from the GTF file

Code:

124 Cufflinks   transcript  1551    1663    1000    .   .   gene_id "CUFF.1"; transcript_id "CUFF.1.0"; RPKM "268.2781599177"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.    000000"; cov "0.637168";
124 Cufflinks   exon    1551    1663    1000    .   .   gene_id "CUFF.1"; transcript_id "CUFF.1.0"; exon_number "1"; RPKM "268.2781599177"; frac "1.000000"; conf_lo "0.000000";   conf_hi "0.000000"; cov "0.637168";
124 Cufflinks   transcript  8604    9455    1000    .   .   gene_id "CUFF.4"; transcript_id "CUFF.4.0"; RPKM "1405.4689751085"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.   000000"; cov "3.338028";
124 Cufflinks   exon    8604    9455    1000    .   .   gene_id "CUFF.4"; transcript_id "CUFF.4.0"; exon_number "1"; RPKM "1405.4689751085"; frac "1.000000"; conf_lo "0.000000";  conf_hi "0.000000"; cov "3.338028";
124 Cufflinks   transcript  9540    9757    1000    .   .   gene_id "CUFF.7"; transcript_id "CUFF.7.0"; RPKM "903.9004975207"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.    000000"; cov "2.146789";
124 Cufflinks   exon    9540    9757    1000    .   .   gene_id "CUFF.7"; transcript_id "CUFF.7.0"; exon_number "1"; RPKM "903.9004975207"; frac "1.000000"; conf_lo "0.000000";   conf_hi "0.000000"; cov "2.146789";
124 Cufflinks   transcript  9825    12624   1000    .   .   gene_id "CUFF.10"; transcript_id "CUFF.10.0"; RPKM "1082.6940025248"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "2.571429";
124 Cufflinks   exon    9825    12624   1000    .   .   gene_id "CUFF.10"; transcript_id "CUFF.10.0"; exon_number "1"; RPKM "1082.6940025248"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "2.571429";
124 Cufflinks   transcript  12678   13027   1000    .   .   gene_id "CUFF.13"; transcript_id "CUFF.13.0"; RPKM "1775.6181641407"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "4.217143";
124 Cufflinks   exon    12678   13027   1000    .   .   gene_id "CUFF.13"; transcript_id "CUFF.13.0"; exon_number "1"; RPKM "1775.6181641407"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "4.217143";
124 Cufflinks   transcript  14562   14603   1000    .   .   gene_id "CUFF.16"; transcript_id "CUFF.16.0"; RPKM "721.7960016832"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.  000000"; cov "1.714286";
124 Cufflinks   exon    14562   14603   1000    .   .   gene_id "CUFF.16"; transcript_id "CUFF.16.0"; exon_number "1"; RPKM "721.7960016832"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.714286";
124 Cufflinks   transcript  17793   17835   1000    .   .   gene_id "CUFF.19"; transcript_id "CUFF.19.0"; RPKM "705.0100481557"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.  000000"; cov "1.674419";
124 Cufflinks   exon    17793   17835   1000    .   .   gene_id "CUFF.19"; transcript_id "CUFF.19.0"; exon_number "1"; RPKM "705.0100481557"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.674419";
124 Cufflinks   transcript  27148   27726   1000    .   .   gene_id "CUFF.22"; transcript_id "CUFF.22.0"; RPKM "2853.5251258254"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "6.777202";
124 Cufflinks   exon    27148   27726   1000    .   .   gene_id "CUFF.22"; transcript_id "CUFF.22.0"; exon_number "1"; RPKM "2853.5251258254"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "6.777202";
124 Cufflinks   transcript  27829   29109   1000    .   .   gene_id "CUFF.25"; transcript_id "CUFF.25.0"; RPKM "3585.3145657380"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "8.515222";
124 Cufflinks   exon    27829   29109   1000    .   .   gene_id "CUFF.25"; transcript_id "CUFF.25.0"; exon_number "1"; RPKM "3585.3145657380"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "8.515222";
124 Cufflinks   transcript  29171   34842   1000    .   .   gene_id "CUFF.28"; transcript_id "CUFF.28.0"; RPKM "2442.5515614083"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "5.801128";
124 Cufflinks   exon    29171   34842   1000    .   .   gene_id "CUFF.28"; transcript_id "CUFF.28.0"; exon_number "1"; RPKM "2442.5515614083"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "5.801128";
124 Cufflinks   transcript  34945   36257   1000    .   .   gene_id "CUFF.31"; transcript_id "CUFF.31.0"; RPKM "3751.9099097396"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "8.910891";
124 Cufflinks   exon    34945   36257   1000    .   .   gene_id "CUFF.31"; transcript_id "CUFF.31.0"; exon_number "1"; RPKM "3751.9099097396"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "8.910891";
124 Cufflinks   transcript  37803   38013   1000    .   .   gene_id "CUFF.34"; transcript_id "CUFF.34.0"; RPKM "1867.7754356353"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0. 000000"; cov "4.436019";
124 Cufflinks   exon    37803   38013   1000    .   .   gene_id "CUFF.34"; transcript_id "CUFF.34.0"; exon_number "1"; RPKM "1867.7754356353"; frac "1.000000"; conf_lo "0.        000000"; conf_hi "0.000000"; cov "4.436019";

The above segment is converted into the following

Code:

124 Cufflinks   abundance   1551    1663    268.2781599177  .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   8604    9455    1405.4689751085 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   9540    9757    903.9004975207  .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   9825    12624   1082.6940025248 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   12678   13027   1775.6181641407 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   14562   14603   721.7960016832  .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   17793   17835   705.0100481557  .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   27148   27726   2853.5251258254 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   27829   29109   3585.3145657380 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   29171   34842   2442.5515614083 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   34945   36257   3751.9099097396 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   37803   38013   1867.7754356353 .   .   ID=Cufflinks_transcript_abundance
124 Cufflinks   abundance   38163   38302   6604.4334154015 .   .   ID=Cufflinks_transcript_abundance

And use the following GBrowse configuration for the abundance track:

Code:

[abundance]
feature = abundance
category = Cufflinks_abundance
glyph = xyplot
graph_type = boxes
fgcolor = brown
bgcolor = orange
scale = left
min_score = 0
height = 50
group_on = display_name

and this is what you see!

Hope this helps!

Best,
~Vivek

**apratap** · 10-26-2009, 07:40 AM

Hey vkrishna

Thanks a lot for sharing your method. I will try to work on it sometime later this week.

Best,
-Abhi

**marvin.j** · 10-28-2009, 05:54 AM

Bug: Cufflinks merges proximal genes, although there is proper annotation

In our group we have just noticed that proximal genes get merged by cufflinks.
While this is probably unavoidable for a run without any annotation it is clearly wrong in case the correct annotation is provided and ignored/overridden.

Here's one example from the mitochondrial chromosome (kindly provided by arthur.yxt):

cufflinks genes.expr file says:

Code:

ENSG00000198899      713201 chrM   8527   9988   5618.54

But a look at the GTF annotation file reveals there are actually two genes being merged:

Code:

chrM    .       exon    8528    9208    .       +       .       gene_id "ENSG00000198899";transcript_id "ENST00000361899"
chrM    .       exon    9208    9988    .       +       .       gene_id "ENSG00000198938";transcript_id "ENST00000362079"

While extending an annotated gene by proximal stuff that is otherwise "intergenic" is probably a good idea (TSS or PolyA-site variation, cryptic exons, etc.) in this case it is clearly wrong as the merged part is not intergenic but a complete neighboring gene.

If needed, a small dataset to reproduce this bug can be provided.

Best!
-Marvin

**Cole Trapnell** · 10-28-2009, 08:50 AM

Originally posted by marvin.j View Post

In our group we have just noticed that proximal genes get merged by cufflinks.
While this is probably unavoidable for a run without any annotation it is clearly wrong in case the correct annotation is provided and ignored/overridden.

Here's one example from the mitochondrial chromosome (kindly provided by arthur.yxt):

cufflinks genes.expr file says:

Code:

ENSG00000198899      713201 chrM   8527   9988   5618.54

But a look at the GTF annotation file reveals there are actually two genes being merged:

Code:

chrM    .       exon    8528    9208    .       +       .       gene_id "ENSG00000198899";transcript_id "ENST00000361899"
chrM    .       exon    9208    9988    .       +       .       gene_id "ENSG00000198938";transcript_id "ENST00000362079"

While extending an annotated gene by proximal stuff that is otherwise "intergenic" is probably a good idea (TSS or PolyA-site variation, cryptic exons, etc.) in this case it is clearly wrong as the merged part is not intergenic but a complete neighboring gene.

If needed, a small dataset to reproduce this bug can be provided.

Best!
-Marvin

There are really two issues here:

1) The reference-based quantitation is only reporting one gene in this example. This is probably a bug, and I would greatly appreciate a small test set via email

2) The assembly algorithm merges overlapping genes. In strandless prototocols, merging overlapping, unspliced genes or overlapping genes on the same strand is tough to avoid as you suggest (although tracking polyA reads may help here). However, in my experience these are extremely rare in most organisms that have splicing. Yeast is a notable exception. The mitochondrial genome is one place I *would* expect it, given the gene density, lack of splicing, and generally very high read count (with associated polymerase run-on, etc.).

The example you gave, if I understand correctly, demonstrates 1). Are you also seeing 2)? And If so, can you give us a sense of how often? Fighting this problem via polyA tracking and local assembly is high on the list of planned features, but if users are running into this a lot, I could try to get to it sooner rather than later.

**gard** · 10-29-2009, 07:29 AM

Removed, found answer somewhere else in the forum.

**sdriscoll** · 12-09-2009, 04:37 PM

This is probably a simple question but I just don't understand the computation of RPKM well enough to answer it for myself.

I've run cufflinks on tophat output (both run without a reference annotation) and then finally cuffcompare with a reference annotation to map the cufflinks transcripts to the UCSC transcripts. My question is this: in the stdout.tracking file that's produced from cuffcompare I'll find several cufflinks determined transcripts that have been mapped to a single UCSC transcript. Each of those cufflinks transcripts has an RPKM - how do I translate those RPKMs to the single UCSC transcript that cuffcompare has mapped them to?

example lines from my tracking file:

Code:

uc008bzg.1|uc008bzg.1	e	q1:CUFF.209682|CUFF.209682.0|100|10.337529|0.000000|0.000000|uniq
uc008bzg.1|uc008bzg.1	j	q1:CUFF.209785|CUFF.209785.0|100|259.417274|0.000000|0.000000|uniq
uc008bzg.1|uc008bzg.1	c	q1:CUFF.209691|CUFF.209691.0|100|34.891979|0.000000|0.000000|uniq
uc008bzg.1|uc008bzg.1	c	q1:CUFF.209695|CUFF.209695.0|100|135.598390|0.000000|0.000000|uniq
uc008bzg.1|uc008bzg.1	c	q1:CUFF.209701|CUFF.209701.0|100|133.368072|0.000000|0.000000|uniq

I'd like to get a single RPKM for uc008bzg.1 from those five lines. Thanks in advance!

**sdriscoll** · 12-10-2009, 01:57 PM

OK, I've thought about this and it's likely that the way to pull off what I want is to consider it a problem of merging multiple things made up of the same materials (in this case bp of exon) all with different densities (RPKMs). That breaks it down to calculating a weighted mean. Of course I'll need total bp worth of exon from each of those transcripts but with that info it should be an easy calculation.

**Cole Trapnell** · 12-11-2009, 12:01 AM

I've had decent success doing that, by performing a transfrag length-weighted average of the RPKM.

**apratap** · 12-11-2009, 12:44 AM

Hi Cole and Driscoll

Good that you have asked this before I actually realized. So I am a bit confused now. Why does cufflinks for the same sample, maps diff cufflinks predicted transcripts to same UCSC transcript. I think I am missing some biology here but may be this question is not trivial.

It will me understand better if you guys could explain a bit more.

-Abhi

**Cole Trapnell** · 12-11-2009, 09:41 AM

After normalizing for transcript length, the fraction of the total reads in your experiment that come from any one transcript is roughly proportional to the abundance of that transcript.

What we're seeing here is a transcript that is rare enough in the sample that it doesn't get many reads, and thus has gaps in sequencing coverage. When this happens, Cufflinks can't assemble the full thing. This is one of the central differences between Cufflinks and say, an ab initio gene finder like Augustus. So what happens is that Cufflinks assembles the reads that came from this rare transcript as best it can, and the result is a handful of "transfrags", each of which get their own RPKM estimate. What sdriscoll is trying to do here is estimate the RPKM of the whole transcript from the individual fragment RPKMs.

**ndaniel** · 12-15-2009, 06:34 AM

Hi Cole,

does Cufflinks take as input the 'raw' SAM file produced by Bowtie (>version 0.11.1)?

Daniel

Topics	Statistics	Last Post
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, Today, 08:59 AM	0 responses 8 views 0 reactions	Last Post by SEQadmin2 Today, 08:59 AM
Long-Read RNA Sequencing Uncovers a Hidden Layer of Immune Cell Regulation by SEQadmin2 Started by SEQadmin2, 06-02-2026, 12:03 PM	0 responses 21 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 12:03 PM
DNA Methylation Study Reveals How Epigenetic Changes Pass Between Generations by SEQadmin2 Started by SEQadmin2, 06-02-2026, 11:40 AM	0 responses 15 views 0 reactions	Last Post by SEQadmin2 06-02-2026, 11:40 AM
MetaBeeAI Helps Scientists Process Research Literature Faster by SEQadmin2 Started by SEQadmin2, 05-28-2026, 11:40 AM	0 responses 29 views 0 reactions	Last Post by SEQadmin2 05-28-2026, 11:40 AM

Unconfigured Ad

Cufflinks: transcript assembly and abundance estimation for RNA-seq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News