**severin** · 04-10-2012, 04:58 AM

Cufflink IDs

They are Cufflinks IDs. If you run Cufflinks with a GTF or GFF file you will get gene names instead of XLOC IDs. If you have a genome without an annotation file, you could extract those sequences and BLAST them for an initial identification. Ideally, though, you would run your genome through Maker or other gene model prediction software before running Cufflinks.

**ojham** · 06-01-2012, 11:07 AM

If you run CuffLinks with a GTF or GFF file you will get gene names instead of XLocs.

Can you please explain this in detailed steps? Thank you in advance.

**vkartha** · 07-10-2012, 05:28 PM

I ran Cufflinks with the -G flag (i.e. providing an annotation file, a GTF from UCSC, and telling it not to perform novel transcript discovery) and I still got this XLOC ID format. I am having trouble converting them.

**JQL** · 10-26-2012, 04:09 PM

I saw this thread and would like to revive it.

I am having similar issues. The GTF file I used was from Ensembl, where gene IDs are Ensembl IDs. The cuffdiff output file replaced the Ensembl IDs with XLOC IDs, although it also output gene names (e.g. BCL2). The Ensembl IDs were no longer there.

Is there any way to convert XLOC IDs back to Ensembl IDs, or simply keep the Ensembl IDs from my GTF file? How do you guys go about this? I am trying to understand what the authors' intention was in replacing useful IDs with XLOC IDs.

Interestingly enough, if I don't run new gene discovery (i.e. I skip the cuffmerge step), I get to keep the Ensembl IDs.

Thoughts?
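
One possible way to recover reference IDs is to build a lookup table from the attribute column of the cuffmerge GTF. This is a minimal sketch, assuming the merged GTF carries `gene_id` (the XLOC), `gene_name` and `nearest_ref` attributes, and that `nearest_ref` holds the reference transcript ID rather than the gene ID. The attribute strings below are invented placeholders, and `get_attr` is a hypothetical helper, not part of Cufflinks or cummeRbund.

```r
# Toy attribute strings in the style of column 9 of a cuffmerge GTF.
# All IDs are made-up placeholders, not real annotations.
attrs <- c(
  'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; gene_name "GENE_A"; oId "REF_TX_1"; nearest_ref "REF_TX_1"; class_code "=";',
  'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; gene_name "GENE_A"; oId "REF_TX_2"; nearest_ref "REF_TX_2"; class_code "=";',
  'gene_id "XLOC_000002"; transcript_id "TCONS_00000003"; oId "CUFF.5.1"; class_code "u";'
)

# Hypothetical helper: return the quoted value of one attribute key,
# or NA when the key is absent from a record.
get_attr <- function(x, key) {
  pattern <- paste0('(^|; *)', key, ' "([^"]*)"')
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits,
         function(hit) if (length(hit) == 3) hit[3] else NA_character_,
         character(1))
}

# One row per distinct (XLOC, gene name, reference transcript) combination.
xloc_map <- unique(data.frame(
  xloc        = get_attr(attrs, "gene_id"),
  gene_name   = get_attr(attrs, "gene_name"),
  nearest_ref = get_attr(attrs, "nearest_ref"),
  stringsAsFactors = FALSE
))
```

For a real file, the attribute strings could be taken from the ninth column of the GTF, for example after reading it with `read.delim(..., header = FALSE, quote = "")`. Novel loci (the third record here) have no reference ID to map back to, so they come out as NA.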

**emp** · 03-31-2014, 03:32 AM

I faced a similar problem, but then used -g with the GTF file and got the IDs in my output file from cuffdiff.

**blakeoft** · 04-03-2014, 07:54 AM

I used this solution (http://seqanswers.com/forums/showthread.php?t=18357):

Thomas Doktor said:

library(cummeRbund)

# Load the cuffdiff output from the current working directory
cuff <- readCufflinks()

# Retrieve significant gene IDs (XLOC) with a pre-specified alpha
diffGeneIDs <- getSig(cuff,level="genes",alpha=0.05)

#Use returned identifiers to create a CuffGeneSet object with all relevant info for given genes
diffGenes<-getGenes(cuff,diffGeneIDs)

#gene_short_name values (and corresponding XLOC_* values) can be retrieved from the CuffGeneSet by using:
names<-featureNames(diffGenes)
row.names(names)=names$tracking_id
diffGenesNames<-as.matrix(names)
diffGenesNames<-diffGenesNames[,-1]

# get the data for the significant genes
diffGenesData<-diffData(diffGenes)
row.names(diffGenesData)=diffGenesData$gene_id
diffGenesData<-diffGenesData[,-1]

# merge the two matrices by row names
diffGenesOutput<-merge(diffGenesNames,diffGenesData,by="row.names")

diffGenesOutput will then be a table of genes with the XLOC ID as well as the gene name (like BATF3).
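
To see what the final `merge(..., by = "row.names")` step does without a cuffdiff run at hand, here is a small base R sketch on invented data. The two toy data frames only stand in for the name table and the expression table above; their column names and values are assumptions for illustration.

```r
# Toy stand-ins: gene names and expression data, both keyed by XLOC row names.
# IDs and values are invented.
toyNames <- data.frame(
  gene_short_name = c("GENE_A", "GENE_B"),
  row.names = c("XLOC_000001", "XLOC_000002"),
  stringsAsFactors = FALSE
)
toyData <- data.frame(
  value_1 = c(4, 10),
  value_2 = c(1, 20),
  row.names = c("XLOC_000002", "XLOC_000001")  # deliberately in a different order
)

# merge() matches rows by their row names and puts the key in a
# column called "Row.names", so row order in the inputs does not matter.
toyOutput <- merge(toyNames, toyData, by = "row.names")
```

The XLOC IDs end up in the `Row.names` column next to `gene_short_name`, which is the shape of table described above.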

**Gonza** · 10-01-2014, 06:59 AM

Hi All,

This works great, so many thanks. One quick question: I am having a hard time inserting a column between "value_2" and "log2_fold_change". I can make the new column, but it goes to the end of the data frame, so the new column (Ratio) is placed after the 'significant' column. For example:

myGenesOutput$Ratio <- myGenesOutput$TRT_fpkm/myGenesOutput$CTR_fpkm

Any thoughts? Thanks
Cheers
G

**blakeoft** · 10-01-2014, 07:30 AM

Gonza,

Just rearrange the columns. For example, if your data frame called df has three columns, and you want the third column to come before the second column, do

Code:

df <- df[, c(1, 3, 2)]

If you're still having trouble, tell me what

Code:

names(myGenesOutput)

gives you along with the desired order of the names, and I'll be able to help you more explicitly.
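
If counting column positions by hand gets tedious, the same rearrangement can be done by name. This is a minimal sketch on an invented data frame; the column names follow the ones mentioned in the question, and the values are made up.

```r
# Toy stand-in for the merged cuffdiff table; values are invented.
myGenesOutput <- data.frame(
  gene_id = c("XLOC_000001", "XLOC_000002"),
  value_1 = c(10, 4),
  value_2 = c(20, 1),
  log2_fold_change = c(1, -2),
  significant = c("yes", "yes"),
  stringsAsFactors = FALSE
)

# A new column is always appended at the end of the data frame.
myGenesOutput$Ratio <- myGenesOutput$value_2 / myGenesOutput$value_1

# Build the desired column order by name: slot "Ratio" in right after "value_2".
cols <- setdiff(names(myGenesOutput), "Ratio")
new_order <- append(cols, "Ratio", after = match("value_2", cols))
myGenesOutput <- myGenesOutput[, new_order]
```

Indexing by name keeps working if columns are later added or removed, whereas `c(1, 3, 2)` style indices have to be recounted.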

**Gonza** · 10-01-2014, 07:42 AM

Thanks so much that rearrange worked fantastic!!!!!!!
G

**Gonza** · 10-06-2014, 06:32 AM

Hello again,

I have another R question and would appreciate some advice.
I am plotting the FPKM expression (log data) of a certain gene using the script below, and I cannot figure out how to make the y-axis show up as "10 to the 1", "10 to the 1.5", "10 to the 2", etc.
Instead, the graph shows FPKM+1 values as 1, 10 and 100.

Any ideas?

Script:
myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100),logMode=T)
myGeneBHLH100_isoform_logModeT + theme_bw()

**blakeoft** · 10-06-2014, 07:12 AM

Gonza,

This appears to be the way to do it with ggplot2. I've tried it with the sample cummeRbund data, and the results are a little goofy. The y axis ticks are at 10^(2.6), 10^(2.8), etc. Maybe it would look better if your data had values that were spread over more powers of 10, or perhaps this is what you're looking for. Try

Code:

library(scales)
myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
myGeneBHLH100_isoform_logModeT +
   theme_bw() +
   scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                     labels = trans_format("log10", math_format(10^.x)))

Here's a source: the R Cookbook. See the section titled "Axis transformations: log, sqrt, etc." That page has an example with axis ticks that are integer powers of 10.

Edit: Oh. It looks like you're ok with rational powers of 10.

**Gonza** · 10-06-2014, 07:54 AM

Hey blakeoft, that worked beautifully, thanks very much once again! If you do not mind, one last question please.

When I type the commands below I get 2 different plots (one for each isoform). Is there a way to plot those isoforms in the same plot? Somehow they do it in the cummeRbund protocol (Fig. 5a - Nature Protocols 7, 562–578 (2012) doi:10.1038/nprot.2012.016)

Full script :

myGeneId<-"XLOC_010858"
myGeneBHLH100<-getGene(cuff_data,myGeneId)
myGeneBHLH100

XLOC_010858 <-expressionPlot(myGeneBHLH100,logMode=T)
XLOC_010858 + theme_bw()

myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
myGeneBHLH100_isoform_logModeT + theme_bw() + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))

**blakeoft** · 10-06-2014, 08:59 AM

Gonza,

It looks like expressionPlot() has been updated at some point so that the isoforms are now plotted side by side. Have you looked at the manual? It has the plots side by side in its example. It also has the FPKM values as integers in log mode, instead of the "10^x" format. I could be wrong because the paper and the manual are both dated 2012.

I tried to use ggplot2 to plot this for you. Anyways, this is the best that I could do.

Code:

iso_plot <- ggplot(isoforms(myGeneBHLH100)@fpkm,
                   aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
iso_plot +
   geom_line() + theme_bw() +
   scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                 labels = trans_format("log10", math_format(10^.x))) +
   geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi)) # + geom_point(color = "black", shape = 19)

Some aesthetics aren't the same as the normal plots that cummeRbund makes, for example the colors of the lines are different. You can mess around with those colors, the line thickness, etc., but this looks pretty close to what they have in the paper. If you want black data points like in the manual, uncomment the geom_point part on the last line.

Edit: I think that some people frown on multiple line plots like this because they can get crowded. One way to mitigate this is to do what is called dodging. Here's how you'd do it for this plot:

Code:

iso <- isoforms(myGeneBHLH100)
pd <- position_dodge(0.3)  # horizontal offset so overlapping lines and error bars separate
iso_plot <- ggplot(iso@fpkm,
                   aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
iso_plot +
   geom_line(position = pd) + theme_bw() + 
   scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                 labels = trans_format("log10", math_format(10^.x))) +
   geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi), position = pd) # + geom_point(color = "black", shape = 19, position = pd)

**Gonza** · 10-06-2014, 09:22 AM

Hi blakeoft, that worked well. I am so grateful for your help!
But you are totally right: after playing around with it, the graph seems pretty crowded and does not look as good as I thought.

Again, many many many thanks for your help and time (and I may have more questions as I go along...).

Best
G

XLOC gene id
