  • JMo
    Junior Member
    • Apr 2012
    • 2

    XLOC gene id

    Does anyone know what the "XLOC" gene IDs are, and how to convert them to actual gene names or some other usable identifier?

    The first few columns of my Cuffdiff output look like this:

    test_id gene_id gene locus
    XLOC_000001 XLOC_000001 - chr1:162458-171994
    XLOC_000002 XLOC_000002 - chr1:860763-880142
  • severin
    Genome Informatics Facility
    • Sep 2009
    • 105

    #2
    Cufflinks IDs

    They are Cufflinks IDs. If you run Cufflinks with a GTF or GFF file, you will get gene names instead of XLOCs. If you have a genome without an annotation file, you could extract those sequences and BLAST them for an initial identification, though ideally you would run your genome through MAKER or other gene model prediction software before running Cufflinks.


    • ojham
      Member
      • May 2012
      • 16

      #3
      If you run CuffLinks with a GTF or GFF file you will get gene names instead of XLocs.


      Can you please explain this in detailed steps? Thank you in advance.


      • vkartha
        Member
        • Feb 2012
        • 28

        #4
        I ran Cufflinks with the -G flag (i.e. providing an annotation file, a GTF from UCSC, and telling it not to perform novel transcript discovery) and I still got the XLOC ID format. I am having trouble converting them.


        • JQL
          Member
          • Apr 2011
          • 83

          #5
          I saw this thread and thought I would bring it back to life.

          I am having similar issues. The GTF file I used was from Ensembl, where gene IDs are Ensembl IDs. The cuffdiff output file replaced the Ensembl IDs with XLOC_ IDs, although it also output gene names (e.g. BCL2). The Ensembl IDs were no longer there.

          Is there any way to convert XLOC IDs back to Ensembl IDs, or simply keep the Ensembl IDs from my GTF file? How do you guys go about this? I am trying to understand why the authors chose to replace useful IDs with XLOC IDs.

          Interestingly enough, if I don't run new gene discovery (i.e. I skip the cuffmerge step), I get to keep the Ensembl IDs.

          thoughts?
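One possible way to get from XLOC IDs back to Ensembl gene IDs is to go through the merged GTF. This is only a sketch: it assumes the merged GTF carries a `nearest_ref` attribute holding the reference transcript ID, as cuffmerge output typically does, and that the reference GTF has both `gene_id` and `transcript_id` on its lines. The GTF lines, the IDs, and the helper `get_attr` below are all made up for illustration; with real files you would read the lines with `readLines()` instead.

```r
# Toy lines in the style of a cuffmerge merged.gtf (all values are made up).
merged_lines <- c(
  'chr18\tCufflinks\texon\t100\t200\t.\t-\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "BCL2"; oId "ENST00000000001"; nearest_ref "ENST00000000001"; class_code "=";',
  'chr1\tCufflinks\texon\t500\t900\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.2.1"; class_code "u";'
)
# Toy line in the style of the reference (Ensembl) GTF.
ref_lines <- c(
  'chr18\tensembl\texon\t100\t200\t.\t-\t.\tgene_id "ENSG00000000001"; transcript_id "ENST00000000001"; gene_name "BCL2";'
)

# Hypothetical helper: pull one attribute out of column 9 of each GTF line (NA when absent).
get_attr <- function(lines, key) {
  attrs <- vapply(strsplit(lines, "\t", fixed = TRUE), `[`, character(1), 9)
  pattern <- paste0('(^|; )', key, ' "([^"]*)"')
  m <- regmatches(attrs, regexec(pattern, attrs))
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_, character(1))
}

# XLOC ID -> reference transcript ID, from the merged GTF.
xloc_to_ref <- unique(data.frame(
  xloc = get_attr(merged_lines, "gene_id"),
  ref_transcript = get_attr(merged_lines, "nearest_ref"),
  stringsAsFactors = FALSE
))

# Reference transcript ID -> Ensembl gene ID, from the reference GTF.
ref_map <- unique(data.frame(
  ref_transcript = get_attr(ref_lines, "transcript_id"),
  ensembl_gene = get_attr(ref_lines, "gene_id"),
  stringsAsFactors = FALSE
))
ref_map <- ref_map[!is.na(ref_map$ref_transcript), ]

# all.x = TRUE keeps novel loci (no nearest_ref) with NA instead of dropping them.
xloc_map <- merge(xloc_to_ref, ref_map, by = "ref_transcript", all.x = TRUE)
xloc_map <- xloc_map[order(xloc_map$xloc), c("xloc", "ensembl_gene")]
print(xloc_map)
```

Note that one XLOC locus can contain transcripts from more than one reference gene, so on real data the resulting table may have several rows per XLOC ID.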


          • emp
            Member
            • Jan 2014
            • 11

            #6
            I faced a similar problem, but then I used -g with the GTF file and got the IDs in my cuffdiff output...


            • blakeoft
              Member
              • Oct 2013
              • 79

              #7
              I used this solution (http://seqanswers.com/forums/showthread.php?t=18357):
              Thomas Doktor said:
              library(cummeRbund)

              # Load the cuffdiff output (readCufflinks() looks in the current working directory by default)
              cuff <- readCufflinks()

              # Retrieve significant gene IDs (XLOC) with a pre-specified alpha
              diffGeneIDs <- getSig(cuff, level = "genes", alpha = 0.05)

              # Use the returned identifiers to create a CuffGeneSet object with all relevant info for the given genes
              diffGenes <- getGenes(cuff, diffGeneIDs)

              # gene_short_name values (and the corresponding XLOC_* values) can be retrieved from the CuffGeneSet with:
              geneNames <- featureNames(diffGenes)   # renamed from "names" to avoid masking base::names()
              row.names(geneNames) <- geneNames$tracking_id
              diffGenesNames <- as.matrix(geneNames)
              diffGenesNames <- diffGenesNames[, -1]

              # Get the data for the significant genes
              diffGenesData <- diffData(diffGenes)
              row.names(diffGenesData) <- diffGenesData$gene_id
              diffGenesData <- diffGenesData[, -1]

              # Merge the two by row names
              diffGenesOutput <- merge(diffGenesNames, diffGenesData, by = "row.names")
              diffGenesOutput will then be a table of genes with the XLOC ID as well as the gene name (like BATF3).
              Last edited by blakeoft; 12-08-2014, 06:43 AM.
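The row-name juggling above can probably be avoided by merging on the ID columns directly. Here is a minimal sketch with toy data frames standing in for the results of `featureNames(diffGenes)` and `diffData(diffGenes)`; the column names follow the ones used in the quoted code, and all values are made up.

```r
# Toy stand-in for featureNames(diffGenes): XLOC ID plus gene name.
gene_names <- data.frame(
  tracking_id = c("XLOC_000001", "XLOC_000002"),
  gene_short_name = c("BATF3", NA),
  stringsAsFactors = FALSE
)

# Toy stand-in for diffData(diffGenes): test results keyed by XLOC ID.
diff_data <- data.frame(
  gene_id = c("XLOC_000002", "XLOC_000001"),
  log2_fold_change = c(-1.2, 2.5),
  q_value = c(0.03, 0.001),
  stringsAsFactors = FALSE
)

# Join on the ID columns; the key column keeps the name from the first table.
annotated <- merge(gene_names, diff_data, by.x = "tracking_id", by.y = "gene_id")
print(annotated)
```

Merging on columns also copes with several rows per gene, which can happen when more than two samples are compared, whereas row names must be unique.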


              • Gonza
                Member
                • Mar 2013
                • 78

                #8
                Hi All,

                This works great, so many thanks. One quick question: I am having a hard time inserting a column between "value_2" and "log2_fold_change". I can make the new column, but it goes to the end of the data frame, so the new column (Ratio) is placed after the 'significant' column. For example:

                myGenesOutput$Ratio <- myGenesOutput$TRT_fpkm/myGenesOutput$CTR_fpkm

                Any thoughts? Thanks
                Cheers
                G


                • blakeoft
                  Member
                  • Oct 2013
                  • 79

                  #9
                  Gonza,

                  Just rearrange the columns. For example, if your data frame called df has three columns, and you want the third column to come before the second column, do
                  Code:
                  df <- df[, c(1, 3, 2)]
                  If you're still having trouble, tell me what
                  Code:
                  names(myGenesOutput)
                  gives you along with the desired order of the names, and I'll be able to help you more explicitly.
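For the specific case in the question, the reordering can also be done by column name rather than by position. A small sketch on a toy data frame; the column names other than `value_2`, `log2_fold_change`, `significant` and `Ratio` are assumptions, and the values are made up.

```r
# Toy data frame shaped roughly like the merged cuffdiff output.
my_genes <- data.frame(
  gene_id = c("XLOC_000001", "XLOC_000002"),
  value_1 = c(10, 4),    # stand-in for the control FPKM
  value_2 = c(20, 1),    # stand-in for the treatment FPKM
  log2_fold_change = c(1, -2),
  significant = c("yes", "yes"),
  stringsAsFactors = FALSE
)

# New columns are always appended at the end.
my_genes$Ratio <- my_genes$value_2 / my_genes$value_1

# Build the desired column order: put "Ratio" right after "value_2".
cols <- setdiff(names(my_genes), "Ratio")
cols <- append(cols, "Ratio", after = match("value_2", cols))
my_genes <- my_genes[, cols]

print(names(my_genes))
```

Working with names instead of positions means the code keeps working if columns are added or dropped upstream.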


                  • Gonza
                    Member
                    • Mar 2013
                    • 78

                    #10
                    Thanks so much that rearrange worked fantastic!!!!!!!
                    G


                    • Gonza
                      Member
                      • Mar 2013
                      • 78

                      #11
                      Hello again,

                      I have another R question and would appreciate some advice.
                      I am plotting the FPKM expression (log data) of a certain gene using the script below, and I cannot figure out how to make the y-axis show up as "10 to the 1", "10 to the 1.5", "10 to the 2", etc.
                      Instead, the graph shows FPKM+1 values as 1, 10 and 100.

                      Any ideas?

                      Script:
                      myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100),logMode=T)
                      myGeneBHLH100_isoform_logModeT + theme_bw()


                      • blakeoft
                        Member
                        • Oct 2013
                        • 79

                        #12
                        Gonza,

                        This appears to be the way to do it with ggplot2. I've tried it with the sample cummeRbund data, and the results are a little goofy. The y axis ticks are at 10^(2.6), 10^(2.8), etc. Maybe it would look better if your data had values that were spread over more powers of 10, or perhaps this is what you're looking for. Try

                        Code:
                        library(scales)
                        myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
                        myGeneBHLH100_isoform_logModeT +
                           theme_bw() +
                           scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                                             labels = trans_format("log10", math_format(10^.x)))
                        Here's a source: the R Cookbook. See the section titled "Axis transformations: log, sqrt, etc." That page has an example with axis ticks that are integer powers of 10.

                        Edit: Oh. It looks like you're ok with rational powers of 10.
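If ticks at integer powers of 10 are wanted instead, one option is to compute the breaks by hand from the data range and hand them to the `breaks` argument of `scale_y_log10()`. A base R sketch of the break computation follows; the `fpkm` vector is made-up example data, not values from the thread.

```r
# Made-up FPKM values; zeros are dropped because log10(0) is -Inf.
fpkm <- c(0, 3.2, 18, 45, 420, 1500)
fpkm <- fpkm[fpkm > 0]

# Smallest and largest integer exponents that bracket the data.
pow_range <- c(floor(log10(min(fpkm))), ceiling(log10(max(fpkm))))

# One break per integer power of 10 in that range.
breaks <- 10^seq(pow_range[1], pow_range[2])

# Plain-text labels, for checking; in a plot, math_format(10^.x) renders these as exponents.
labels <- paste0("10^", round(log10(breaks)))

print(breaks)
print(labels)
```

The `breaks` vector can then replace the `trans_breaks(...)` call in the plot code above, while the `labels = trans_format(...)` part stays as it is.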


                        • Gonza
                          Member
                          • Mar 2013
                          • 78

                          #13
                          Hey blakeoft, that worked beautifully, thanks very much once again! If you do not mind, one last question please...

                          When I type the command below I get two different plots (one for each isoform). Is there a way to plot those isoforms in the same plot? Somehow they do it in the cummeRbund protocol (Fig. 5a - Nature Protocols 7, 562–578 (2012) doi:10.1038/nprot.2012.016).

                          Full script :

                          myGeneId<-"XLOC_010858"
                          myGeneBHLH100<-getGene(cuff_data,myGeneId)
                          myGeneBHLH100

                          XLOC_010858 <-expressionPlot(myGeneBHLH100,logMode=T)
                          XLOC_010858 + theme_bw()

                          myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
                          myGeneBHLH100_isoform_logModeT + theme_bw() + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                          labels = trans_format("log10", math_format(10^.x)))


                          • blakeoft
                            Member
                            • Oct 2013
                            • 79

                            #14
                            Gonza,

                            It looks like expressionPlot() has been updated at some point so that the isoforms are now plotted side by side. Have you looked at the manual? It has the plots side by side in its example. It also has the FPKM values as integers in log mode, instead of the "10^x" format. I could be wrong because the paper and the manual are both dated 2012.

                            I tried to use ggplot2 to plot this for you. Anyways, this is the best that I could do.

                            Code:
                            iso_plot <- ggplot(isoforms(myGeneBHLH100)@fpkm,
                                               aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
                            iso_plot +
                               geom_line() + theme_bw() +
                               scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                                             labels = trans_format("log10", math_format(10^.x))) +
                               geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi)) # + geom_point(color = "black", shape = 19)
                            Some aesthetics aren't the same as the normal plots that cummeRbund makes, for example the colors of the lines are different. You can mess around with those colors, the line thickness, etc., but this looks pretty close to what they have in the paper. If you want black data points like in the manual, uncomment the geom_point part on the last line.

                            Edit: I think that some people frown on multiple line plots like this because they can get crowded. One way to mitigate this is to do what is called dodging. Here's how you'd do it for this plot:

                            Code:
                            iso <- isoforms(myGeneBHLH100)
                            pd <- position_dodge(0.3)
                            iso_plot <- ggplot(iso@fpkm,
                                               aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
                            iso_plot +
                               geom_line(position = pd) + theme_bw() + 
                               scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                                             labels = trans_format("log10", math_format(10^.x))) +
                               geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi), position = pd) # + geom_point(color = "black", shape = 19, position = pd)
                            Last edited by blakeoft; 10-06-2014, 09:10 AM. Reason: made the black data points come after error bars


                            • Gonza
                              Member
                              • Mar 2013
                              • 78

                              #15
                              Hi blakeoft, that worked well. I am so grateful for your help!
                              But you are totally right: after playing around with it, the graph seems pretty crowded and does not look as good as I thought it would.

                              Again, many many many thanks for your help and time (and I may have other questions as I go along...)

                              Best
                              G
