  • CummeRbund 2.0 readCufflinks GTF/genome build options

    I'm trying to use the "transcript-structure level features" in CummeRbund made possible by including a GTF in the readCufflinks command. I'm wondering if these features are limited to certain genomes and if sample/test data exist.

    The readCufflinks command appears fine and I haven't had problems with any other functions, but the features(), makeGeneRegionTrack(), and subsequent commands fail.

    > cuff<-readCufflinks(gtfFile="GmGeneExonGFFixd.gtf",genome="Glyma1.1")

    ....Writing replicates Table
    Reading GTF file
    Writing GTF features to 'features' table...
    Reading ....

    > myGeneId<-"XLOC_000001"
    > myGene<-getGene(cuff,myGeneId)
    > myGene
    CuffGene instance for gene XLOC_000001 ...
    > head(fpkm(myGene))
    gene_id sample_name fpkm conf_hi conf_lo quant_status
    1 XLOC_000001 S2 0.1115820 0.224128 0.0000000 OK
    2 XLOC_000001 S3 0.0543526 0.127996 0.0000000 OK ...

    > head(features(myGene))
    [1] seqnames start end width strand source type score phase gene_id gene_name isoform_id
    <0 rows> (or 0-length row.names)

    > genetrack<-makeGeneRegionTrack(myGene)
    Error in `[.data.frame`(features(object), , featCols) :
    undefined columns selected

    > trackList<-list()

    > myStart<-min(features(myGene)$start)
    Warning message:
    In min(features(myGene)$start) :
    no non-missing arguments to min; returning Inf

    Any help would be greatly appreciated.

  • #2
    Hi clsppb,
    It's entirely possible that this is an issue with the genome name, but I'd have to look at your cuffData.db file to get a bit more information. Can you possibly send it my way?

    Also, was this a dataset that you had already built a database for and you were simply adding the gtf information? Or was this a first-time 'indexing' of your cuffdiff results?

    Cheers,
    Loyal

    • #3
      Hi. I messaged you a link to my cuffData file, which I built from scratch with the gtf info. I had, though, previously tried adding the file to an already-built db.
      Colin

      • #4
        It looks like the .gtf file that you passed to readCufflinks() might not have been the same one that was used for your cuffdiff analysis... is this the case?

        It looks like cuffdiff was using gene_id values that begin with 'XLOC_'; however, the records in the features table (which is built from the gtf passed to readCufflinks()) indicate that the gene_ids begin with 'Glyma...'. Can you confirm that these are the same file?

        If that is the case, can you send me the .gtf file so that I can take a look at it?

        Cheers,
        Loyal

        • #5
          I did use the same gtf passed to readCufflinks() in my cufflinks runs with the -G option and in cuffmerge with the -g option. It's formatted like so:
          Gm01 Gm1_1 exon 28505 28509 . + . transcript_id "Glyma01g00241.1"; gene_id "Glyma01g00241"; gene_name "Glyma01g00241";

          I used the merged.gtf file from cuffmerge with cuffdiff and it does contain the XLOC_ gene_ids:
          Gm01 Cufflinks exon 28505 28509 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "Glyma01g00241"; oId "Glyma01g00241.1"; nearest_ref "Glyma01g00241.1"; class_code "="; tss_id "TSS1"; p_id "P1";

          The transcripts.gtf file produced by cuffmerge has the proper gene_id fields. Should (or can) that be used by cuffdiff instead?
          Gm01 Cufflinks transcript 28505 30546 1000 + . gene_id "Glyma01g00241"; transcript_id "Glyma01g00241.1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "yes";

          • #6
            Originally posted by clsppb
            I used the merged.gtf file from cuffmerge with cuffdiff and it does contain the XLOC_ gene_ids:
            Gm01 Cufflinks exon 28505 28509 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "Glyma01g00241"; oId "Glyma01g00241.1"; nearest_ref "Glyma01g00241.1"; class_code "="; tss_id "TSS1"; p_id "P1";
            This is the file that you want to pass to readCufflinks(), since it is the one that cuffdiff used for quantification. That way, the identifiers are consistent.

            -Loyal
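
A quick way to confirm which GTF carries the identifiers cuffdiff used is to compare gene_id values directly. The following is only a minimal base-R sketch, not part of cummeRbund: the helper names `extract_gene_ids` and `id_overlap` are hypothetical, and the two records are shortened versions of the lines quoted in this thread (written with tabs, as in a real GTF). With a real file you would pass `readLines("merged.gtf")` instead.

```r
# Sketch only: base R, no cummeRbund needed.
# extract_gene_ids() and id_overlap() are hypothetical helper names.

# Pull the gene_id value out of each GTF record that has one.
extract_gene_ids <- function(gtf_lines) {
  hits <- regmatches(gtf_lines, regexpr('gene_id "[^"]+"', gtf_lines))
  unique(sub('^gene_id "([^"]+)"$', "\\1", hits))
}

# Fraction of the quantified gene IDs that also occur in the GTF.
id_overlap <- function(quant_ids, gtf_ids) {
  if (length(quant_ids) == 0) return(NA_real_)
  mean(quant_ids %in% gtf_ids)
}

# Shortened example records based on the lines quoted in this thread.
reference_gtf <- 'Gm01\tGm1_1\texon\t28505\t28509\t.\t+\t.\ttranscript_id "Glyma01g00241.1"; gene_id "Glyma01g00241"; gene_name "Glyma01g00241";'
merged_gtf <- 'Gm01\tCufflinks\texon\t28505\t28509\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; gene_name "Glyma01g00241";'

# Gene ID as reported by cuffdiff for this dataset.
quant_ids <- "XLOC_000001"

overlap_reference <- id_overlap(quant_ids, extract_gene_ids(reference_gtf))
overlap_merged <- id_overlap(quant_ids, extract_gene_ids(merged_gtf))
```

An overlap of 0 for the reference annotation and 1 for merged.gtf is the pattern described above: only the merged file shares identifiers with the cuffdiff output, so only it can populate the features for an XLOC_ gene.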

            • #7
              Excellent, thank you very much!

              If anyone happens to be working with a non-UCSC-hosted genome, to make Gviz happy you'll have to set:
              > options(ucscChromosomeNames=FALSE)
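
To decide whether that option is needed, you can look at the sequence names in your own annotation first. This is a rough base-R sketch under one stated assumption: that "UCSC-style" can be approximated as "starts with chr". That rule is a simplification for illustration, not Gviz's actual validation code, and `non_ucsc_names` is a hypothetical helper name.

```r
# Sketch only. Assumption: a UCSC-style name is approximated as one
# that starts with "chr"; this is not Gviz's own check.
non_ucsc_names <- function(seqnames) {
  seqnames <- unique(as.character(seqnames))
  seqnames[!grepl("^chr", seqnames)]
}

# Example names: two soybean-style names from this thread and one mouse-style name.
bad <- non_ucsc_names(c("Gm01", "Gm02", "chr15"))

# Relax the check only when some names would not pass it.
if (length(bad) > 0) {
  options(ucscChromosomeNames = FALSE)
}
```

Here `bad` holds "Gm01" and "Gm02", so the option is switched off before any tracks are built.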

              • #8
                What am I missing? Can't get it to work:

                cuff <- readCufflinks(dir = "path/to/cuff-output", gtfFile="genes.gtf", genome="hg19", rebuild = T)

                myGene<-getGene(cuff,myGeneId)

                genetrack<-makeGeneRegionTrack(myGene)
                Error in `[.data.frame`(features(object), , featCols) :
                undefined columns selected

                Also tried: options(ucscChromosomeNames=FALSE)

                • #9
                  Hi All,

                  I have the following issue: I ran the Tuxedo pipeline (including -G in cufflinks and -g in cuffmerge, using TAIR10) and I am trying to find more info on the BHLH101 gene, which is highly expressed in my data. All I get, however, is "<0 rows> (or 0-length row.names)" and "Error... " (please see below).
                  What am I missing? Any tips would be appreciated. Many thanks in advance.

                  myGeneId<-"BHLH101"
                  myGene<-getGene(cuff_data,myGeneId)
                  myGene

                  head(fpkm(myGene))
                  head(fpkm(isoforms(myGene)))
                  head(features(myGene))

                  [1] feature_id gene_id isoform_id seqnames source type start end score strand
                  [11] frame

                  <0 rows> (or 0-length row.names)


                  genetrack<-makeGeneRegionTrack(myGene)
                  plotTracks(genetrack)

                  Error in `[.data.frame`(features(object), , featCols) :
                  undefined columns selected

                  • #10
                    Hi All,

                    Did anyone figure out why this keeps being an issue for so many people?
                    Thanks!!
                    G

                    • #11
                      Originally posted by Gonza (post #9)
                      Hi, I had the same problem before. You need to call readCufflinks(gtfFile="path/to/file") with the GTF file that you used for cuffdiff; it will then read the features from that GTF file.

                      • #12
                        Thanks for your reply. I am still struggling with this. I have included the gtf file I used in cufflinks/cuffdiff, but I still get the same error.

                        Any ideas?
                        Thanks again

                        rm(list=ls())
                        setwd("/Users/gonzalovillarino/Documents/NCSU/RNAseq/SRA_SRP044814_FeStarvation/05_cuffdiff/")
                        library("cummeRbund")
                        cuff_data=readCufflinks(gtfFile="new_genes.gtf")
                        cuff_data


                        Error:
                        Error in `[.data.frame`(features(object), , featCols) :
                        undefined columns selected

                        • #13
                          Originally posted by Gonza (post #12)
                          What about head(features(myGene))? It should no longer return <0 rows> (or 0-length row.names).
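
That check can be made explicit before any plotting call. Below is a minimal base-R sketch: `check_features` is a hypothetical helper, `feats` stands in for the data frame returned by features(myGene), and the list of required columns is an assumption chosen for illustration (it is not taken from the cummeRbund source).

```r
# Sketch only. check_features() is a hypothetical helper, and the
# `required` columns are an illustrative assumption.
check_features <- function(feats,
                           required = c("seqnames", "start", "end", "strand")) {
  missing_cols <- setdiff(required, colnames(feats))
  if (length(missing_cols) > 0) {
    stop("features table lacks columns: ",
         paste(missing_cols, collapse = ", "))
  }
  if (nrow(feats) == 0) {
    stop("features table has 0 rows for this gene; ",
         "check that the GTF gene_ids match the cuffdiff gene_ids")
  }
  invisible(TRUE)
}

# An empty table shaped like the 0-row output shown in this thread.
empty <- data.frame(seqnames = character(0), start = integer(0),
                    end = integer(0), strand = character(0))

# Capture the error message instead of halting the script.
result <- tryCatch(check_features(empty),
                   error = function(e) conditionMessage(e))
```

With real data you would call check_features(features(myGene)) and only continue to makeGeneRegionTrack(myGene) if it does not stop.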

                          • #14
                            Hey Gonza, were you able to solve this problem? I have the same problem.


                            • #15
                              Figured it out for my dataset!

                              Hey everyone,

                              I had the same problem where the following code returned errors:
                              head(features(myGene))

                              [1] feature_id gene_id isoform_id seqnames source type start end score strand
                              [11] frame

                              <0 rows> (or 0-length row.names)

                              genetrack<-makeGeneRegionTrack(myGene)
                              plotTracks(genetrack)

                              Error in `[.data.frame`(features(object), , featCols) :
                              undefined columns selected
                              This is what I did in R to fix the problem:

                              > library("cummeRbund")
                              > library("Gviz")

                              > cuff<-readCufflinks('./diff_out',gtfFile="./merged_asm/merged.gtf",genome="mm10",rebuild=TRUE)

                              > rm(myGene)

                              > myGene<-getGene(cuff,myGeneId)

                              > head(features(myGene))
                              seqnames start end width strand source type score phase
                              1 chr15 82615965 82616212 248 - Cufflinks exon NA NA
                              2 chr15 82616313 82616454 142 - Cufflinks exon NA NA
                              3 chr15 82616748 82616935 188 - Cufflinks exon NA NA
                              ...

                              > genetrack<-makeGeneRegionTrack(myGene)
                              > plotTracks(genetrack)

                              This worked successfully.

                              In case it is useful: I am working in mouse, using mm10 GTF files and genome sequence. My working directory in R was the directory that held my cuffdiff output ('diff_out') and the merged.gtf file from cuffmerge ('./merged_asm/merged.gtf'). The rebuild may not be necessary if you are running readCufflinks for the first time (but note that you may have to include rebuild=TRUE even if you are rerunning R).

                              You may have to also include:
                              options(ucscChromosomeNames=FALSE)
                              if your chromosome names are not in the standard UCSC format (https://stat.ethz.ch/pipermail/bioco...il/052153.html)

                              Note that what you identify as your genome should be the UCSC genome identifier string for your organism.

                              Hope this helps!
                              Last edited by akropor2; 09-11-2015, 02:55 PM.
