  • Rachelly
    Member
    • Oct 2010
    • 37

    CuffDiff output

    Hi all,
    I used Cufflinks in the following workflow:
    CuffLinks -> CuffCompare -> CuffDiff

    The output file genes.fpkm_tracking didn't include reference genes at all:

    Code:
    tracking_id     class_code      nearest_ref_id  gene_short_name tss_id  locus   MM_FPKM MM_conf_lo      MM_conf_hi      LOG_FPKM        LOG_conf_lo     LOG_conf_hi     SFT_FPKM        SFT_conf_lo     SFT_conf_hi     NY_FPKM NY_conf_lo      NY_conf_hi
    XLOC_000001     -       -       -       -       SL2.30ch00:551338-551631        1.66555 0       4.24667 0.446456        0       1.7828  0.841447        0       2.67606 0       0       0
    XLOC_000002     -       -       -       -       SL2.30ch00:4196781-4198207      122.746 100.586 144.907 185.302 158.075 212.529 121.462 99.1515 143.773 1616.46 1469.49 1763.43
    This happened even though the combined.gtf created by CuffCompare did contain many overlaps with the known genes. The isoforms.fpkm_tracking output file DID contain reference annotations, but at the level of exons:
    Code:
    tracking_id     class_code      nearest_ref_id  gene_short_name tss_id  locus   MM_FPKM MM_conf_lo      MM_conf_hi      LOG_FPKM        LOG_conf_lo     LOG_conf_hi     SFT_FPKM        SFT_conf_lo     SFT_conf_hi     NY_FPKM NY_conf_lo      NY_conf_hi
    TCONS_00000001  =       exon:Solyc00g005040.1.1.3       -       -       SL2.30ch00:551338-551631        1.66555 0       4.24667 0.446456        0       1.7828  0.841447        0       2.67606 0       0       0
    TCONS_00000002  o       exon:Solyc00g006470.1.1.4       -       -       SL2.30ch00:4196781-4198207      62.9947 47.1187 78.8707 95.0768 75.573  114.581 52.5381 37.9501 67.126  972.538 908.856 1036.22
    * Of course, when I ran CuffDiff with only the reference GTF, I got expression levels for the known genes.

    My questions are:
    Is there a way to get gene (and not exon) expression levels AND novel transcripts using Cufflinks?
    And why doesn't the genes.fpkm_tracking file show the closest reference annotation for each gene?

    Thanks!
    Rachelly.
  • honey
    Senior Member
    • Feb 2010
    • 151

    #2
    gene level

    For gene-level expression, run TopHat with an Ensembl or refFlat GTF file.


    • Rachelly
      Member
      • Oct 2010
      • 37

      #3
      Cole's answer

      I consulted Cole on this matter and this was his reply:

      Actually, you won't see those id's in the genes.fpkm_tracking (or, IIRC, the tss_group.fpkm_tracking) files, because as far as Cufflinks is concerned, genes and tss groups are *sets* of transcripts. Each transcript in a gene could have a different nearest reference transcript, so we don't put anything in that field.
      However, the way we recommend doing what (I think) you want here is to use the gene_name attribute. If you compare to a reference file that has gene_name attributes, they will get propagated to the stdout.combined.gtf file from cuffcompare. Ensembl has the gene_name attributes already built in (and the values are typically the HUGO names in the case of human), but you could add them to your reference if they're not there already.
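
A minimal sketch of the last suggestion, adding gene_name attributes to a reference GTF that lacks them. The function name `add_gene_name` and the `names` lookup table are hypothetical and not part of Cufflinks; the sketch only assumes that each record carries a `gene_id "..."` attribute in its last column.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def add_gene_name(gtf_line, names):
    """Return gtf_line with a gene_name attribute appended if it lacks one.

    names is a hypothetical dict mapping gene_id -> display name.
    """
    line = gtf_line.rstrip("\n")
    if not line or line.startswith("#"):
        return line  # leave comments and blank lines alone
    if 'gene_name "' in line:
        return line  # already annotated, nothing to do
    match = GENE_ID_RE.search(line)
    if match is None or match.group(1) not in names:
        return line  # no known name for this record
    line = line.rstrip()
    if not line.endswith(";"):
        line += ";"  # make sure the previous attribute is terminated
    return f'{line} gene_name "{names[match.group(1)]}";'
```

Applied line by line to the reference file before running cuffcompare, this leaves already-annotated records (such as Ensembl's) unchanged.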


      • greener
        Member
        • Sep 2010
        • 17

        #4
        Originally posted by Rachelly:
        I consulted Cole on this matter and this was his reply:
        Hi Rachelly, I seem to be having the same problem. My Cuffdiff output does not contain gene names. Could you post an example of a reference file that worked, along with the commands you ran? I tried rerunning Cuffcompare with an Ensembl annotation that contains gene_name attributes, but that did not seem to work. Here is an excerpt from my Ensembl annotation file:

        11 pseudogene exon 86649 87586 . - . gene_id "ENSG00000224777"; transcript_id "ENST00000424047"; exon_number "1"; gene_name "OR4F2P"; transcript_name "OR4F2P-001";
        11 protein_coding exon 129060 129388 . - . gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201";


        • severin
          Genome Informatics Facility
          • Sep 2009
          • 105

          #5
          Cuffcompare

          If you ran Cuffcompare with a reference file, you can extract the significant transcripts from the Cuffdiff output and grep for those lines in your combined GTF file, which should contain your gene IDs. This will tell you which genes are significant.

          This requires the Unix commands cut, awk, grep, | (pipe), and xargs -I.
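
The same lookup can be sketched in Python instead of a shell pipeline. This is an illustration, not the poster's actual commands: the function names are hypothetical, and the column names (`gene_id`, `significant`) and GTF attribute names (`gene_name`, `nearest_ref`) are assumptions that should be checked against the files your Cufflinks version writes.

```python
import csv
import re

ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def significant_ids(diff_lines, id_column="gene_id", flag_column="significant"):
    """Collect IDs from a tab-separated *.diff table whose flag column is 'yes'."""
    reader = csv.DictReader(diff_lines, delimiter="\t")
    return {row[id_column] for row in reader if row[flag_column] == "yes"}


def names_for_genes(gtf_lines, wanted_ids):
    """Map each wanted gene_id to the reference labels found in the combined GTF."""
    found = {gene_id: set() for gene_id in wanted_ids}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        # The attributes live in the last tab-separated column.
        attrs = dict(ATTR_RE.findall(line.rstrip("\n").split("\t")[-1]))
        gene_id = attrs.get("gene_id")
        if gene_id in found:
            # Prefer gene_name; fall back to nearest_ref when it is absent.
            label = attrs.get("gene_name") or attrs.get("nearest_ref")
            if label:
                found[gene_id].add(label)
    return {gene_id: sorted(labels) for gene_id, labels in found.items()}
```

Both functions accept any iterable of lines, so an open file handle works directly. A gene can map to several labels because, as noted earlier in the thread, each of its transcripts may have a different nearest reference.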


          • jasonwood
            Member
            • May 2010
            • 10

            #6
            I found that I had to use the -s switch in cuffcompare in order for it to propagate my gene names (with gene_name attribute in last column of GTF) all the way through to the final cuffdiff files.


            • kareldegendt
              Junior Member
              • Feb 2012
              • 9

              #7
              is genes.gtf the correct annotation file?

              Hi all,
              I had the same problem, but figured out that I had to run TopHat with the Ensembl "genes.gtf" file, which is what I did.
              All works fine until I run Cuffmerge.
              There I get the following error:

              Error: duplicate GFF ID 'ENSMUST00000098282' encountered!
              [FAILED]

              In another set I was running, I get the same error with a different ENSMUST number.
              Any clue what's wrong here? Obviously there are multiple lines with that ID, but why did it go all right with TopHat then?

              Thanks!
              K.


              • kareldegendt
                Junior Member
                • Feb 2012
                • 9

                #8
                Ok, I found the issue. Turns out I was being too "efficient"

                I am comparing 2 times 2 datasets, and I was already running the cuffmerge on the second set while the run on the first dataset was still ongoing (wanted to be fast...).
                However, I forgot to change the directory name, so both runs saved to the same dir... and ran into problems.
                It was all solved when I assigned them different directories...
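
One way to avoid that collision is to derive the output directory from each input list, so that concurrent runs cannot share one. The sketch below only builds the command lines; the helper name, the `merged_runs` root, and the file names are hypothetical, and it assumes cuffmerge's `-o` (output directory) and `-g` (reference GTF) options.

```python
from pathlib import Path


def cuffmerge_commands(assembly_lists, reference_gtf, out_root="merged_runs"):
    """Build one cuffmerge command per assembly list, each with its own -o directory."""
    commands = []
    seen = set()
    for list_file in assembly_lists:
        # Name the output directory after the list file (without its extension).
        out_dir = Path(out_root) / Path(list_file).stem
        if out_dir in seen:
            raise ValueError(f"output directory {out_dir} would be shared by two runs")
        seen.add(out_dir)
        commands.append(
            ["cuffmerge", "-g", reference_gtf, "-o", str(out_dir), list_file]
        )
    return commands
```

Each returned list can be passed to `subprocess.run`; the check raises before anything is launched if two inputs would write to the same place.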

                Karel


                • billstevens
                  Senior Member
                  • Mar 2012
                  • 120

                  #9
                  Sorry, I know this is a basic question by comparison, but can someone give me a quick take on the gene IDs? I ran cuffdiff to get the significantly differentially expressed genes. I want to view them in DAVID or Ensembl to check out the actual pathways. I saved all of my 300 or so genes in a txt file, with many genes having more than one unique ID (e.g. B1AKN3,NP_001036147,Q9P2R6,uc001aph.1), and uploaded it to DAVID. However, it could only "ambiguously" match 25 of these genes. What kind of gene IDs are these? There appears to be more than one kind. How do you view your pathways?


                  • billstevens
                    Senior Member
                    • Mar 2012
                    • 120

                    #10
                    bump

                    Sorry, I'm just having trouble working with these gene names. Some are UniProt, some are RefSeq, some are UCSC. How do you guys do it? DAVID has no idea what I'm uploading. What do you guys use? And does it recognize all the gene names?


                    • billstevens
                      Senior Member
                      • Mar 2012
                      • 120

                      #11
                      Please help...

                      I'm sorry, I'm just so confused by this. Why is more than one gene listed per entry in promoters.diff, tss_group.diff, or even gene_exp.diff? I just don't get it. It says right there in the Cufflinks manual, and I'm quoting:

                      "Transcripts with the same gene_id are part of the same gene group, and similarly, those with the same tss_id and p_id are part of the same primary transcript group and CDS group. "

                      How can one transcription start site be associated with more than one gene? Likewise with promoters and CDS?

                      Sincere thanks to anyone that can help me with this!
                      Last edited by billstevens; 04-15-2012, 01:12 PM.


                      • billstevens
                        Senior Member
                        • Mar 2012
                        • 120

                        #12
                        Hey guys,

                        So I have this plan for analyzing my data using DAVID, and I was hoping someone might say how they do their differential gene expression analysis. From the gene_exp.diff output file, I take the significant genes and remove the suffix from each ID (e.g. uc0012w.1 becomes uc0012w), then load the list into DAVID. I dropped the suffixes because DAVID often couldn't find an ID with the suffix but did recognize it without, and I imagine both refer to the same gene. I found that DAVID recognizes all genes that have been reviewed. This seems like a nice and straightforward method for obtaining my network.
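
The clean-up described above, together with the mixed ID types mentioned earlier (UniProt, RefSeq, UCSC), can be sketched as follows. The patterns are heuristics written for this illustration and are not exhaustive; the function names are hypothetical.

```python
import re

# Heuristic patterns for illustration only; real identifiers are more varied.
ID_PATTERNS = [
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+$")),
    ("UCSC", re.compile(r"^uc[0-9a-z]+$")),
    ("UniProt", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
    )),
]


def strip_version(identifier):
    """Drop a trailing '.N' suffix, e.g. 'uc001aph.1' -> 'uc001aph'."""
    return re.sub(r"\.\d+$", "", identifier)


def classify(identifier):
    """Guess which database an identifier comes from, or return 'unknown'."""
    base = strip_version(identifier.strip())
    for source, pattern in ID_PATTERNS:
        if pattern.match(base):
            return source
    return "unknown"


def group_by_source(field):
    """Split a comma-separated ID field and group the suffix-free IDs by source."""
    groups = {}
    for raw in field.split(","):
        if raw.strip():
            base = strip_version(raw.strip())
            groups.setdefault(classify(base), []).append(base)
    return groups
```

Grouping this way makes it possible to upload one identifier type at a time rather than a mixed list, which may be why only a few entries were matched.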

                        Am I totally off-base? Anyone?

