  • combiochem
    Member
    • Jul 2009
    • 11

    Transcript expression estimation using cuffdiff with a reference annotation

    I want to estimate the expression level of each transcript within each gene.
    Rather than using the isoforms assembled by the TopHat-Cufflinks pipeline, I want to use the known annotations.
    When I ran cuffdiff, I provided the mapping results in SAM format and the Ensembl annotation as a GTF file.
    When I checked the cuffdiff results, some of the gene boundaries looked odd.

    For example, the structure of gene ENSMUSG00000029019 is stored in the GTF file as follows.

    Code:
    chr4    mm9_ensGene start_codon 147360085   147360087   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene CDS 147360085   147360210   0.000000    +   0   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene exon    147360009   147360210   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene CDS 147360405   147360627   0.000000    +   0   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene exon    147360405   147360627   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene CDS 147361071   147361087   0.000000    +   2   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    chr4    mm9_ensGene exon    147361071   147361306   0.000000    +   .   gene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
    Therefore, the gene spans 147360009 to 147361306 (1-based coordinates).
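    That span is simply the smallest exon start and the largest exon end for the gene_id. A minimal Python sketch of the calculation, assuming a standard tab-separated GTF (the excerpt above lost its tabs when pasted) and counting only `exon` rows; `gene_spans` is a hypothetical helper name, not part of any Cufflinks tool:

```python
import re

# The three exon rows for ENSMUSG00000029019 from the excerpt above, re-joined with tabs
# (GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes).
GTF_TEXT = """\
chr4\tmm9_ensGene\texon\t147360009\t147360210\t0.000000\t+\t.\tgene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
chr4\tmm9_ensGene\texon\t147360405\t147360627\t0.000000\t+\t.\tgene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
chr4\tmm9_ensGene\texon\t147361071\t147361306\t0.000000\t+\t.\tgene_id "ENSMUSG00000029019"; transcript_id "ENSMUST00000103231";
"""

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines):
    """Return {gene_id: (seqname, start, end)} from exon rows (1-based, inclusive)."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows define the transcript footprint here
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        gene = match.group(1)
        seq, start, end = fields[0], int(fields[3]), int(fields[4])
        if gene in spans:
            _, old_start, old_end = spans[gene]
            spans[gene] = (seq, min(old_start, start), max(old_end, end))
        else:
            spans[gene] = (seq, start, end)
    return spans


spans = gene_spans(GTF_TEXT.splitlines())
print(spans["ENSMUSG00000029019"])  # ('chr4', 147360009, 147361306)
```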
    But in the cuffdiff output file genes.fpkm_tracking, the locus for the gene is much larger than the annotated one: 147326657 to 147416061.
    Code:
    tracking_id class_code  nearest_ref_id  gene_short_name tss_id  locus   q0_FPKM q0_conf_lo  q0_conf_hi  q1_FPKM q1_conf_lo  q1_conf_hi
    ENSMUSG00000029019  -   -   -   -   chr4:147326657-147416061    72.4927 68.7873 76.1981 47.0939 43.9784 50.2093
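    To pull the locus and FPKM values for a single gene out of genes.fpkm_tracking, a sketch using Python's csv module, assuming the file is tab-separated with the header row shown above; `read_tracking` is a hypothetical helper name:

```python
import csv
import io

# The header and one data row posted above, assuming tab separators in the real file.
TRACKING_TEXT = (
    "tracking_id\tclass_code\tnearest_ref_id\tgene_short_name\ttss_id\tlocus\t"
    "q0_FPKM\tq0_conf_lo\tq0_conf_hi\tq1_FPKM\tq1_conf_lo\tq1_conf_hi\n"
    "ENSMUSG00000029019\t-\t-\t-\t-\tchr4:147326657-147416061\t"
    "72.4927\t68.7873\t76.1981\t47.0939\t43.9784\t50.2093\n"
)


def read_tracking(handle):
    """Map tracking_id -> row dict for a tab-separated *.fpkm_tracking file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"]: row for row in reader}


rows = read_tracking(io.StringIO(TRACKING_TEXT))
row = rows["ENSMUSG00000029019"]
print(row["locus"], float(row["q0_FPKM"]), float(row["q1_FPKM"]))
# chr4:147326657-147416061 72.4927 47.0939
```

For the real file, pass an open handle instead of the StringIO, e.g. `with open("genes.fpkm_tracking", newline="") as fh: rows = read_tracking(fh)`.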
    Does this mean that cuffdiff sets new gene loci (boundaries) based on the supplied short-read data, and that the provided gene annotation (e.g. the Ensembl GTF file) is only used as guidance?
    If so, is there any way to estimate expression using exactly the gene structures provided by the user rather than the Cufflinks definitions?

    Thanks for any comments in advance.
  • Cole Trapnell
    Senior Member
    • Nov 2008
    • 213

    #2
    This behavior changed in 0.8.2. The value in that locus column is really just a tag that you can copy into a browser window. It's NOT meant to define precisely the boundaries of the object being tested. It's just there so that you can grab a line out of your file and pop open a browser window to see not only that record, but all the records that cuffdiff processed together with it.

    I implemented this behavior to reflect the way I use cuffdiff: I see an isoform-level record, for example, and I immediately want to see a UCSC browser shot of not only that isoform, but the whole gene it lives in, along with whatever else is in the neighborhood.
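    Read this way, the locus column should enclose the annotated gene rather than match it. A small sketch that checks this, assuming the tag has the plain `chrom:start-end` form seen in genes.fpkm_tracking above; `parse_locus` and `contains` are hypothetical helper names, and the check only tests containment, not how cuffdiff chose the wider region:

```python
import re

# Matches tags like "chr4:147326657-147416061".
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_locus(locus):
    """Split a 'chrom:start-end' tag into (chrom, start, end)."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    return match.group("chrom"), int(match.group("start")), int(match.group("end"))


def contains(outer, inner):
    """True if interval `inner` lies entirely within `outer` on the same chromosome."""
    return outer[0] == inner[0] and outer[1] <= inner[1] and inner[2] <= outer[2]


locus = parse_locus("chr4:147326657-147416061")  # tag from genes.fpkm_tracking
annotated = ("chr4", 147360009, 147361306)       # exon span from the Ensembl GTF
print(contains(locus, annotated))                # True
```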

    I will update the manual to describe in more detail what this locus tag means.


    • combiochem
      Member
      • Jul 2009
      • 11

      #3
      It makes sense. Thanks for the explanation of the tag.


      • sunnyvu
        Member
        • Mar 2010
        • 17

        #4
        I have a question about cuffdiff output. When I run cuffcompare, I use the reference annotation downloaded from UCSC:
        ./cuffcompare -o sample1_4 -r hg18_ref.gtf -R ../sample1/transcripts.gtf /../sample4/transcripts.gtf

        Then I run cuffdiff as follows:
        ./cuffdiff -m 200 sample1_4.combined.gtf s1_accepted_hits.sam s4_accepted_hits.sam

        In my results, nothing comes out in 0_1_cds.diff except the header line.

        Is this reasonable?
        Thanks in advance


        • maximilianh
          Member
          • Oct 2009
          • 15

          #5
          UCSC GTF files?

          Originally posted by sunnyvu
          I have a question about cuffdiff output. When I run cuffcompare, I use the reference annotation downloaded from UCSC.
          In my results, nothing comes out in 0_1_cds.diff except the header line.
          I had the same problem. Is it possible that you have to use the GTF files from Ensembl? E.g. ftp://ftp.ensembl.org/pub/release-58/gtf/
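          One way to compare a UCSC GTF with an Ensembl GTF before rerunning is to tally the feature types (column 3) each file contains. Whether a difference in feature types explains the empty 0_1_cds.diff is an assumption, not something established in this thread; `feature_counts` is a hypothetical helper name:

```python
from collections import Counter


def feature_counts(gtf_lines):
    """Count GTF rows by feature type (column 3), skipping blank and comment lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Tiny in-memory example; for a real file, pass an open handle instead, e.g.
#   with open("hg18_ref.gtf") as fh:
#       print(feature_counts(fh))
sample = [
    'chr4\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr4\tsrc\tCDS\t2\t9\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    "# a comment line",
]
print(feature_counts(sample))
```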

