  • MeixiaZhao
    Junior Member
    • Apr 2012
    • 9

    Confusing cufflinks vs. cuffdiff results

    I have two samples: one without treatment and one with treatment. I want to detect alternatively spliced transcripts or significantly differentially expressed transcripts after treatment. I assembled the transcripts for each of the two samples with cufflinks, then ran cuffcompare to see which transcripts are specific to each sample, and then ran cuffdiff. At this stage I ran into a problem. Both the transcripts.gtf files and the cuffcmp.tracking file show CUFF.1.1 as a new transcript that appears only after treatment. However, when I use cuffdiff to measure expression, I also get a value for CUFF.1.1 in the untreated sample. In this case, can I call CUFF.1.1 a new transcript after treatment? There are many cases like this. The output from cuffdiff seems inconsistent with the output from cufflinks, especially at the isoform level. How can I solve this problem? Can anyone give me some advice? I have been stuck here for a long time.
    Last edited by MeixiaZhao; 05-01-2012, 10:05 AM.
  • sdriscoll
    I like code
    • Sep 2009
    • 436

    #2
    i'm not entirely sure what the problem is. are you confused about the cufflinks annotation for genes (CUFF.X.X)?

    try this with your samples:

    1. align with tophat
    2. obtain "transcripts.gtf" for each sample via cufflinks
    3. run cuffmerge on all of the transcripts.gtf files from cufflinks
    4. run cuffcompare on the output of cuffmerge to compare it to a known gene annotation (assuming your species has one). you can download one from Ensembl or from UCSC's Table Browser
    5. run cuffdiff on your samples using the output of cuffcompare as your reference annotation

    as a result of the cuffmerge -> cuffcompare stage, your reference annotation will include known annotation IDs (such as Ensembl gene/transcript IDs or UCSC IDs), and the "CUFF.X.X" IDs will be gone. novel genes that are not present in the known annotation will be named "XLOC_*" with transcript names "TCONS_*", where * is a multi-digit number with leading zeros.
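The five steps above could be scripted roughly as follows. This is a dry-run sketch that only builds and prints the command lines; the sample names, read files, Bowtie index prefix and annotation file are hypothetical placeholders, and only the basic -o, -r and -L options are used. cuffmerge reads a text file listing one transcripts.gtf per line, so the script returns those paths for you to write into that (hypothetically named) assemblies.txt. The cuffcompare output name cuffcmp.combined.gtf is an assumption; check what your cuffcompare run actually writes.

```python
import shlex

# Hypothetical inputs: replace with your own index, annotation and reads.
BOWTIE_INDEX = "genome_index"
KNOWN_GTF = "known_genes.gtf"
SAMPLES = {"control": "control.fq", "treated": "treated.fq"}


def build_commands(samples, index, known_gtf):
    """Return (transcripts.gtf paths for cuffmerge, list of command argv lists)."""
    cmds, assemblies, bams = [], [], []
    for name, reads in samples.items():
        th_dir, cl_dir = f"tophat_{name}", f"cufflinks_{name}"
        # 1. align reads with TopHat
        cmds.append(["tophat", "-o", th_dir, index, reads])
        bam = f"{th_dir}/accepted_hits.bam"
        bams.append(bam)
        # 2. assemble transcripts for this sample with Cufflinks
        cmds.append(["cufflinks", "-o", cl_dir, bam])
        assemblies.append(f"{cl_dir}/transcripts.gtf")
    # 3. merge all per-sample assemblies (assemblies.txt lists the paths above)
    cmds.append(["cuffmerge", "-o", "merged_asm", "assemblies.txt"])
    # 4. compare the merged assembly to the known annotation
    cmds.append(["cuffcompare", "-r", known_gtf, "-o", "cuffcmp",
                 "merged_asm/merged.gtf"])
    # 5. quantify and test, using the cuffcompare GTF as the reference
    #    (assumed output name, see lead-in)
    cmds.append(["cuffdiff", "-o", "diff_out", "-L", ",".join(samples),
                 "cuffcmp.combined.gtf", *bams])
    return assemblies, cmds


if __name__ == "__main__":
    assemblies, cmds = build_commands(SAMPLES, BOWTIE_INDEX, KNOWN_GTF)
    print("assemblies.txt should contain:")
    for path in assemblies:
        print("  " + path)
    for argv in cmds:
        print(shlex.join(argv))
```

Building argument lists rather than shell strings makes it easy to hand each one to subprocess.run later, once the dry-run output looks right.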
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */


    • MeixiaZhao
      Junior Member
      • Apr 2012
      • 9

      #3
      Sorry, I should have given more detail. Here is what I did:
      1. align with tophat
      2. obtain "transcripts.gtf" for each sample via cufflinks
      3. run cuffcompare to get cuffcmp.combined.gtf and cuffcmp.tracking. The cuffcmp.tracking file lets you match up transcripts between the two transcripts.gtf files:
      TCONS_00000001 XLOC_000001 Glyma01g00320|PAC:16242897 j q1:Wound.5|Wound.5.1|100|4.065532|3.183595|4.947470|6.244929|2793 q2:Inoculate.3|Inoculate.3.1|100|3.278571|2.377879|4.179264|3.927685|2824

      TCONS_00003640 XLOC_001025 Glyma01g00270|PAC:16242891 j - q2:Inoculate.1|Inoculate.1.1|100|6.060541|2.561486|9.559596|5.658878|-
      In this file, the structure matching shows that transcript Inoculate.1.1 exists only in the q2 sample (treatment); the q1 column is "-".
      4. run cuffdiff; the result for this transcript looks like this:
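Reading a tracking line by hand is error-prone, so here is a minimal parsing sketch. It assumes the per-sample column layout described in the Cufflinks manual for .tracking files (qN:gene_id|transcript_id|FMI|FPKM|conf_lo|conf_hi|cov|len, or "-" when that sample has no matching transfrag); the function names are hypothetical. It shows which samples actually had a transfrag assembled for each TCONS ID.

```python
# Per-sample fields in a cuffcompare .tracking column (assumed order, see lead-in)
SAMPLE_FIELDS = ["gene_id", "transcript_id", "fmi", "fpkm",
                 "conf_lo", "conf_hi", "cov", "length"]


def parse_tracking_line(line):
    """Split one cuffcmp.tracking line into fixed columns and per-sample entries."""
    cols = line.split()  # real files are tab-separated; split() also handles pasted spaces
    record = {
        "tcons_id": cols[0],
        "xloc_id": cols[1],
        "ref_ids": cols[2],
        "class_code": cols[3],
        "samples": {},
    }
    for i, field in enumerate(cols[4:], start=1):
        label = f"q{i}"
        if field == "-":
            # no transfrag from this sample's transcripts.gtf matched
            record["samples"][label] = None
        else:
            payload = field.split(":", 1)[1]  # drop the "qN:" prefix
            record["samples"][label] = dict(zip(SAMPLE_FIELDS, payload.split("|")))
    return record


def samples_with_transfrag(record):
    """Labels of the samples in which cufflinks assembled this transfrag."""
    return [label for label, entry in record["samples"].items() if entry is not None]


if __name__ == "__main__":
    lines = [
        "TCONS_00000001 XLOC_000001 Glyma01g00320|PAC:16242897 j "
        "q1:Wound.5|Wound.5.1|100|4.065532|3.183595|4.947470|6.244929|2793 "
        "q2:Inoculate.3|Inoculate.3.1|100|3.278571|2.377879|4.179264|3.927685|2824",
        "TCONS_00003640 XLOC_001025 Glyma01g00270|PAC:16242891 j - "
        "q2:Inoculate.1|Inoculate.1.1|100|6.060541|2.561486|9.559596|5.658878|-",
    ]
    for line in lines:
        rec = parse_tracking_line(line)
        print(rec["tcons_id"], rec["class_code"], samples_with_transfrag(rec))
```

Note that "-" here only means cufflinks did not assemble a matching transfrag in that sample; it says nothing about whether reads are present there.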
      TCONS_00003640 XLOC_001025 - Inoculate.1.1 - 6.0605409438 Glyma01g00270 Gm01:27935-61502 q1 q2 OK 3.67357 6.55645 0.835731 -1.09456 0.273708 0.510541 no
      Here q1 = 3.67357 (no treatment) and q2 = 6.55645 (treatment). So cuffdiff reports an expression value for Inoculate.1.1 in the untreated sample even though cufflinks did not report this transcript in that sample's own transcripts.gtf.
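A quick way to read the cuffdiff line is to label its trailing columns and recompute the fold change. This sketch assumes the last ten fields follow cuffdiff's *_exp.diff layout (sample_1, sample_2, status, value_1, value_2, log2 fold change, test statistic, p value, q value, significant); the leading fields of the pasted line are left uninterpreted. Recomputing log2(q2/q1) from the two FPKM values reproduces the reported 0.835731 up to rounding, which confirms the column is a log2 ratio and that the change is not called significant (q = 0.51).

```python
import math

# Assumed labels for the last ten columns of a cuffdiff *_exp.diff line
DIFF_TAIL = ["sample_1", "sample_2", "status", "value_1", "value_2",
             "log2_fold_change", "test_stat", "p_value", "q_value", "significant"]


def parse_diff_tail(line):
    """Map the trailing ten whitespace-separated fields onto DIFF_TAIL."""
    cols = line.split()
    return dict(zip(DIFF_TAIL, cols[-len(DIFF_TAIL):]))


line = ("TCONS_00003640 XLOC_001025 - Inoculate.1.1 - 6.0605409438 Glyma01g00270 "
        "Gm01:27935-61502 q1 q2 OK 3.67357 6.55645 0.835731 -1.09456 0.273708 0.510541 no")
rec = parse_diff_tail(line)
q1, q2 = float(rec["value_1"]), float(rec["value_2"])
recomputed = math.log2(q2 / q1)  # FPKM ratio, treatment over no treatment
print(f"q1={q1} q2={q2} log2(q2/q1)={recomputed:.6f} "
      f"reported={rec['log2_fold_change']} significant={rec['significant']}")
```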

      It's a little hard to describe this issue. My understanding is that, for the untreated sample, there are not enough reads or fragments for cufflinks to report transcript Inoculate.1.1, but if you later give this structure to cuffdiff to calculate expression, it will report an expression value. I'm not sure whether my understanding is right. I also checked junctions.bed: even though the untreated sample's transcripts.gtf does not report Inoculate.1.1, I still found some junctions that fall within Inoculate.1.1. This is consistent with my understanding. What do you think?
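The junctions.bed check can be made exact by comparing each junction's intron to the introns of the transcript. This is a sketch under stated assumptions: TopHat's junctions.bed is read as BED12 with two blocks per junction and the read count in the score column, transcript exons are 1-based closed GTF coordinates, and the exon and junction records below are hypothetical toy data, not taken from this thread.

```python
def junction_intron(bed_line):
    """(chrom, first_intron_base, last_intron_base, reads) in 1-based coordinates
    for one two-block junctions.bed record."""
    f = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start, reads = f[0], int(f[1]), int(f[4])
    block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in f[11].rstrip(",").split(",")]
    # BED is 0-based half-open: intron = [chromStart + size0, chromStart + start1)
    first = chrom_start + block_sizes[0] + 1  # shift to 1-based
    last = chrom_start + block_starts[1]      # 0-based exclusive end == 1-based inclusive end
    return chrom, first, last, reads


def transcript_introns(exons):
    """Introns between consecutive 1-based, closed exons of one transcript."""
    exons = sorted(exons)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def supported_introns(bed_lines, chrom, exons):
    """Map each transcript intron that has an exactly matching junction to its read count."""
    introns = set(transcript_introns(exons))
    hits = {}
    for line in bed_lines:
        if not line.strip() or line.startswith("track"):
            continue  # skip the header line and blanks
        c, first, last, reads = junction_intron(line)
        if c == chrom and (first, last) in introns:
            hits[(first, last)] = reads
    return hits


# Hypothetical toy data: a 3-exon transcript and two junction records
exons = [(100, 200), (301, 400), (501, 600)]
bed_lines = [
    "track name=junctions description=\"TopHat junctions\"",
    "Gm01\t150\t350\tJUNC00000001\t7\t+\t150\t350\t255,0,0\t2\t50,50\t0,150",
    "Gm01\t380\t560\tJUNC00000002\t2\t+\t380\t560\t255,0,0\t2\t20,40\t0,140",
]
print(supported_introns(bed_lines, "Gm01", exons))
```

Matching introns exactly is stricter than checking whether a junction merely falls inside the locus, so it separates reads that support this particular isoform from reads belonging to other isoforms of the same gene.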
      Last edited by MeixiaZhao; 05-01-2012, 12:39 PM.


      • sdriscoll
        I like code
        • Sep 2009
        • 436

        #4
        i see. i think what's going on is the difference between how cufflinks thinks and simply counting reads aligned to a gene. let's say you've got a gene with 5 exons and only 3 of them have coverage. cufflinks may very well say that gene is not present; it's sometimes kind of harsh in that way. however, three of the exons have coverage, so you can still obtain an expression level for that gene by quantifying those reads in the appropriate way. i think that's what cuffdiff is doing in your case: for some reason cufflinks is not able to recover that gene in its transcriptome assembly, but there is still coverage there that can be quantified by cuffdiff.
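To make the 5-exon example concrete, here is a toy calculation. It uses plain single-isoform FPKM (fragments per kilobase of transcript per million mapped fragments), which is a simplification: cuffdiff's real estimate comes from a likelihood model that shares fragments among isoforms and corrects for effective length. The exon lengths, fragment counts and library size are hypothetical. The point is only that a gene with coverage on 3 of its 5 exons still gets a nonzero value once its structure is supplied.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Single-isoform FPKM: fragments * 1e9 / (length_bp * total_mapped_fragments)."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical 5-exon gene: (exon length in bp, fragments aligned to that exon)
exons = [(400, 60), (300, 0), (500, 90), (200, 0), (600, 50)]
total_mapped = 10_000_000  # hypothetical library size

length = sum(size for size, _ in exons)
covered_exons = sum(1 for _, n in exons if n > 0)
fragments = sum(n for _, n in exons)
value = fpkm(fragments, length, total_mapped)
print(f"{covered_exons}/{len(exons)} exons covered, length={length} bp, FPKM={value}")
```

With these numbers the gene is 2000 bp long with 200 fragments, giving an FPKM of 10.0 even though two exons have no reads at all.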
