  • hlwright
    Member
    • Feb 2011
    • 30

    Different RPKM values in same dataset using Cufflinks or Cuffdiff (v1.3.0)

    I have generated RPKM values for the same dataset using Cufflinks (single input BAM file) and using Cuffdiff, where my dataset is part of a larger set of samples (multiple input BAM files). The commands are as follows:

    PHP Code:
    cufflinks -/path/to/BowtieIndex/genome.fa -p 8 --max-bundle-frags 100000000 -/path/to/genes.gtf sample1.bam 
    PHP Code:
    cuffdiff -/path/to/BowtieIndex/genome.fa -p 8 --max-bundle-frags 100000000 /path/to/genes.gtf sample1.bam sample2.bam sample3.bam sample4.bam sample5.bam sample6.bam 
    I have noticed that when I compare RPKM values for individual genes (genes.fpkm_tracking file), the RPKM values generated by the two programs are different even though the input BAM file is the same, for example:-

    Gene 1
    Cufflinks RPKM 1138.91 (confidence intervals 1130.66,1147.16)
    Cuffdiff RPKM 1572.47 (confidence intervals 637.99,2506.94)

    Gene 2
    Cufflinks RPKM 58.15 (confidence intervals 56.40,59.89)
    Cuffdiff RPKM 78.99 (confidence intervals 39.83,118.15)
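A quick way to see how the two sets of values relate is to compute the Cuffdiff/Cufflinks ratio and the confidence-interval widths for each gene. The sketch below uses only the numbers quoted above; the variable and function names are made up for illustration, and it says nothing about why the values differ.

```python
# Estimates and confidence intervals quoted in this post.
# Layout: gene -> tool -> (estimate, ci_low, ci_high)
values = {
    "Gene 1": {
        "cufflinks": (1138.91, 1130.66, 1147.16),
        "cuffdiff": (1572.47, 637.99, 2506.94),
    },
    "Gene 2": {
        "cufflinks": (58.15, 56.40, 59.89),
        "cuffdiff": (78.99, 39.83, 118.15),
    },
}


def summarize(values):
    """Return the Cuffdiff/Cufflinks ratio and CI widths for each gene."""
    summary = {}
    for gene, tools in values.items():
        cl_est, cl_lo, cl_hi = tools["cufflinks"]
        cd_est, cd_lo, cd_hi = tools["cuffdiff"]
        summary[gene] = {
            "ratio": cd_est / cl_est,
            "cufflinks_ci_width": cl_hi - cl_lo,
            "cuffdiff_ci_width": cd_hi - cd_lo,
        }
    return summary


summary = summarize(values)
for gene, stats in summary.items():
    print(
        f"{gene}: ratio={stats['ratio']:.3f}, "
        f"Cufflinks CI width={stats['cufflinks_ci_width']:.2f}, "
        f"Cuffdiff CI width={stats['cuffdiff_ci_width']:.2f}"
    )
```

For both genes the ratio comes out at roughly 1.36 to 1.38, and the Cuffdiff intervals are far wider than the Cufflinks ones. A similar ratio across genes would be consistent with a global scaling difference rather than a gene-specific one, although two genes are far too few to conclude that.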

    Can anyone explain this please?
  • sdriscoll
    I like code
    • Sep 2009
    • 436

    #2
    This is by design. I think it would be wiser, and more scientifically sound, to use the following type of pipeline:

    Align reads once for expression analysis.
    Quantify gene expression once, as read counts, Cufflinks estimates, or RSEM estimates.
    Use a DE tool like DESeq, edgeR, or the more recent EBSeq, which seems to have improved on previous methods a bit.

    It doesn't make sense to get multiple expression estimates from the same data even though it makes sense from the computation logic side of things.

    I think RSEM might be my new favorite, except that it doesn't run well on my Mac system. If you have a strong Linux system then it should be good. If you run RSEM multiple times on the same data you will see some variation in its isoform-level assignment of expression; however, that's a result of variable/random behavior of the aligner. The gene-level estimates are more stable. This only exposes the fact that we have all probably been dealing with this extra uncertainty in expression values all along. Their pipeline requires Bowtie, and they run it in a specific way for good quantification estimates. So if you want to try the RSEM pipeline you only need to run that, and you can skip the initial alignments because RSEM does them for you. You would run it once per sample and then merge the data for DE analysis. They recommend EBSeq; in fact, they package it with their software.
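The "run once per sample, then merge" step could look something like the sketch below. It is only an illustration: it assumes each sample has a tab-delimited results table with a header row, and the default column names `gene_id` and `expected_count` are an assumption about that table that you should check against your own files. The function names and the choice to fill missing genes with 0.0 are also made up here, not part of any tool.

```python
import csv


def read_counts(path, id_col="gene_id", value_col="expected_count"):
    """Read one per-sample table into a {gene_id: value} dict.

    The column names are assumptions; pass your own if they differ.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_col]: float(row[value_col]) for row in reader}


def merge_counts(sample_paths, **column_kwargs):
    """Merge per-sample tables into one gene-by-sample matrix.

    sample_paths maps sample name -> file path. Genes absent from a
    sample are filled with 0.0 (a design choice, not a requirement).
    Returns (genes, samples, matrix) with matrix[i][j] holding the
    value for genes[i] in samples[j].
    """
    per_sample = {
        name: read_counts(path, **column_kwargs)
        for name, path in sample_paths.items()
    }
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    matrix = [[per_sample[s].get(g, 0.0) for s in samples] for g in genes]
    return genes, samples, matrix


def write_matrix(path, genes, samples, matrix):
    """Write the merged matrix as a tab-delimited file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id"] + samples)
        for gene, row in zip(genes, matrix):
            writer.writerow([gene] + row)
```

A call such as `merge_counts({"sample1": "sample1.genes.results", "sample2": "sample2.genes.results"})` (hypothetical file names) would give one row per gene and one column per sample, which is the shape most DE tools expect as input.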
    Last edited by sdriscoll; 11-08-2012, 09:21 AM.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */


    • sdriscoll
      I like code
      • Sep 2009
      • 436

      #3
      Today I discovered eXpress (http://bio.math.berkeley.edu/eXpress/overview.html), which uses the same basic algorithm as RSEM but is much faster and produces more verbose output. So far I like it, and I've seen that its expression estimates correlate very highly (r > 0.8) with 'true' expressions from synthetic data analysis. Someone shared a slideshow with me outlining an evaluation of current possible pipelines using the BEERS pipeline (http://www.cbil.upenn.edu/BEERS/). Cufflinks wasn't even on the map, with count estimates correlating at 0 < r < 0.2; in other words, the expression estimates looked like random noise compared to the true values.
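For anyone who wants to run the same kind of check on their own simulated data, the r value here is a correlation coefficient between estimated and true expression. Below is a minimal sketch of a Pearson correlation, assuming two equal-length lists of values in the same gene order; the function name is made up, and whether the evaluation mentioned above used Pearson, Spearman, or log-transformed values is not stated, so treat that choice as an assumption.

```python
import math


def pearson_r(estimated, truth):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(estimated) != len(truth):
        raise ValueError("inputs must have the same length")
    n = len(estimated)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_e = sum(estimated) / n
    mean_t = sum(truth) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_e == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_e * var_t)
```

With simulated reads, `truth` would be the known expression values from the simulator and `estimated` the values reported by the quantification tool for the same genes.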


      • hlwright
        Member
        • Feb 2011
        • 30

        #4
        Thanks sdriscoll - I have been getting more and more frustrated with cufflinks / cuffdiff so will explore these other options.

        Really appreciate you replying to my post.

        Helen
