  • PFS
    Member
    • Mar 2010
    • 55

    skipping cufflinks-->cuffcompare ... straight to cuffdiff?

    I see that most workflows include tophat --> cufflinks --> cuffcompare --> cuffdiff.

    If I want to perform differential expression analysis on RNA-seq samples based on a known annotation (e.g. an Ensembl GTF), can I simply run tophat --> cuffdiff (with the known GTF)?

    What would differ if I instead ran tophat --> cufflinks --> cuffcompare and passed the resulting GTF to cuffdiff?
  • dietmar13
    Senior Member
    • Mar 2010
    • 107

    #2
    depends on your experimental design

    I compared several methods for DE with a 12 vs 12 paired data-set and found cuffdiff to produce by far the fewest significant genes.

    In ascending order of the number of significant genes:
    cuffdiff
    Noiseq
    DESeq
    baySeq
    edgeR
    npSeq
    SAMseq
    poissonSeq

    Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems to be more adequate...


    • Bukowski
      Senior Member
      • Jan 2010
      • 388

      #3
      Originally posted by dietmar13:
      I compared several methods for DE with a 12 vs 12 paired data-set and found cuffdiff to produce by far the fewest significant genes.

      In ascending order of the number of significant genes:
      cuffdiff
      Noiseq
      DESeq
      baySeq
      edgeR
      npSeq
      SAMseq
      poissonSeq

      Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems to be more adequate...

      Surely a shorter list of genes doesn't, by itself, mean the approach is worse.


      • dietmar13
        Senior Member
        • Mar 2010
        • 107

        #4
        Of course,

        but I analysed the same biological question, 12 vs 12 paired (colon cancer vs. normal tissue), with microarrays and got ~6000 significant genes.

        I would say that 2 significant genes (as I got with cuffdiff) are too few to be useful for further examination, and thus a worse result than the other approaches gave.

        I also compared the gene lists derived from the other approaches with the gene list I got from the microarrays, and there was no big difference in overlap (and I know that microarrays are not the truth).

        Furthermore, I estimated the robustness of the resulting lists by bootstrap validation and got acceptable results (although the values decrease as the number of significant genes increases).

        Therefore, I would say all the gene lists are more or less plausible, also with regard to the expected expression differences between cancer and normal tissue.

        The only way to validate all genes for sure would be to run RT-qPCR on the same samples for every gene...


        • Bukowski
          Senior Member
          • Jan 2010
          • 388

          #5
          Thanks for elaborating. Without figures or a reference to a microarray experiment, it was rather hard to take on faith.


          • Cole Trapnell
            Senior Member
            • Nov 2008
            • 213

            #6
            I'm a bit skeptical about that Cuffdiff run - we've compared the lists produced by Cuffdiff against the lists produced by arrays run on *exactly* the same RNA, and found that not only does Cuffdiff return a superset of the genes returned by the array analysis, the Cuffdiff lists are highly concordant with DESeq and edgeR (like 90% overlap). Are you sure you're running Cuffdiff correctly, and are you using a recent version?


            • dietmar13
              Senior Member
              • Mar 2010
              • 107

              #7
              syntax used for cuffdiff

              I used the cuffdiff that ships with Cufflinks 1.3.

              $GTF points to a GTF prepared as follows
              ($genome is a link to the chromosomes):

              cuffcompare -s $genome -CG -r Homo_sapiens.hg19.gtf Homo_sapiens.hg19.gtf

              cuffdiff -o $outdir -p 12 -N -u $GTF $DIR/SRR317086.sam,$DIR/SRR317087.sam,$DIR/SRR317088.sam,$DIR/SRR317089.sam,$DIR/SRR317090.sam,$DIR/SRR317091.sam,$DIR/SRR317092.sam,$DIR/SRR317093.sam,$DIR/SRR317094.sam,$DIR/SRR317095.sam,$DIR/SRR317096.sam,$DIR/SRR317097.sam $DIR/SRR317098.sam,$DIR/SRR317099.sam,$DIR/SRR317100.sam,$DIR/SRR317101.sam,$DIR/SRR317102.sam,$DIR/SRR317103.sam,$DIR/SRR317104.sam,$DIR/SRR317105.sam,$DIR/SRR317106.sam,$DIR/SRR317107.sam,$DIR/SRR317108.sam,$DIR/SRR317109.sam
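
The two comma-separated replicate lists in the command above are easy to mistype by hand. A minimal sketch, assuming the SAM files are named `SRR<number>.sam` and numbered consecutively as in the post, builds them programmatically. The helper name `sam_list` and the placeholder values for `DIR`, `GTF`, and `outdir` are illustrative assumptions, not part of Cufflinks. The script only prints the resulting command so it can be checked before running.

```shell
#!/bin/sh
# Build a comma-separated list of SAM paths for consecutive SRR accessions.
# Usage: sam_list DIR FIRST LAST   (hypothetical helper, not part of Cufflinks)
sam_list() {
    awk -v dir="$1" -v first="$2" -v last="$3" 'BEGIN {
        for (n = first + 0; n <= last + 0; n++)
            printf "%s%s/SRR%d.sam", (n > first + 0 ? "," : ""), dir, n
        print ""
    }'
}

# Placeholder values (assumptions); replace with the real locations.
DIR=${DIR:-/path/to/sam}
GTF=${GTF:-annotation.gtf}
outdir=${outdir:-cuffdiff_out}

group_a=$(sam_list "$DIR" 317086 317097)   # first 12 samples
group_b=$(sam_list "$DIR" 317098 317109)   # second 12 samples

# Print the command for inspection instead of running it.
echo "cuffdiff -o $outdir -p 12 -N -u $GTF $group_a $group_b"
```

Each list is passed to cuffdiff as a single argument, so all files in one list are treated as replicates of the same condition.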


              • Cole Trapnell
                Senior Member
                • Nov 2008
                • 213

                #8
                Hmm, it certainly looks OK. Do you have a lot of genes with status FAIL? Where'd you get the GTF?

                Also, how did you map the reads?


                • dietmar13
                  Senior Member
                  • Mar 2010
                  • 107

                  #9
                  I created the GTF from an Ensembl GTF (to make it hg19 compatible, I added "chr" in front of the chromosome names).

                  gawk 'BEGIN { FS = "\t"; OFS="\t" } ; $1 ~ /^([0-9]+|X|Y|MT)$/ { print "chr" $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 }' $in > cuffdiff/${out}.tmp

                  sed 's/chrMT/chrM/' cuffdiff/${out}.tmp > cuffdiff/${out}

                  rm cuffdiff/${out}.tmp
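
The gawk, sed, and rm steps above could probably be collapsed into one pass without a temporary file. The following is a sketch under the assumption that the input is a standard tab-separated GTF; the function name `ensembl_to_ucsc` and the example paths are made up for illustration. Unlike the original, it rewrites only the first column, so a stray "chrMT" elsewhere on a line would not be touched.

```shell
#!/bin/sh
# Convert Ensembl-style chromosome names (1..22, X, Y, MT) in a GTF to
# UCSC-style names (chr1..chr22, chrX, chrY, chrM) in a single pass.
# Records on other sequences and header lines are dropped, as in the
# two-step version.
ensembl_to_ucsc() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 ~ /^([0-9]+|X|Y|MT)$/ {
             $1 = ($1 == "MT") ? "chrM" : "chr" $1
             print
         }' "$1"
}

# Example call (paths are placeholders):
#   ensembl_to_ucsc Homo_sapiens.ensembl.gtf > cuffdiff/Homo_sapiens.hg19.gtf
```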

                  I mapped the reads with TopHat and used the same mapped reads for the analysis with the other R packages for DE (HTSeq-count, unique).

                  gene_exp.diff
                  11050 FAIL
                  27 HIDATA
                  75 LOWDATA
                  35526 NOTEST
                  2995 OK


                  Why does it run so many tests? Are they made on a gene basis or on a transcript basis?
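
A status tally like the one above can be reproduced with a few lines of shell. This is a sketch that assumes gene_exp.diff is tab-separated with a header row containing columns named `status` and `significant`; the helper `column_counts` is hypothetical. Looking the column up by name avoids hard-coding a position that may differ between Cufflinks versions.

```shell
#!/bin/sh
# Tally the values in one named column of a tab-separated file with a header row.
# Usage: column_counts FILE COLUMN_NAME   (hypothetical helper)
column_counts() {
    awk -F '\t' -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col     { count[$col]++ }
        END     { for (v in count) print count[v], v }
    ' "$1" | sort -k2,2
}

# Example calls (assumes the cuffdiff output is in the current directory):
#   column_counts gene_exp.diff status
#   column_counts gene_exp.diff significant
```

If the named column is not present in the header, the function prints nothing, which is a quick hint that the file layout differs from the assumption.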


                  • DineshCyanam
                    Compendia Bio
                    • Oct 2010
                    • 35

                    #10
                    Hi Cole,
                    I see something similar with my data too. cuffdiff produces very few significant (Significant: Yes) genes, although I have not compared it with other methods. I am using Cufflinks v1.3.0, but the TopHat version I used was v1.1.4.

                    I ran TopHat more than a year ago, and I know that TopHat has evolved a lot since then. Do you think I would see a significant difference between the latest version of TopHat and v1.1.4?

                    gene_exp.diff
                    ---------------------------------
                    248975 FAIL
                    811 HIDATA
                    22270 LOWDATA
                    141410 NOTEST
                    134335 OK
                    80 YES (SIGNIFICANT)
                    Last edited by DineshCyanam; 03-01-2012, 09:34 AM.


                    • dietmar13
                      Senior Member
                      • Mar 2010
                      • 107

                      #11
                      cuffdiff 2.0.2 beta

                      I have now repeated my analysis with cuffdiff 2.0.2 beta and got zero significant transcripts or genes (CDS). cuffdiff 1.3 found two genes (see above).

                      The design was a 12 versus 12 matched-pairs experiment (normal colon mucosa vs. colon cancer tissue) with a median of only 2.5 million reads per sample.

                      Mapper: TopHat.

                      cuffdiff was provided with an Ensembl GTF file, and the analysis ran without any errors.

                      SAMseq, edgeR, and limma/voom found > 4,000 genes, and DESeq > 2,500, using the raw digital count data (HTSeq-count).

                      I think cuffdiff (1.3 and 2.0.2 beta) is not the right choice for statistical analysis of experimental designs with many highly dispersed biological replicates and low read depth.

