  • Skipping cufflinks --> cuffcompare ... straight to cuffdiff?

    I see that most workflows include tophat --> cufflinks --> cuffcompare --> cuffdiff.

    If I want to perform differential expression analysis on RNA-seq samples based on a known annotation (e.g. an Ensembl GTF), can I simply do tophat --> cuffdiff (with the known GTF)?

    What would be the difference if I instead ran tophat --> cufflinks --> cuffcompare and used the resulting GTF in cuffdiff?

  • #2
    Depends on your experimental design.

    I compared several methods for DE with a 12 vs 12 paired data-set and found cuffdiff to produce by far the fewest significant genes.

    The ascending order (fewest significant genes first) was:
    cuffdiff
    Noiseq
    DESeq
    baySeq
    edgeR
    npSeq
    SAMseq
    poissonSeq

    Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems more adequate...



    • #3
      Originally posted by dietmar13:
      I compared several methods for DE with a 12 vs 12 paired data-set and found cuffdiff to produce by far the fewest significant genes.

      therefore, if you have a design with biological replicates, every approach beside cuffdiff seems to be more adequate...
      Surely, just because a method produces a shorter list of genes doesn't mean it is a worse approach?



      • #4
        Of course,

        but I analysed the same biological question (12 vs 12 paired, colon cancer vs. normal tissue) with microarrays and got ~6000 significant genes.

        I would say 2 significant genes (which is what I got with cuffdiff) are a bit too few and useless for further investigation, and thus worse than the other approaches.

        I also compared the gene lists from the other approaches with the gene list I got from the microarrays, and there was no big difference in overlap between them (and I know that microarrays are not the ground truth).

        Furthermore, I estimated the robustness of the lists with bootstrap validation and got acceptable results (although the values decreased as the number of significant genes increased).

        Therefore I would say all the gene lists are more or less plausible, also with regard to the expected expression differences between cancer and normal tissue.

        The only way to validate all genes for sure would be to run RT-qPCR for every gene on the same samples...
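A list comparison like the one described above (RNA-seq DE lists against a microarray list) can be sketched in bash. This assumes each list is a plain text file with one gene identifier per line, and that both lists use the same kind of identifier; `gene_overlap` is a hypothetical helper name, not an existing tool.

```bash
#!/usr/bin/env bash
# gene_overlap LIST1 LIST2 (hypothetical helper)
# Counts the genes in each list and the genes shared by both.
# Assumes one identifier per line; blank lines and duplicates are ignored.
gene_overlap() {
  local a b n_a n_b n_shared n_min pct=0
  a=$(mktemp) b=$(mktemp)
  # comm needs both inputs sorted with the same collation, so force LC_ALL=C
  grep -v '^[[:space:]]*$' "$1" | LC_ALL=C sort -u > "$a"
  grep -v '^[[:space:]]*$' "$2" | LC_ALL=C sort -u > "$b"
  n_a=$(wc -l < "$a")
  n_b=$(wc -l < "$b")
  # comm -12 prints only the lines present in both files
  n_shared=$(LC_ALL=C comm -12 "$a" "$b" | wc -l)
  rm -f "$a" "$b"
  # Percentage of the shorter list that is shared (integer, rounded down)
  n_min=$(( n_a < n_b ? n_a : n_b ))
  if (( n_min > 0 )); then
    pct=$(( 100 * n_shared / n_min ))
  fi
  # $(( )) also strips the padding some wc implementations add
  echo "list1=$((n_a)) list2=$((n_b)) shared=$((n_shared)) pct_of_shorter=$pct"
}
```

Example call: `gene_overlap cuffdiff_genes.txt array_genes.txt` (hypothetical file names). Expressing the overlap as a fraction of the shorter list is one choice among several; a Jaccard index would be another.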



        • #5
          Thanks for elaborating; without figures or a reference to a microarray experiment, it was rather hard to take on faith.



          • #6
            I'm a bit skeptical about that Cuffdiff run - we've compared the lists produced by Cuffdiff against the lists produced by arrays run on *exactly* the same RNA, and found that not only does Cuffdiff return a superset of the genes returned by the array analysis, the Cuffdiff lists are highly concordant with DESeq and edgeR (like 90% overlap). Are you sure you're running Cuffdiff correctly, and are you using a recent version?



            • #7
              Syntax I used for cuffdiff:

              I used the cuffdiff that comes with Cufflinks 1.3.

              $GTF is a GTF prepared as follows ($genome is a link to the chromosome sequences):

              cuffcompare -s $genome -CG -r Homo_sapiens.hg19.gtf Homo_sapiens.hg19.gtf

              cuffdiff -o $outdir -p 12 -N -u $GTF $DIR/SRR317086.sam,$DIR/SRR317087.sam,$DIR/SRR317088.sam,$DIR/SRR317089.sam,$DIR/SRR317090.sam,$DIR/SRR317091.sam,$DIR/SRR317092.sam,$DIR/SRR317093.sam,$DIR/SRR317094.sam,$DIR/SRR317095.sam,$DIR/SRR317096.sam,$DIR/SRR317097.sam $DIR/SRR317098.sam,$DIR/SRR317099.sam,$DIR/SRR317100.sam,$DIR/SRR317101.sam,$DIR/SRR317102.sam,$DIR/SRR317103.sam,$DIR/SRR317104.sam,$DIR/SRR317105.sam,$DIR/SRR317106.sam,$DIR/SRR317107.sam,$DIR/SRR317108.sam,$DIR/SRR317109.sam
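Typing 24 paths by hand is error-prone. A minimal bash sketch that builds the two comma-separated replicate arguments, assuming (as in the command above) that SRR317086 to SRR317097 form the first condition and SRR317098 to SRR317109 the second. `join_replicates` is a hypothetical helper; the cuffdiff options are exactly those of the original command.

```bash
#!/usr/bin/env bash
# join_replicates DIR FIRST LAST (hypothetical helper)
# Prints DIR/SRR<FIRST>.sam,DIR/SRR<FIRST+1>.sam,...,DIR/SRR<LAST>.sam
join_replicates() {
  local dir=$1 first=$2 last=$3 out="" i
  for (( i = first; i <= last; i++ )); do
    # ${out:+,} adds a comma separator only after the first entry
    out+="${out:+,}$dir/SRR$i.sam"
  done
  printf '%s\n' "$out"
}

# Usage (commented out so the sketch runs without the data):
# cond1=$(join_replicates "$DIR" 317086 317097)
# cond2=$(join_replicates "$DIR" 317098 317109)
# cuffdiff -o "$outdir" -p 12 -N -u "$GTF" "$cond1" "$cond2"
```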



              • #8
                Hmm, it certainly looks OK. Do you have a lot of genes with status FAIL? Where'd you get the GTF?

                Also, how did you map the reads?



                • #9
                  I created the GTF from an Ensembl GTF (to make it hg19-compatible, I added "chr" in front of the chromosome names):

                  gawk 'BEGIN { FS = "\t"; OFS="\t" } ; $1 ~ /^([0-9]+|X|Y|MT)$/ { print "chr" $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 }' $in > cuffdiff/${out}.tmp

                  sed 's/chrMT/chrM/' cuffdiff/${out}.tmp > cuffdiff/${out}

                  rm cuffdiff/${out}.tmp
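The gawk/sed steps above can also be done in a single awk pass without a temporary file. A sketch, assuming a tab-separated GTF whose first column uses Ensembl chromosome names (1, 2, ..., X, Y, MT). Like the original, it drops every other sequence (unplaced scaffolds, `#!` header lines); unlike the sed step, it renames MT to chrM in column 1 only. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# ensembl_to_ucsc_gtf IN.gtf > OUT.gtf (hypothetical helper)
ensembl_to_ucsc_gtf() {
  awk 'BEGIN { FS = OFS = "\t" }
       # keep numbered chromosomes, X, Y and MT; everything else is dropped
       $1 ~ /^([0-9]+|X|Y|MT)$/ {
         # UCSC/hg19 naming: "chr" prefix, and MT becomes chrM
         $1 = ($1 == "MT") ? "chrM" : ("chr" $1)
         print   # record is rebuilt with OFS, so columns stay tab-separated
       }' "$1"
}
```

Usage would be along the lines of `ensembl_to_ucsc_gtf "$in" > cuffdiff/${out}`, with the same variables as above.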

                  I mapped the reads with TopHat and also used these alignments for the DE analyses with the other R packages (counts from HTSeq-count, unique).

                  gene_exp.diff
                  11050 FAIL
                  27 HIDATA
                  75 LOWDATA
                  35526 NOTEST
                  2995 OK


                  Why does it perform so many tests (on a gene basis or on a transcript basis)?
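Status counts like the ones above can be tallied straight from the cuffdiff output. A sketch, assuming a tab-separated `*_exp.diff` file whose header line contains a column named `status`; the column is located by name rather than by position, so the sketch does not depend on one cuffdiff version's column layout. `tally_status` is a hypothetical name.

```bash
#!/usr/bin/env bash
# tally_status FILE (hypothetical helper)
# Prints "<count> <status>" for each distinct value of the status column.
tally_status() {
  awk 'BEGIN { FS = "\t" }
       NR == 1 {
         # find the status column in the header line
         for (i = 1; i <= NF; i++) if ($i == "status") col = i
         if (!col) exit 1   # no status column: nothing is printed
         next
       }
       { n[$col]++ }
       END { for (s in n) print n[s], s }' "$1" | LC_ALL=C sort -k2,2
}
```

For example, `tally_status gene_exp.diff`; the same call works on the per-isoform file.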



                  • #10
                    Hi Cole,
                    I see something similar with my data too. cuffdiff produces very few significant genes (significant = yes), although I have not compared it with other methods. I am using Cufflinks v1.3.0, but the TopHat version I used was v1.1.4.

                    I ran TopHat more than a year ago, and I know TopHat has evolved a lot since then. Do you think I would see a significant difference between the latest version of TopHat and v1.1.4?

                    gene_exp.diff
                    ---------------------------------
                    248975 FAIL
                    811 HIDATA
                    22270 LOWDATA
                    141410 NOTEST
                    134335 OK
                    80 YES (SIGNIFICANT)
                    Last edited by DineshCyanam; 03-01-2012, 09:34 AM.
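To see which genes make up a significant count like the one above, or to compare them with another method's list, the rows flagged significant can be extracted. A sketch, assuming the `*_exp.diff` header has a `significant` column holding yes/no and an identifier column chosen by name (default `gene`); `significant_ids` is a hypothetical name.

```bash
#!/usr/bin/env bash
# significant_ids FILE [ID_COLUMN] (hypothetical helper)
# Prints the unique IDs of rows whose "significant" column is "yes".
significant_ids() {
  local file=$1 col=${2:-gene}
  awk -v want="$col" 'BEGIN { FS = "\t" }
       NR == 1 {
         # locate the requested ID column and the significant column by name
         for (i = 1; i <= NF; i++) {
           if ($i == want) c = i
           if ($i == "significant") s = i
         }
         if (!c || !s) exit 1   # a column is missing: nothing is printed
         next
       }
       $s == "yes" { print $c }' "$file" | LC_ALL=C sort -u
}
```

`significant_ids gene_exp.diff gene_id` prints IDs instead of names; either way the output has one identifier per line.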



                    • #11
                      cuffdiff 2.0.2 beta

                      I have now repeated my analysis with cuffdiff 2.0.2 beta and got zero significant transcripts or genes (CDS), whereas cuffdiff 1.3 found two genes (see above).

                      The design was a 12 vs. 12 matched-pairs experiment (normal colon mucosa vs. colon cancer tissue) with a median of only 2.5 million reads per sample.

                      Mapper: TopHat.

                      cuffdiff was given an Ensembl GTF file, and the analysis ran without errors.

                      Using the raw digital count data (HTSeq-count), SAMseq, edgeR and limma/voom found > 4,000 genes and DESeq > 2,500.
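DESeq, edgeR and limma/voom take a single genes-by-samples count table, whereas htseq-count writes one file per sample. A sketch for merging them, assuming each file has two tab-separated columns (gene ID, count) and that summary rows start with `__`, as in recent HTSeq releases (older releases may name them differently, so check the files). Column headers are the file names; `merge_htseq_counts` is hypothetical.

```bash
#!/usr/bin/env bash
# merge_htseq_counts FILE... > counts.tsv (hypothetical helper)
# Genes missing from a file get a count of 0.
merge_htseq_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
       FNR == 1 { nfiles++; name[nfiles] = FILENAME }   # new input file
       $1 ~ /^__/ { next }                              # skip summary rows
       !($1 in seen) { seen[$1] = 1; order[++ng] = $1 } # first-seen gene order
       { cnt[$1, nfiles] = $2 }
       END {
         line = "gene_id"
         for (f = 1; f <= nfiles; f++) line = line OFS name[f]
         print line
         for (g = 1; g <= ng; g++) {
           line = order[g]
           for (f = 1; f <= nfiles; f++) {
             key = order[g] SUBSEP f
             line = line OFS ((key in cnt) ? cnt[key] : 0)
           }
           print line
         }
       }' "$@"
}
```

For example, `merge_htseq_counts sample*.counts > counts.tsv` (hypothetical file names); the result can be read into R as a count matrix.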

                      I think cuffdiff (1.3 and 2.0.2 beta) is not the right choice for the statistical analysis of experimental designs with many highly dispersed biological replicates and low sequencing depth.
