Hi all, I am quite new to RNA-seq and bioinformatics in general, so I apologize if these questions are poorly phrased or rather naïve.
(1) I have paired RNA-seq data (normal vs. cancer) for several patients and have begun DEG analysis through the tophat-->cufflinks-->cuffdiff pipeline. I’m wondering if this is even a viable pathway, as a thread regarding an earlier version of cuffdiff expressed some doubts about its suitability for paired designs: http://seqanswers.com/forums/showthread.php?t=7108. Does the latest version of cufflinks/cuffdiff address these problems, or should I consider a different approach?
(2) This is probably a dumb question, but what is wrong with just importing cufflinks’ FPKMs for all my samples into Excel and running a paired t-test on them to determine differential expression? As you can tell, I am not well-versed in the statistics of this...
(3) Assuming I can use cuffdiff, I’ve been encountering issues with the FPKMs it produces, which are all much, much larger than those given by cufflinks for the samples I’ve tried. I know inconsistencies like these have been caused by different default settings between cufflinks/cuffdiff in the past, but several threads have mentioned that the latest version (v2.1.1) should have this fixed. Am I doing something wrong? Below is my code and sample output.
For cufflinks:
Code:
cufflinks -G reference.gtf patient_normal.bam
In genes.fpkm_tracking:
Code:
gene_id locus FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
gene X chr1:123596-123889 0.04095203 0.0225433 0.0751442 OK
For cuffdiff:
Code:
cuffdiff reference.gtf patient_normal.bam patient_tumor.bam
In genes_exp.diff (value_1 should refer to FPKM for normal sample):
Code:
gene_id locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
gene X chr1:123596-123889 q1 q2 NOTEST 16.7791 17.0898 0.0264722 1 1 no
Thank you for all your help!
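P.S. To make question (2) concrete, this is roughly the calculation I have in mind, sketched in Python rather than Excel. The FPKM values are made up for illustration (they are not from my data), the function name paired_t is my own, and the log2 transform with a pseudocount of 1 is just an assumption on my part. It only computes the paired t statistic and degrees of freedom for a single gene; the p-value would then come from a t distribution, which I believe is what Excel's T.TEST does in paired mode.

```python
import math
from statistics import mean, stdev


def paired_t(normal, tumor, pseudocount=1.0):
    """Paired t statistic on log2(FPKM + pseudocount) for one gene.

    normal[i] and tumor[i] must come from the same patient.
    Returns (t_statistic, degrees_of_freedom).
    """
    if len(normal) != len(tumor):
        raise ValueError("need one normal and one tumor value per patient")
    if len(normal) < 2:
        raise ValueError("need at least two patients")

    # Per-patient log2 differences (tumor minus normal).
    diffs = [
        math.log2(tum + pseudocount) - math.log2(norm + pseudocount)
        for norm, tum in zip(normal, tumor)
    ]

    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")

    n_pairs = len(diffs)
    t_stat = mean(diffs) / (sd / math.sqrt(n_pairs))
    return t_stat, n_pairs - 1


# Made-up FPKMs for one hypothetical gene in four patients.
normal_fpkm = [10.2, 8.7, 12.1, 9.5]
tumor_fpkm = [15.8, 11.3, 20.4, 12.9]

t_stat, df = paired_t(normal_fpkm, tumor_fpkm)
print(f"t = {t_stat:.3f}, df = {df}")
```

This would have to be repeated for every gene, with some multiple-testing correction on top, and it treats the FPKMs as if they were ordinary measurements, which is the part I am unsure about.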