Hi Folks,
I'm currently working with 30+ samples for which I would like to generate "genes.fpkm_tracking" and "isoforms.fpkm_tracking" files via cuffdiff. That is, one aggregate file across all samples for genes and one for isoforms. However, cuffdiff also runs analyses I'm not interested in, such as differential expression testing, and it takes an absurd amount of time (it has been running for 7+ days on a 24-core, high-memory machine, with no end in sight).
As a quicker approach, I'm wondering whether I can take the "transcripts.gtf" file from cuffmerge and rerun cufflinks on each of my samples with the -G flag, which restricts quantification to the supplied annotation and prevents novel isoform detection. I would then merge the individual per-sample files via awk. Would that be sufficient to generate fpkm_tracking files for all the known and novel genes I identified in the first pass of the pipeline?
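A minimal sketch of the merge step, written in Python rather than awk. It assumes each per-sample fpkm_tracking file is tab-delimited with a header row containing "tracking_id" and "FPKM" columns, and that the same GTF was used for every sample so the IDs line up. The function names and the "NA" placeholder for missing IDs are my own choices, not part of cufflinks.

```python
import csv


def read_fpkm(handle):
    """Return {tracking_id: FPKM string} for one fpkm_tracking file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"]: row["FPKM"] for row in reader}


def merge_fpkm(samples):
    """Build one table with a row per tracking_id and a column per sample.

    samples: dict mapping a sample name to an open file handle.
    Returns (header, rows). IDs absent from a sample get "NA" (assumption).
    """
    per_sample = {name: read_fpkm(handle) for name, handle in samples.items()}
    names = list(per_sample)
    # Union of all IDs, so a transcript missing from one file is still reported.
    ids = sorted(set().union(*(d.keys() for d in per_sample.values())))
    header = ["tracking_id"] + names
    rows = [[tid] + [per_sample[n].get(tid, "NA") for n in names] for tid in ids]
    return header, rows
```

To use it, open each sample's isoforms.fpkm_tracking (or genes.fpkm_tracking) file, pass the handles in a dict keyed by sample name, and write the returned header and rows out with csv.writer using a tab delimiter. FPKM values are kept as strings so nothing is rounded during the merge.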
I want these aggregate files because I would like FPKM values for every known and novel isoform detected in my first iteration of cufflinks (run with the -g flag) across all 30+ individuals, so that I can compare the individuals to each other. However, I'm not sure whether going this route will introduce biases into my data.
Any thoughts from the RNAseq gurus out there would be much appreciated!