Hello all,
I'm running cufflinks on several large human samples on a cluster with time restrictions. To speed things up, I split the BAM files by chromosome and ran cufflinks on each chromosome separately. I'd like to find both novel and known transcripts, so for each cufflinks run I also provided the full GTF file with all known transcripts.
Example cufflinks commands:
cufflinks -p 4 -g genes_ucsc.gtf -o b.1/cufflinks_out b.1/chr1.bam
cufflinks -p 4 -g genes_ucsc.gtf -o b.2/cufflinks_out b.2/chr2.bam
I'm having difficulties merging the results from each cufflinks run. A snippet of my cufflinks transcripts.gtf file from chr1:
chr1 Cufflinks transcript 29536 30065 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "3.6295414709"; frac "1.000000"; conf_lo "1.868955"; conf_hi "5.390128"; cov "7.520130"; full_read_support "yes";
chr10 Cufflinks transcript 92828 95178 1 - . gene_id "TUBB8"; transcript_id "NM_177987"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
As you can see, chr1's transcripts.gtf file contains transcripts from all chromosomes (because my GTF file has known transcripts from all chromosomes), but only transcripts on chr1 have a non-zero FPKM value. To merge these transcripts.gtf files, I used cuffmerge. However, the merged result has entirely different FPKM values for the same transcript:
chr1 Cufflinks transcript 29536 30065 1000 . . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "1.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.700099"; full_read_support "yes";
I know cuffmerge reruns transcript reconstruction, but I'm confused as to how it got this kind of result. Has anyone tried speeding up cufflinks by running it on individual chromosomes? If so, how did you merge the results?
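For reference, the simplest alternative I can think of is to skip cuffmerge, keep from each run's transcripts.gtf only the records on the chromosome that run was given, and concatenate what is left. Below is a minimal Python sketch of that idea. The function names are my own (hypothetical, not part of cufflinks), and it assumes the first whitespace-separated field of each record is the chromosome name. It leaves FPKM values untouched; I have not checked whether FPKMs from separate per-chromosome runs are comparable, and it does not rename ids such as CUFF.1 that may repeat across runs.

```python
import io


def keep_chromosome(gtf_lines, chrom):
    """Yield only the GTF records whose first column equals `chrom`.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        # First field is the sequence name, e.g. "chr1" or "chr10".
        seqname = line.split(None, 1)[0]
        if seqname == chrom:  # exact match, so "chr1" does not match "chr10"
            yield line


def concat_per_chromosome(outputs, dest):
    """Write the filtered records of each run to `dest`, in the order given.

    `outputs` is an iterable of (chrom, lines) pairs, where `lines` is any
    iterable of GTF lines (for example an open transcripts.gtf file).
    """
    for chrom, lines in outputs:
        for line in keep_chromosome(lines, chrom):
            dest.write(line if line.endswith("\n") else line + "\n")
```

With real files, each pair would be something like ("chr1", open("b.1/cufflinks_out/transcripts.gtf")), with `dest` an output file opened for writing.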
Thanks for any help,
Dhivya