Hi all-
First off, thank you Cole for Cufflinks. It will be very useful for the type of RPKM analysis I'd like to do.
I'm having a bit of trouble with the GTF output files when I plug them into cuffcompare. I run cufflinks using a UCSC GTF file for the mouse genome as reference:
$./cufflinks -G mm9.KnownGene.GTF accepted_hits.sam
This works fine and outputs the appropriate files. However, when I plug two of the output GTFs into cuffcompare:
$./cuffcompare -r mm9_KnownGene.GTF -V -o stats.txt Sample1.gtf Sample2.GTF
I get the following error:
Loading reference transcripts..
64 duplicate reference transcripts discarded.
..ref data loaded
Processing file: Sample1.gtf
Loading transcripts from Sample1.gtf..
Error: duplicate GFF ID 'uc007aji.1' encountered!
Taking a look at the Sample1.gtf, it does in fact look like there are duplicate entries. Example:
chr1 Cufflinks transcript 15795739 15833510 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15795739 15796038 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "1"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15797722 15797817 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "2"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15803122 15803243 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "3"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15809015 15809164 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "4"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15809732 15809844 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "5"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15821588 15821647 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "6"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15823482 15823570 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "7"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15828263 15828369 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "8"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks exon 15833002 15833510 1000 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "9"; RPKM "1.2249777386"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "1.781371";
chr1 Cufflinks transcript 15795739 15833510 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15795739 15796038 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "1"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15797722 15797817 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "2"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15803122 15803243 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "3"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15809015 15809164 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "4"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15809732 15809844 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "5"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15821588 15821647 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "6"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15823482 15823570 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "7"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15828263 15828369 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "8"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
chr1 Cufflinks exon 15833002 15833510 154 + . gene_id "uc007aji.1"; transcript_id "uc007aji.1"; exon_number "9"; RPKM "0.1240106434"; frac "0.116576"; conf_lo "0.999925"; conf_hi "1.369358"; cov "0.180337";
This is not a unique occurrence and seems to appear throughout the GTF file.
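For anyone who wants to check their own output, here is a rough sketch of how the duplicates could be counted. It is not part of Cufflinks: the function name duplicate_transcript_ids and the regex-based attribute parsing are my own assumptions, and it only looks at "transcript" records, counting how often each transcript_id appears.

```python
import re
import sys
from collections import Counter

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def duplicate_transcript_ids(lines):
    """Return {transcript_id: count} for IDs seen on more than one 'transcript' record.

    Hypothetical helper for illustration; it is not a Cufflinks function.
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split into the 8 fixed columns plus the attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return {tid: n for tid, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python find_dups.py Sample1.gtf
    with open(sys.argv[1]) as handle:
        for tid, n in sorted(duplicate_transcript_ids(handle).items()):
            print(tid, n)
```

Run on a file like Sample1.gtf, this should list each transcript_id that appears on more than one transcript record, together with how many times it appears.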
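It seems as though these 2 transcripts have identical IDs, coordinates, and exon structure (only the score and the RPKM, frac, conf_lo, conf_hi, and cov values differ)... why is Cufflinks outputting them twice?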
Any assistance would be much appreciated.
Thanks!