In Cufflinks output GTF files, the normal format should look like the example below, with exons numbered 1, 2, 3 according to their locations:
AC_000165.1 Cufflinks transcript 73406315 73407516 1 + . gene_id "gene11858"; transcript_id "rna22796"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
AC_000165.1 Cufflinks exon 73406315 73406335 1 + . gene_id "gene11858"; transcript_id "rna22796"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
AC_000165.1 Cufflinks exon 73406967 73407067 1 + . gene_id "gene11858"; transcript_id "rna22796"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
AC_000165.1 Cufflinks exon 73407236 73407516 1 + . gene_id "gene11858"; transcript_id "rna22796"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
But in my GTF file, there are some weird transcripts:
AC_000165.1 Cufflinks transcript 73092211 73092802 1 - . gene_id ""; transcript_id "id234068"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
AC_000165.1 Cufflinks exon 73092211 73092802 1 - . gene_id ""; transcript_id "id234068"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
AC_000165.1 Cufflinks transcript 73087432 73087947 1000 - . gene_id "CUFF.5202"; transcript_id "id234068"; FPKM "11.6897553246"; frac "0.734836"; conf_lo "11.689755"; conf_hi "11.689755"; cov "4.857789"; full_read_support "yes";
AC_000165.1 Cufflinks exon 73087432 73087947 1000 - . gene_id "CUFF.5202"; transcript_id "id234068"; exon_number "1"; FPKM "11.6897553246"; frac "0.734836"; conf_lo "11.689755"; conf_hi "11.689755"; cov "4.857789";
AC_000165.1 Cufflinks transcript 73088079 73088393 428 - . gene_id "CUFF.5202"; transcript_id "id234068"; FPKM "5.0075421533"; frac "0.115873"; conf_lo "5.007542"; conf_hi "5.007542"; cov "2.080932"; full_read_support "yes";
AC_000165.1 Cufflinks exon 73088079 73088393 428 - . gene_id "CUFF.5202"; transcript_id "id234068"; exon_number "1"; FPKM "5.0075421533"; frac "0.115873"; conf_lo "5.007542"; conf_hi "5.007542"; cov "2.080932";
For this transcript_id, the same ID appears on several separate transcript lines and every exon is labeled exon_number "1", which causes Cuffmerge to fail with: GFF Error: duplicate/invalid 'transcript' feature ID=id234068
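To see how many IDs are affected, one option is to scan the GTF for any transcript_id that occurs on more than one "transcript" feature line. The following is only a minimal sketch: it assumes the file is tab-delimited with the attributes in the ninth column, and the function name `duplicated_transcript_ids` is made up for illustration (it is not part of Cufflinks or Cuffmerge).

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def duplicated_transcript_ids(lines):
    """Return {transcript_id: [(start, end), ...]} for every ID that
    appears on more than one 'transcript' feature line.

    Assumes tab-delimited GTF lines (hypothetical helper, for checking only).
    """
    spans = defaultdict(list)
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only look at well-formed 'transcript' records.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            spans[match.group(1)].append((int(fields[3]), int(fields[4])))
    # Keep only IDs seen on two or more transcript lines.
    return {tid: found for tid, found in spans.items() if len(found) > 1}
```

Passing an open file handle for the Cufflinks GTF to this function returns the repeated IDs with their coordinates; for the records above it would report id234068 with three different spans.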
I'm new to RNA-seq and I really need help! Has anyone run into the same problem? I used the default options in Cufflinks; could that be causing the problem?
Any thoughts will be appreciated!
Thanks a lot!
Best,
Ellie