I have been trying to supply a GTF annotation file to Cufflinks/Cuffcompare, with no success at all.
I started with only GFF files. The organism I work with, Arabidopsis, does not have any published GTF annotation files that I have been able to locate, and I saw that someone else on here was unable to locate any either. So I attempted to convert my GFFs into GTFs by rewriting the ninth (attributes) column, using http://mblab.wustl.edu/GTF22.html as my reference.
On the first try I simply took the value of the feature column and used it as both the gene_id and the transcript_id. Proper names would be nice, but for our purposes just knowing what kind of feature the reads represent (mRNA, miRNA, siRNA, pseudogene, etc.) is sufficient.
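For illustration, that first conversion can be sketched roughly as below. This is only a sketch of the idea, not the exact script used: the function name `naive_gtf_line`, the tab-separated input, and the second record's coordinates and `ID=` values are assumptions made for the example.

```python
def naive_gtf_line(gff_line):
    """Replace column 9 with gene_id/transcript_id copied from the feature column."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    feature = cols[2]
    cols[8] = f'gene_id "{feature}"; transcript_id "{feature}";'
    return "\t".join(cols)


# Two different mRNA records; the second record and both ID values are made up.
gff = [
    "Chr1\tTAIR9\tmRNA\t3631\t5899\t.\t+\t.\tID=example1",
    "Chr1\tTAIR9\tmRNA\t6790\t8737\t.\t-\t.\tID=example2",
]
gtf = [naive_gtf_line(line) for line in gff]
for line in gtf:
    print(line)
```

Every record of the same feature type ends up with the same gene_id and transcript_id, so two unrelated mRNA records become indistinguishable by ID.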
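Using that file as the Cuffcompare reference resulted in a duplicate-ID error; the converted lines and the command output are the first and second "Code" listings below.
Based on that error, I redid the GFF-to-GTF conversion, this time numbering each gene_id and transcript_id so that every value in the file is unique (third "Code" listing below).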
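The renumbering step, together with a quick check that no transcript_id repeats, might look like the sketch below. The names `number_ids` and `duplicated_transcript_ids` are hypothetical, and numbering by line position (starting at 2 here) is an assumption inferred from the sample lines, not a documented rule.

```python
import re
from collections import Counter


def number_ids(gff_lines, start=1):
    """Append a running number to the feature name to build unique IDs."""
    out = []
    for n, line in enumerate(gff_lines, start=start):
        cols = line.rstrip("\n").split("\t")
        feature = cols[2]
        cols[8] = f'gene_id "{feature}{n}"; transcript_id "{feature}-{n}";'
        out.append("\t".join(cols))
    return out


def duplicated_transcript_ids(gtf_lines):
    """Return the transcript_id values that occur on more than one line."""
    ids = Counter()
    for line in gtf_lines:
        match = re.search(r'transcript_id "([^"]*)"', line)
        if match:
            ids[match.group(1)] += 1
    return sorted(tid for tid, count in ids.items() if count > 1)


gff = [
    "Chr1\tTAIR9\tgene\t3631\t5899\t.\t+\t.\t.",
    "Chr1\tTAIR9\tmRNA\t3631\t5899\t.\t+\t.\t.",
    "Chr1\tTAIR9\tprotein\t3760\t5630\t.\t+\t.\t.",
]
numbered = number_ids(gff, start=2)
print(duplicated_transcript_ids(numbered))
```

An empty list from the check only shows that the IDs are unique; it says nothing about whether the records are otherwise acceptable to Cuffcompare.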
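Result of running Cuffcompare with the renumbered file: a different error, "GList error (GList.hh:592):Invalid list index: -1" (fourth "Code" listing below).
I investigated the error but was unable to find anything, so I figured that maybe the format of the transcript_id (*****-N) was causing it. I altered the GTF again, replacing "-" with "1" (fifth "Code" listing below).
Result: the same GList error (sixth "Code" listing below).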
I have no idea what the "GList error (GList.hh:592):Invalid list index: -1" error means or how to correct it.
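One thing that can be checked against the GTF2.2 description cited above is the feature column, since that page lists a fixed set of feature names. The sketch below tallies the feature types in a file and reports any that are not on that list. The helper names are hypothetical, and it is an assumption, not something confirmed here, that the feature types have anything to do with the GList error.

```python
from collections import Counter

# Feature names listed in the GTF2.2 description: CDS, start_codon and
# stop_codon are required there, the others are optional.
GTF22_FEATURES = {
    "CDS", "start_codon", "stop_codon",
    "5UTR", "3UTR", "inter", "inter_CNS", "intron_CNS", "exon",
}


def feature_counts(gtf_lines):
    """Count the values in column 3, skipping blank and comment lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 3)  # tolerate tabs or spaces
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def unexpected_features(gtf_lines):
    """Feature types in the file that GTF2.2 does not list."""
    return sorted(f for f in feature_counts(gtf_lines) if f not in GTF22_FEATURES)


sample = [
    'Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene12";',
    'Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA13";',
    'Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein14";',
]
print(unexpected_features(sample))
```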
Can anyone make a recommendation on changing a GFF into a GTF? TopHat accepts GFF files for annotation, but for some reason Cufflinks only accepts GTF files. That is fine for some of the more mainstream organisms, but a lot of them (Arabidopsis in my case) only have annotations in GFF and GFF3, which creates a wall in being able to process the expression data.
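To make the question concrete, a conversion that keeps the GFF3 parent/child links instead of reusing the feature name might look like the sketch below. It assumes GFF3 input in which exon rows carry `Parent=<transcript ID>` and mRNA rows carry `ID=` and `Parent=<gene ID>`; it handles only mRNA transcripts, the function names are hypothetical, and the IDs and exon coordinates in the example are made up. Whether Cuffcompare accepts the output has not been tested here.

```python
def parse_gff3_attrs(field):
    """Parse a GFF3 column 9 such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_exons_to_gtf(gff_lines):
    """Emit one GTF exon line per (exon, parent mRNA) pair."""
    rows = [l.rstrip("\n").split("\t") for l in gff_lines
            if l.strip() and not l.startswith("#")]
    rows = [r for r in rows if len(r) == 9]

    # First pass: map each mRNA ID to its parent gene ID.
    gene_of = {}
    for cols in rows:
        attrs = parse_gff3_attrs(cols[8])
        if cols[2] == "mRNA" and "ID" in attrs:
            gene_of[attrs["ID"]] = attrs.get("Parent", attrs["ID"])

    # Second pass: rewrite column 9 of exon rows in GTF style.
    out = []
    for cols in rows:
        if cols[2] != "exon":
            continue
        attrs = parse_gff3_attrs(cols[8])
        for tid in attrs.get("Parent", "").split(","):
            if tid in gene_of:
                new_attrs = f'gene_id "{gene_of[tid]}"; transcript_id "{tid}";'
                out.append("\t".join(cols[:8] + [new_attrs]))
    return out


gff3 = [
    "Chr1\tTAIR9\tgene\t3631\t5899\t.\t+\t.\tID=geneA",
    "Chr1\tTAIR9\tmRNA\t3631\t5899\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Chr1\tTAIR9\texon\t3631\t3913\t.\t+\t.\tParent=geneA.1",
    "Chr1\tTAIR9\texon\t3996\t5899\t.\t+\t.\tParent=geneA.1",
]
gtf = gff3_exons_to_gtf(gff3)
for line in gtf:
    print(line)
```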
Any and all help/suggestions would be greatly appreciated. I've been hung up on this problem for some time now and I have no more ideas on how to proceed.
Thanks as always.
Code (first attempt, converted GTF lines):
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene"; transcript_id "gene";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA"; transcript_id "mRNA";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein"; transcript_id "protein";
Code (first attempt, Cuffcompare output):
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
Error: duplicate GFF ID 'mRNA' encountered!
Code (second attempt, numbered IDs):
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene-2";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA-3";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein-4";
Code (second attempt, Cuffcompare output):
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
GList error (GList.hh:592):Invalid list index: -1
Code (third attempt, "-" replaced with "1"):
Chr1 TAIR9 gene 3631 5899 . + . gene_id "gene2"; transcript_id "gene12";
Chr1 TAIR9 mRNA 3631 5899 . + . gene_id "mRNA3"; transcript_id "mRNA13";
Chr1 TAIR9 protein 3760 5630 . + . gene_id "protein4"; transcript_id "protein14";
Code (third attempt, Cuffcompare output):
cuffcompare -r *.gtf -R -V -o 162.162E -p 4 transcripts1.gtf transcripts2.gtf
Loading reference transcripts..
GList error (GList.hh:592):Invalid list index: -1