I'm working on generating a GTF version of the current FlyBase annotation for Drosophila, since this would include recent transcript models that are absent from the Ensembl GTFs. I want to confirm that it works reliably with all the commonly used analysis tools and then make it available to the broader community, so I'd like to solicit opinions on the following:
- Other than including tss_id and p_id attributes, are there any aspects of a GTF annotation that you would like to see, or pitfalls to watch for (such as antisense transcript models that cause GTF parsing errors)?
- Do you have any suggestions about handling polycistronic genes (where transcripts have identical coordinates but non-overlapping CDS, and hence different gene_ids), to avoid confusion when reporting gene-level results?
- Are there any tools other than Cufflinks/TopHat that you would like to ensure the GTF works correctly with?
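To make the polycistronic case concrete, here is a rough sketch of how such transcripts could be flagged in a GTF. It assumes standard 9-column, tab-separated GTF with gene_id and transcript_id on every exon line; the function names and the toy IDs (gA, tA and so on) are made up for illustration and are not FlyBase identifiers.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the GTF attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def shared_structures(gtf_lines):
    """Find exon structures that are shared by more than one gene_id.

    Returns a dict mapping (seqname, strand, exon coordinates) to the
    sorted list of gene_ids whose transcripts have exactly that structure.
    """
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    info = {}                  # transcript_id -> (seqname, strand, gene_id)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        gid = attrs.get("gene_id")
        if tid is None or gid is None:
            continue
        exons[tid].append((int(cols[3]), int(cols[4])))
        info[tid] = (cols[0], cols[6], gid)

    genes_by_structure = defaultdict(set)
    for tid, coords in exons.items():
        seqname, strand, gid = info[tid]
        key = (seqname, strand, tuple(sorted(coords)))
        genes_by_structure[key].add(gid)

    # Keep only structures claimed by two or more distinct gene_ids.
    return {k: sorted(v) for k, v in genes_by_structure.items() if len(v) > 1}


if __name__ == "__main__":
    # Toy records: tA and tB have identical exons but different gene_ids.
    toy = [
        'chr2L\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "gA"; transcript_id "tA";',
        'chr2L\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "gA"; transcript_id "tA";',
        'chr2L\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "gB"; transcript_id "tB";',
        'chr2L\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "gB"; transcript_id "tB";',
        'chr2L\ttoy\texon\t900\t950\t.\t-\t.\tgene_id "gC"; transcript_id "tC";',
    ]
    for structure, genes in shared_structures(toy).items():
        print(structure, genes)
```

This only compares exon coordinates and ignores CDS features, so it reports candidates for review; it does not check that the CDS regions are actually non-overlapping.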
Thanks for any feedback!
Dave