OK, long story short: after training and prediction with WebAugustus, with "User set UTR prediction: true", I don't see the UTRs clearly annotated in the GTF/GFF files. Instead, I get output like the record under "Code:" below (GFF example of one gene).
So in this example, the gene "should" have a 3' UTR annotation (the evidence summary reports "# 3'UTR exons and introns: 1/1"), but there is no UTR feature in the 3rd column of the gene's feature lines.
What's more, if you go back to the training output files and look for these cases (genes showing "# 5'UTR exons and introns: 1/1" or "# 3'UTR exons and introns: 1/1" in the prediction output), you do find them with the UTR feature annotated (3'-UTR and/or 5'-UTR) in training/hints_utr_pred. They have the same genomic coordinates, but a different gene number.
Since the WebAugustus forum (http://bioinf.uni-greifswald.de/bioinf/forum) is for internal use only, I am asking here.
So are those UTR predictions not robust enough? Or is there an extra step needed to include them in the final annotation?
I'd appreciate any input on this issue. Thanks.
Code:
# start gene g1
Contig1 AUGUSTUS gene 1 870 0.81 - . g1
Contig1 AUGUSTUS transcript 1 870 0.81 - . g1.t1
Contig1 AUGUSTUS tts 1 1 . - . transcript_id "g1.t1"; gene_id "g1";
Contig1 AUGUSTUS exon 1 870 . - . transcript_id "g1.t1"; gene_id "g1";
Contig1 AUGUSTUS stop_codon 432 434 . - 0 transcript_id "g1.t1"; gene_id "g1";
Contig1 AUGUSTUS single 432 854 0.95 - 0 transcript_id "g1.t1"; gene_id "g1";
Contig1 AUGUSTUS CDS 432 854 0.95 - 0 transcript_id "g1.t1"; gene_id "g1";
Contig1 AUGUSTUS start_codon 852 854 . - 0 transcript_id "g1.t1"; gene_id "g1";
Contig1 AUGUSTUS tss 870 870 . - . transcript_id "g1.t1"; gene_id "g1";
# coding sequence = []
# protein sequence = []
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 33.3
# CDS exons: 0/1
# CDS introns: 0/0
# 5'UTR exons and introns: 0/1
# 3'UTR exons and introns: 1/1
# E: 1
# hint groups fully obeyed: 1
# E: 1 (transcript49047)
# incompatible hint groups: 0
# end gene g1
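
One quick sanity check on a record like this: the exon (1-870) extends past the CDS (432-854) on both sides, so the UTR may be present only implicitly, as the part of the exon lying outside the CDS. Below is a minimal Python sketch of that subtraction. It is not something WebAugustus or AUGUSTUS provides; the function name `implied_utrs` is made up for illustration, and it assumes GTF-style `transcript_id "..."` attributes and nine whitespace-separated columns, as in the feature lines shown.

```python
import re
from collections import defaultdict


def implied_utrs(gff_lines):
    """Return {transcript_id: [(start, end, label), ...]} for exon parts outside the CDS.

    Hypothetical helper: it only compares exon and CDS coordinates and does
    not read any UTR feature that the prediction itself might contain.
    """
    exons = defaultdict(list)
    cds = defaultdict(list)
    strand = {}
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(None, 8)  # tabs or spaces; column 9 kept whole
        if len(fields) < 9:
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if not match:
            continue  # gene/transcript lines carry only a bare ID here
        tid = match.group(1)
        strand[tid] = fields[6]
        interval = (int(fields[3]), int(fields[4]))
        if fields[2] == "exon":
            exons[tid].append(interval)
        elif fields[2] == "CDS":
            cds[tid].append(interval)

    result = {}
    for tid, exon_list in exons.items():
        if not cds[tid]:
            continue  # no CDS to compare against
        cds_min = min(start for start, _ in cds[tid])
        cds_max = max(end for _, end in cds[tid])
        # On the minus strand the low-coordinate side is the 3' end.
        if strand[tid] == "+":
            low_label, high_label = "5'UTR", "3'UTR"
        else:
            low_label, high_label = "3'UTR", "5'UTR"
        parts = []
        for start, end in sorted(exon_list):
            if start < cds_min:
                parts.append((start, min(end, cds_min - 1), low_label))
            if end > cds_max:
                parts.append((max(start, cds_max + 1), end, high_label))
        result[tid] = parts
    return result


example = [
    'Contig1 AUGUSTUS exon 1 870 . - . transcript_id "g1.t1"; gene_id "g1";',
    'Contig1 AUGUSTUS CDS 432 854 0.95 - 0 transcript_id "g1.t1"; gene_id "g1";',
]
print(implied_utrs(example))
```

For g1.t1 this gives 1-431 as the implied 3' UTR and 855-870 as the implied 5' UTR, which at least matches the tts/tss positions (1 and 870) in the record. Whether these are what the "# 3'UTR exons and introns: 1/1" line refers to is exactly what I am unsure about.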