I am working with a non-model organism for which we have assembled a draft genome and annotated genes in GFF3 format. I am trying to use RSEM to estimate gene/isoform expression levels. Since RSEM does not support the GFF3 format, I converted the GFF3 file into a GTF file. The conversion appears to have worked, and here are the first few lines of the converted GTF file:
Scaffold1 I5K EXON 8888 9187 . + 0 gene_id "EAFF000001"; transcript_id "EAFF000001-RA";
Scaffold1 I5K EXON 20235 20765 . + 0 gene_id "EAFF000001"; transcript_id "EAFF000001-RA";
Scaffold1 I5K EXON 20965 21120 . + 0 gene_id "EAFF000001"; transcript_id "EAFF000001-RA";
Scaffold1 I5K EXON 22857 22885 . + 0 gene_id "EAFF000001"; transcript_id "EAFF000001-RA";
Scaffold1 I5K EXON 23931 24107 . + 0 gene_id "EAFF000001"; transcript_id "EAFF000001-RA";
Scaffold1 I5K EXON 24723 24892 . + 0 gene_id "EAFF000001"; transcript_id "EAFF000001-RA";
However, my GTF is not correctly "understood" by RSEM, because rsem-prepare-reference doesn’t extract the transcript reference sequences from the genome sequence.
When I run the following command line:
rsem-prepare-reference --gtf /EAFFGENOME/EAFF.Models_2.gtf --bowtie --bowtie-path bowtie-1.0.1 /EAFFGENOME/Eaff_11172013.genome2.fa affinis1_refs
I get the following error:
Parsed 200000 lines
The reference contains no transcripts!
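One thing that may be worth ruling out is the feature column itself. As far as I understand, RSEM builds transcripts only from lines whose third column is lowercase `exon`, and it expects the nine GTF columns to be tab-separated. Both points are assumptions on my part and are not confirmed by the output above. Below is a minimal Python sketch that tallies the feature column and counts lines without nine tab-separated fields. The function name `summarize_gtf_features` and the sample lines are hypothetical, built from the excerpt above on the assumption that the real file uses tabs.

```python
from collections import Counter


def summarize_gtf_features(lines):
    """Tally the feature column (3rd field) of GTF lines.

    Returns (features, malformed): a Counter of feature names, and the
    number of non-comment lines with fewer than 9 tab-separated fields.
    """
    features = Counter()
    malformed = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            malformed += 1
            continue
        features[fields[2]] += 1
    return features, malformed


# Hypothetical sample: two lines from the excerpt, assuming tab separators.
sample = [
    'Scaffold1\tI5K\tEXON\t8888\t9187\t.\t+\t0\t'
    'gene_id "EAFF000001"; transcript_id "EAFF000001-RA";',
    'Scaffold1\tI5K\tEXON\t20235\t20765\t.\t+\t0\t'
    'gene_id "EAFF000001"; transcript_id "EAFF000001-RA";',
]

features, malformed = summarize_gtf_features(sample)
print("feature counts:", dict(features))
print("lines without 9 tab-separated fields:", malformed)
```

To check a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines. If the tally shows only `EXON` and no `exon`, or a large malformed count, that would point to the converter output rather than to RSEM.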
Any help or suggestions would be appreciated very much.
PM