Hi, sorry, I'm relatively new to this and am just looking for some qualified input.
I have sequenced several metatranscriptomes on an Illumina GA system and assembled them with meta-velvet. Now I would like to do reliable gene prediction on the assembled contigs.
My first choice would be Orphelia or FragGeneScan, but from their descriptions I gather that they were designed specifically to identify coding regions in short single reads, not in assembled data. Classical ORF-prediction tools like GLIMMER, on the other hand, assume homogeneous and more or less complete sequence data.
My data, however, are very heterogeneous, and my largest contigs are just over 2 kb.
So what would be the best approach for ORF prediction in assembled metatranscriptome/metagenome data?
Can I use Orphelia or FragGeneScan for this, or are they unreliable for datasets whose sequence lengths vary widely? Do you know of any better-suited tools?
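To illustrate why contig fragments are a problem for tools that expect complete genes, here is a naive six-frame scan. It is NOT what GLIMMER, Orphelia or FragGeneScan do (they use trained statistical models); it only shows that a coding region running off either end of a contig has no bounding stop codon on that side. An "open region" here is simply a run of codons without a stop (no start codon required). The function name `open_regions` and the 30-codon minimum are hypothetical choices for this sketch.

```python
# Naive six-frame scan for stop-free regions; illustration only, not a gene caller.
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (other characters pass through unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def open_regions(seq, min_codons=30):
    """Return (strand, frame, start, end, partial5, partial3) tuples.

    Coordinates are 0-based, half-open, on the scanned strand
    ('+' = as given, '-' = reverse complement). A region ending in a stop
    codon includes that stop. partial5/partial3 mean no stop codon bounds
    the region on that side, i.e. it may continue beyond the contig edge.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            region_start = frame
            partial5 = True  # nothing upstream of the first region in this frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if (i - region_start) // 3 >= min_codons:
                        results.append((strand, frame, region_start, i + 3, partial5, False))
                    region_start = i + 3
                    partial5 = False
            # Trailing region after the last stop: it runs off the contig end.
            last_codon_end = frame + ((len(s) - frame) // 3) * 3
            if (last_codon_end - region_start) // 3 >= min_codons:
                results.append((strand, frame, region_start, last_codon_end, partial5, True))
    return results
```

On short contigs most regions come back flagged as partial on one or both sides, which is the situation fragment-oriented callers are built to handle and complete-genome callers are not.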
EDIT:
I've tested FragGeneScan on my data and do get peptides of varying lengths, which is encouraging. But since it's metatranscriptomic data, I can't really validate how many false positives I get or how much of the genetic potential is missed.
Does anyone have experience with, or opinions on, whether I should optimize my data for such gene predictions, for example by sorting the contigs by size and running separate predictions for each size range (e.g. <100 bp, 100-500 bp and >500 bp)?
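Splitting the contigs by length before prediction can be sketched as below. This is a minimal sketch assuming a plain FASTA file of contigs; the bins are non-overlapping (<100 bp, 100-499 bp, >=500 bp), which is one reading of the ranges in the question, and the helper names `read_fasta`, `bin_contigs` and `write_fasta` are hypothetical.

```python
from collections import defaultdict

# (exclusive upper bound, label); thresholds are just the example from the question.
BINS = [(100, "lt100"), (500, "100to499"), (float("inf"), "ge500")]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bin_contigs(records):
    """Group (header, sequence) records into length bins, first matching bin wins."""
    bins = defaultdict(list)
    for header, seq in records:
        for upper, label in BINS:
            if len(seq) < upper:
                bins[label].append((header, seq))
                break
    return bins


def write_fasta(records, width=60):
    """Format records as FASTA text with sequences wrapped at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Usage would be along the lines of `with open("contigs.fa") as fh: bins = bin_contigs(read_fasta(fh))` (the filename is a placeholder), then writing each `write_fasta(bins[label])` to its own file and running the gene caller on each file separately, so results can be compared across size classes.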