Hi, I have a recently assembled bacterial genome. It consists of a couple dozen scaffolds, and I have run them through three gene prediction methods: Prodigal, Glimmer, and GeneMark.
Now that I have three different gtf files describing putative genes, I want to combine these results into a consensus list. I have been looking into a combiner called JIGSAW and was wondering if anyone has experience using this software with bacteria (I think it was made with eukaryotes in mind).
I am running with the following command:
jigsaw -l -f "myGenome.fasta" -m "jigsaw.output" -e "myEvidenceFile"
And my evidence file looks like this:
scaffolds_GeneMark.gff gff geneprediction coding 1.0
scaffolds_Prodigal.gff gff geneprediction coding 1.0
scaffolds_Glimmer.gff gff geneprediction coding 1.0
Typically, JIGSAW wants the user to provide the type of exon that was annotated (start, internal, end, etc.), but since this is a prokaryote, I thought it was best to just use the "coding" identifier.
The problem is, for a particular contig that has an average of 10 genes predicted from each individual method (7 of which overlap perfectly between all three methods), JIGSAW is only predicting 2 genes!
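For reference, the kind of count quoted above (genes that overlap perfectly between all three methods) can be reproduced outside JIGSAW with a simple vote over exact coordinates. This is only a minimal sketch, not JIGSAW's algorithm: it assumes tab-separated, 9-column GFF lines with one CDS feature per gene, and the function names `read_gene_keys` and `consensus` are made up for illustration.

```python
from collections import Counter


def read_gene_keys(gff_lines):
    """Collect (seqid, start, end, strand) for every feature line.

    Assumes tab-separated GFF columns: seqid, source, type, start, end,
    score, strand, phase, attributes. Comments and blank lines are skipped.
    """
    keys = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a feature line
        keys.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return keys


def consensus(predictions, min_support=2):
    """Return the gene keys reported by at least `min_support` predictors.

    `predictions` is a list of key sets, one set per prediction method.
    Only exact coordinate matches count as agreement.
    """
    votes = Counter(key for keys in predictions for key in keys)
    return sorted(key for key, n in votes.items() if n >= min_support)


# Tiny made-up example: one gene shared by all three, one shared by two.
genemark = [
    "s1\tGeneMark\tCDS\t100\t400\t.\t+\t0\tid=1",
    "s1\tGeneMark\tCDS\t500\t900\t.\t-\t0\tid=2",
]
prodigal = [
    "s1\tProdigal\tCDS\t100\t400\t.\t+\t0\tid=1",
    "s1\tProdigal\tCDS\t520\t900\t.\t-\t0\tid=2",
]
glimmer = [
    "# example comment line",
    "s1\tGlimmer\tCDS\t100\t400\t.\t+\t0\tid=1",
    "s1\tGlimmer\tCDS\t500\t900\t.\t-\t0\tid=2",
]

sets = [read_gene_keys(lines) for lines in (genemark, prodigal, glimmer)]
all_three = consensus(sets, min_support=3)
at_least_two = consensus(sets, min_support=2)
```

With real data, each list of lines would be replaced by an open file handle for the corresponding GFF file. Because only exact matches count, genes whose start codons differ between methods are treated as disagreements here, which a real combiner would handle more carefully.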
Any comments or suggestions are much appreciated.