Dear All,
We have developed a pipeline to annotate coding and long non-coding RNAs (lncRNAs) in transcriptome datasets. The pipeline is Unix-based and takes a multi-FASTA file of nucleotide transcript sequences as input. The final output is a tab-delimited table that can be filtered further based on the annotations of each transcript. Currently, Annocript searches each query sequence against UniProt, Swiss-Prot, the Conserved Domain Database and Rfam. It then associates UniProt IDs with GO terms and enzyme IDs. Finally, it estimates the longest ORF size and the coding potential of each sequence to give a binary classification of whether it is a potential long non-coding RNA. You can find it at
https://github.com/frankMusacchia/Annocript
Below is the publication
http://bioinformatics.oxfordjournals...tv106.abstract
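To illustrate the longest-ORF step mentioned above, here is a minimal Python sketch. It is not Annocript's code: it only scans a multi-FASTA for the longest ATG-to-stop ORF on both strands and flags transcripts whose longest ORF falls below a threshold. It does not compute coding potential or use any database hits, and the 100 aa cutoff (`min_orf_aa`) and the function names are hypothetical choices made for this example.

```python
"""Longest-ORF sketch for a nucleotide multi-FASTA (illustrative, not Annocript's code)."""
from typing import Dict, Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word after '>'."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def longest_orf_nt(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF on either strand."""
    best = 0
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse_complement):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; inner ATGs are ignored
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None  # ORFs without a stop codon are never counted
    return best


def classify(fasta_text: str, min_orf_aa: int = 100) -> Dict[str, Tuple[int, bool]]:
    """Map id -> (longest ORF in aa excluding stop, True if below the hypothetical cutoff)."""
    result = {}
    for name, seq in read_fasta(fasta_text):
        orf_aa = max(longest_orf_nt(seq) // 3 - 1, 0)
        result[name] = (orf_aa, orf_aa < min_orf_aa)
    return result


if __name__ == "__main__":
    demo = ">coding\nATGAAATTTGGGTAA\n>noncoding\nCCCCCCCCCC\n"
    print(classify(demo, min_orf_aa=3))
```

Running it prints `{'coding': (4, False), 'noncoding': (0, True)}`. Both strands are scanned because assembled transcripts may be reported in either orientation; a real classifier would combine this length signal with a coding-potential score and homology evidence rather than rely on ORF length alone.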