Hi all,
We sequenced the transcriptome of a non-model organism and used Trinity for de novo assembly. Trinity generated 182,171 contigs.
Next, we annotated the Trinity contigs using blastx against the Swiss-Prot dataset.
We got 36,677 contigs with hits (about 20%) and 145,494 with no hits.
I assume the low number of hits is because we searched Swiss-Prot and not TrEMBL, and we are still waiting for the TrEMBL results.
Right now, for each annotated contig we have a record like this:
TrcontigID O60268 K0513_HUMAN
The first field is just the Trinity contig ID.
The second is the UniProt accession number.
The third is the UniProt entry name.
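To make the record layout concrete, here is a minimal Python sketch of how one such line could be split into its three fields. It is only an illustration, not our actual pipeline: the names `Annotation` and `parse_record` are made up for this example, and it assumes whitespace-separated fields and the usual UniProt entry-name layout, where the part after the last underscore is the species code (HUMAN in the record above).

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One annotated contig (hypothetical container, for illustration only)."""
    contig_id: str   # Trinity contig ID
    accession: str   # UniProt accession number, e.g. O60268
    entry_name: str  # UniProt entry name, e.g. K0513_HUMAN
    species: str     # species code taken from the entry name, e.g. HUMAN


def parse_record(line: str) -> Annotation:
    """Split a 'contig accession entry_name' line into its parts."""
    contig_id, accession, entry_name = line.split()
    # Assumption: the species code is whatever follows the last underscore.
    species = entry_name.rsplit("_", 1)[-1]
    return Annotation(contig_id, accession, entry_name, species)


record = parse_record("TrcontigID O60268 K0513_HUMAN")
print(record.accession, record.species)
```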
The next step was to use the UniProt accession numbers as input to DAVID (http://david.abcc.ncifcrf.gov/home.jsp) for functional analysis.
DAVID accepts input from only a single organism at a time, so we used HUMAN.
The problem is that 85% of our annotated contigs hit proteins from organisms other than human,
and the remaining 15% does not represent the whole RNA population in the transcriptome.
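For anyone who wants to check this kind of split on their own data, a rough Python sketch of tallying the species codes across the annotation records is below. It is an illustration under assumptions, not the way we got our numbers: `species_fractions` is a hypothetical helper, the species code is assumed to be the part of the entry name after the last underscore, and every toy row except the first uses made-up placeholder IDs.

```python
from collections import Counter


def species_fractions(records):
    """Return {species_code: fraction of records}, most common first."""
    counts = Counter()
    for line in records:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumption: species code follows the last underscore of the entry name.
        counts[fields[2].rsplit("_", 1)[-1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.most_common()}


# Toy input: only the first row comes from the post; the rest are placeholders.
toy_records = [
    "TrcontigID O60268 K0513_HUMAN",
    "contigB X00001 PROT1_HUMAN",
    "contigC X00002 PROT1_MOUSE",
    "contigD X00003 PROT2_BOVIN",
]

for species, fraction in species_fractions(toy_records).items():
    print(f"{species}\t{fraction:.0%}")
```

Note that this counts per record (one best hit per contig); counting per unique accession would give different percentages.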
We are still interested in functional analysis based on the UniProt accession number/name, but we want a full picture of our transcriptome, not just one organism at a time.
I guess there is a better way to do what we did.
Any ideas on how to do that?
Thanks,
Pap
For the annotated contigs we used DAVID for functional annotation clustering.
David