Dear colleagues,
I recently asked a laboratory how they extended the annotations on a transcriptome for a non-model organism; they pointed me towards their GitHub.
The assembly/annotation pipeline on their GitHub is the one I originally used to assemble and annotate the transcriptome of a closely related species. In the past this pipeline used blastx to query the UniRef50 database. Now this laboratory is querying UniProtKB.
I just defended my thesis, and one of the criticisms from my committee was the poor annotation of my assembly. So of course I wanted to try blasting against this other database (in case that was what improved this lab's annotation).
Imagine my surprise when UniProtKB gave worse annotation than UniRef50!
Not knowing much about the criteria for selecting a database to BLAST against, I did a bit of reading. According to Suzek et al. (2007), UniRef is a set of clustered sequences from UniProt that hides redundant sequences; this reduces the size of the database you're blasting against, which increases the speed of the similarity search. From what I understand, it also "improves detection of distant relationships".
So my understanding is that I am getting better results from UniRef50 because sequences need at the very least 50% sequence identity. Can anyone correct me if I'm wrong?
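To make "better" and "worse" annotation concrete, here is a minimal sketch of how the two searches could be compared: count the transcripts that have at least one hit at or below an e-value cutoff. It assumes both blastx runs were saved in the standard 12-column tabular format (`-outfmt 6`), where the first column is the query ID and the eleventh is the e-value. The function names, the 1e-5 cutoff, and the example rows (including the accessions) are made up for illustration; they are not taken from the pipeline mentioned above.

```python
from typing import Iterable, Set


def annotated_queries(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return the query IDs that have at least one hit with e-value <= max_evalue.

    Assumes BLAST tabular output (-outfmt 6): tab-separated, query ID in
    column 1 and e-value in column 11.
    """
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def annotation_rate(lines: Iterable[str], n_transcripts: int,
                    max_evalue: float = 1e-5) -> float:
    """Fraction of all assembled transcripts with at least one passing hit."""
    return len(annotated_queries(lines, max_evalue)) / n_transcripts


# Hypothetical rows standing in for a real blastx result file.
example_rows = [
    "tx1\tUniRef50_P12345\t62.5\t120\t45\t0\t1\t360\t10\t129\t2e-30\t118",
    "tx1\tUniRef50_Q00001\t40.0\t100\t60\t0\t1\t300\t5\t104\t1e-3\t45.1",
    "tx2\tUniRef50_A0A000\t55.0\t200\t90\t0\t4\t603\t1\t200\t5e-50\t180",
    "tx3\tUniRef50_B1B111\t30.0\t80\t56\t0\t1\t240\t3\t82\t0.5\t30.2",
]

# Assuming the assembly contains 4 transcripts in total.
rate = annotation_rate(example_rows, n_transcripts=4)
```

Running the same function on both result files with the same cutoff and the same transcript count would show whether one database really annotates fewer transcripts, or whether the difference is mostly in which hit ends up on top.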
THE SECOND QUESTION
What would you suggest to improve functional annotation? Obviously, increasing the sequencing depth would be one suggestion, but in my case that is no longer possible. Given what I currently have, what can be done? Is there another database you would suggest blasting against?