**schmima** · 03-30-2011, 01:25 AM

To annotate an assembly, http://www.blast2go.org/ may help you.

**Rachel** · 03-30-2011, 06:11 PM

Thanks, I appreciate your reply.

If I am not mistaken, blast2go can only annotate genes that are already available in the database. If the genes or hypothetical proteins are not available in the database, what do I need to do to predict a new or novel gene? Thanks

**schmima** · 03-30-2011, 09:41 PM

Just to make sure that I got it right: you have an assembled transcriptome and you want to annotate it (?). I guess that for this you will always have to rely on other databases. I don't know of anything that would be able to tell you ab initio what kind of sequence would produce what kind of protein.

In other words: you have to rely on existing knowledge. However, there's quite a lot around. For example, blast2go does more or less the following:
1. Uses blast (blastx in the case of transcripts) to search for similar sequences which are described at least somewhere, somehow (some may have experimental evidence, others are based only on predictions). In this step you will not only find the ones that are identical to known sequences. It will also find cases where you have some similarity.
2. Annotation then via GO, InterProScan, KEGG etc. (InterPro runs, I think, only on the ones which have a GO annotation; I did not finish it due to the rather slow processing).
3. Some statistics.

Using blast2go you will be able to annotate quite a few of your transcripts. Nonetheless, you will definitely have others which are not similar to any of the known ones (to be exact: they may be similar to a certain extent, but less than the threshold you chose for blastx).
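To make the "annotated vs. left over" split concrete, here is a minimal sketch that is not part of blast2go. It assumes you ran blastx yourself with tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 the e-value). The function name `split_by_blast_hit` and the default cutoff of `1e-6` are illustrative assumptions only.

```python
import csv
import io


def split_by_blast_hit(transcript_ids, blast_tab_text, max_evalue=1e-6):
    """Split transcripts into (annotated, unannotated) lists.

    blast_tab_text is BLAST tabular output (-outfmt 6): tab-separated,
    query ID in column 1 and e-value in column 11.
    max_evalue is a hypothetical cutoff; choose your own.
    """
    with_hit = set()
    for row in csv.reader(io.StringIO(blast_tab_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            with_hit.add(query_id)
    annotated = [t for t in transcript_ids if t in with_hit]
    unannotated = [t for t in transcript_ids if t not in with_hit]
    return annotated, unannotated
```

Transcripts whose only hits are weaker than the cutoff end up in the second list, together with the ones that had no hit at all. Those are the candidates for the more sensitive searches discussed further down.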

Now, if I got it right, you would like to do something with the remaining, unannotated transcripts (?). Hm, I'm not really an expert on this. But I guess that "gene prediction" is not really what you need (these programs rather annotate a genome sequence with the help of the transcripts you provide from your assembly, and you don't have a genome sequence...). Well, there may be some programs which check transcripts directly. It would be nice to know if you find something.

Another possibility would be to search for protein domains (InterProScan etc., but this time on the sequences which were left out by blast2go). However, as far as I know, you need protein sequences to do so. This means you need to translate your transcripts into proteins (if not strand specific: six proteins, three frames from each strand). Just keep in mind:
1. the domain scanners are again based on "similarity to known things"
2. translating transcripts into proteins can be quite error-prone (imagine you had some intronic reads, e.g. from unspliced pre-mRNA or antisense transcripts: they will be incorporated into your transcript and, during in silico translation, they will mix up your protein sequence quite badly)
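The six-frame translation mentioned above can be sketched in plain Python. This is only an illustration, assuming the standard genetic code and a DNA (or RNA) sequence limited to A/C/G/T(U). The helper names are made up for this example, and codons with any other character come out as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict frame label -> protein for frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Stop codons show up as `*`, so a frame that is broken up by many stops is usually not the coding one. As noted in point 2, retained intronic sequence can shift or interrupt the real frame, so the longest open stretch is a hint and not proof.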

In summary:
I don't know of a "good" way of dealing with unknown transcripts which are not similar to anything that is known [well, there are some, but not on the computer: you would have to go to the bench].

**Rachel** · 03-30-2011, 11:03 PM

Hi

Really appreciate you answering my questions in such detail ^_^ That is exactly what I want to know: how to deal with the unknown transcripts.

Well, I have not done anything on the project yet. But I was wondering what I should do if I get something different from what is in the known databases...

Share with me if there is any additional info ^_^ Have a nice day ahead

**schmima** · 03-30-2011, 11:45 PM

was a pleasure

Rachel wrote: "Well, I have not done anything on the project yet. But I was wondering what I should do if I get something different from what is in the known databases..."

I guess if it is totally different you'll have a hard time. Well, in principle you could translate into protein and do some crazy stuff, maybe via the structure... but I think this is anything but easy...

Well, if you just have a few of them (or could filter down to a few based on whatever criteria):
1. back to the lab: try to get/confirm the transcript (means: clone and sequence it the old way)
2. still in the lab: use other methods to characterize it...
3. some years later: either ... or ...

Have a nice day, and in case you find a solution, let me know

all the best

**Rachel** · 03-30-2011, 11:58 PM

WOW, that seems very challenging, with a lot of stuff to be done if that happens!!!
Will see what else I can do with it....

Anyway, much appreciated, and thanks for sharing!

**eskirton** · 04-04-2011, 03:14 PM

try hmmscan vs pfam

maybe try a blast-based annotation first (as recommended above) and with your remaining (and low-confidence) transcripts, try a more sensitive hmm based annotation.

first identify the coding regions and translate (e.g. using prodigal or similar), and run hmmscan vs pfam. novel proteins will likely have conserved domains, so even if they don't have "full-length" hits to known proteins, the domains themselves are informative.
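A rough sketch of how the hmmscan step could be scripted follows. It assumes HMMER3 is installed with `hmmscan` on your PATH, that the Pfam HMM file has been prepared with `hmmpress`, and that you already have a protein FASTA from the translation step. The function names and file paths are placeholders, and `--cut_ga` (use Pfam's gathering thresholds) is just one reasonable choice of cutoff.

```python
import subprocess
from collections import defaultdict


def run_hmmscan(pfam_hmm, protein_fasta, tblout_path):
    """Run hmmscan and write a per-sequence hit table (--tblout).

    Paths are supplied by the caller; nothing here is specific to one setup.
    """
    cmd = ["hmmscan", "--cut_ga", "--tblout", tblout_path,
           pfam_hmm, protein_fasta]
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


def parse_tblout(text):
    """Parse hmmscan --tblout text into query -> [(domain, accession, evalue)].

    Column layout assumed from the HMMER3 table format: target name,
    target accession, query name, query accession, full-sequence E-value, ...
    """
    hits = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip header and footer comment lines
        fields = line.split(None, 18)  # field 19 is free-text description
        domain, accession, query = fields[0], fields[1], fields[2]
        evalue = float(fields[4])
        hits[query].append((domain, accession, evalue))
    return dict(hits)
```

Typical use would be `run_hmmscan("Pfam-A.hmm", "leftover_proteins.faa", "hits.tbl")` followed by `parse_tblout(open("hits.tbl").read())`. Both file names here are hypothetical.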

**schmima** · 04-04-2011, 08:47 PM

By the way, besides the protein-similarity searches via blastx and domain scanners, I would also search for similarities on the nucleotide level (normal blast/blat). I forgot to note that blast2go only tries to annotate protein-coding transcripts, as GOs are only associated with proteins. I don't know of any software that wraps everything; if anyone knows, that would be interesting. I believe you will be able to annotate some of the transcripts that did not have any protein(-domain) similarity, and some of them could also be rather interesting biologically.

All the best (writing on the phone is tricky, sorry for mistakes...)


RNA-seq assembly
