
**rhinoceros** · 12-11-2013, 11:21 AM

The E-value is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size.
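As general background (standard BLAST statistics, not something stated in the thread), the E-value of an alignment with raw score $S$ follows the Karlin-Altschul relation. Here $m$ is the query length, $n$ is the database length in residues, and $K$ and $\lambda$ are parameters of the scoring system:

$$E = K\,m\,n\,e^{-\lambda S}$$

BLAST plugs effective (length-adjusted) values of $m$ and $n$ into this formula, so $E$ is only approximately proportional to the database size. The $n$ in this formula is the database length that `-dbsize` overrides.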

Glossary - The NCBI Handbook - NCBI Bookshelf

http://www.ncbi.nlm.nih.gov/books/NBK21106/

From biostars:

I don't know the answer yet, but for one thing, the -dbsize parameter on the BLAST command line is supposed to be the cumulative length of all the sequences in the database, not the number of sequences in the database.
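To see how far apart the two candidate values are, here is a minimal sketch that reports both the sequence count and the cumulative residue count of a protein FASTA file. It assumes a plain FASTA file whose headers start with '>'; the function name `fasta_sizes` and the file `example.fasta` are hypothetical.

```shell
# Report both candidate "database size" values for a FASTA file:
# the number of sequences and the total number of residues.
# Assumption: headers start with '>'; CRLF line endings and stray
# spaces/tabs in sequence lines are not counted as residues.
fasta_sizes() {
  awk '
    { sub(/\r$/, "") }                          # drop a trailing carriage return
    /^>/ { nseq++; next }                        # header line: one more sequence
    { gsub(/[ \t]/, ""); nres += length($0) }    # sequence line: count residues
    END { printf "%d sequences, %d residues\n", nseq, nres }
  ' "$1"
}

# Usage on a tiny hypothetical file (two proteins, 5 + 4 residues):
printf '>p1\nMKV\nLA\n>p2\nMSTQ\n' > example.fasta
fasta_sizes example.fasta    # prints: 2 sequences, 9 residues
```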

**fireLog2** · 12-12-2013, 12:45 AM

Thank you, rhinoceros, for helping.

But I had already read that post and tried the cumulative length. In my example there are 13581 amino acids across all 33 sequences. By trial and error, I found that the dbsize must be around 11809 to get the same e-values.

It should work somehow; I read about it here:

[...]
-z db_size is the number of proteins in the set (see “Incrementally add a genome” below)
[...]
Incrementally add a genome:
Once a large all-versus-all BLAST has been completed, you may need to “incrementally” add a new proteome, without re-running the large all-versus-all BLAST. To do so:
(1) Prepare the new proteome’s FASTA file as you did for the previous ones (Steps 4 and 5);
(2) Make a new BLAST database that includes all the previous proteins plus the new FASTA file;
(3) Use the -z argument of BLAST to simulate the size of the all database, so that the statistics and scoring are compatible with the original all-v-all BLAST. Use the same -z value as was used in the original BLAST.
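With BLAST+, where `blastp -dbsize` plays the role of legacy `blastall -z`, steps (2) and (3) might look roughly like the sketch below. It only prints the commands (a dry run) so they can be checked before running. The function name, the file names, and the choice to query with the new proteome only are hypothetical assumptions, not part of the protocol quoted above.

```shell
# Dry-run sketch: print (not run) BLAST+ commands for incrementally adding
# a proteome. All names here are hypothetical.
incremental_blast_cmds() {
  dbsize=$1          # the SAME value used in the original all-v-all run
  combined_fasta=$2  # all previous proteins plus the new proteome
  query_fasta=$3     # assumption: query with the new proteome only
  db="${combined_fasta%.fasta}_db"
  echo "makeblastdb -in $combined_fasta -dbtype prot -out $db"
  echo "blastp -query $query_fasta -db $db -dbsize $dbsize -outfmt 6 -out ${query_fasta%.fasta}.tsv"
}

# Example value: the residue count of the 33-sequence set mentioned above.
incremental_blast_cmds 13581 all_plus_new.fasta new_proteome.fasta
```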
[...]

Source: Fischer, S. (2011), Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr Protoc Bioinformatics (https://www.ncbi.nlm.nih.gov/pmc/art...report=classic)

**rhinoceros** · 12-12-2013, 02:02 AM

Make a db with 33 sequences.
Make another db with 32 sequences.
Use the same db_size parameter, e.g. 10000, for BLAST searches against both of these dbs. Are the e-values of the hits the same?
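One way to run this comparison, and to count how many e-values differ as is done later in the thread, is to save both searches in BLAST+ tabular format (`-outfmt 6`, where column 11 is the e-value) and match up the hits. A minimal sketch, assuming hits are matched on query ID, subject ID, query start and subject start (columns 1, 2, 7 and 9); the function name `count_evalue_diffs` and the sample files are hypothetical.

```shell
# Count hits whose e-value differs between two BLAST -outfmt 6 files.
# Hits are keyed on qseqid, sseqid, qstart, sstart (columns 1, 2, 7, 9);
# hits present in only one of the files are ignored.
count_evalue_diffs() {
  awk -F'\t' '
    NR == FNR { ev[$1 FS $2 FS $7 FS $9] = $11; next }   # first file: store e-values
    { k = $1 FS $2 FS $7 FS $9; if ((k in ev) && ev[k] != $11) n++ }
    END { print n + 0 }
  ' "$1" "$2"
}

# Usage with two one-hit hypothetical result files:
printf 'q1\ts1\t90\t50\t5\t0\t1\t50\t1\t50\t1e-10\t80\n' > run_a.tsv
printf 'q1\ts1\t90\t50\t5\t0\t1\t50\t1\t50\t2e-10\t80\n' > run_b.tsv
count_evalue_diffs run_a.tsv run_b.tsv    # prints 1
```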

**fireLog2** · 12-12-2013, 05:32 AM

Thanks for the idea. Here is the result:
If I use the same db_size value for different BLAST databases, I get the same e-values.

So the problem is now partially solved.

Next time, if I plan to extend the database later, I will fix the db size at the beginning.

But how is the db size defined, if you want to determine it afterwards?
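A rough way to work backwards, assuming the E-value is proportional to the database length n (BLAST's length adjustment makes this only approximate, and printed e-values are rounded): E_new ≈ E_old × n_new / n_old. Solving the same relation for n gives an estimate of the size behind an observed e-value, which could in principle replace a trial-and-error search. The function names below are hypothetical. For an existing BLAST+ database, `blastdbcmd -db <name> -info` reports the number of sequences and the total number of letters.

```shell
# Approximate e-value rescaling between two database sizes.
# Assumption: E is proportional to n (ignores BLAST's length adjustment).
rescale_evalue() {   # args: e-value, old dbsize, new dbsize
  awk -v e="$1" -v n_old="$2" -v n_new="$3" 'BEGIN { printf "%.3g\n", e * n_new / n_old }'
}

# Estimate the dbsize behind an observed e-value, given the e-value of the
# same hit at a known dbsize (same rough assumption).
estimate_dbsize() {  # args: observed e-value, known e-value, known dbsize
  awk -v e_obs="$1" -v e_known="$2" -v n_known="$3" 'BEGIN { printf "%.0f\n", n_known * e_obs / e_known }'
}

rescale_evalue 2e-10 10000 20000     # prints 4e-10
estimate_dbsize 4e-10 2e-10 10000    # prints 20000
```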

**dsenalik** · 01-26-2015, 01:24 PM

I found this year-old thread quite helpful. I did my own experiments with 22 sequences totaling 7723 amino acids.
I counted amino acids this way:

Code:

grep -v '^>' goodProteins.fasta | tr -d '\n' | wc -c

Setting -dbsize 7723 gave exactly the same e-values as not specifying -dbsize at all.
Setting -dbsize 7713 changed 2 of the e-values.
Setting -dbsize 7600 changed 33 of the e-values.
So for me, -dbsize was the total number of amino acids.


BLASTp parameter -dbsize problems (blastall -z)
