Hello all,
I'm stuck on a blastp problem. I want to use the -dbsize parameter so that I can incrementally add sequences to a BLAST output. The same option is available in blastall as -z.
I ran several tests with this parameter before posting, but I don't get the e-values I expect.
Here is an example:
- I have a FASTA file with 33 sequences: original_seqs.fasta
- I create a sequence database: makeblastdb -in original_seqs.fasta -dbtype prot -out blast_db
- output: 33 sequences added
1. Question: Does this mean the database size is 33? I think so.
- blast the sequences against each other: blastp -db blast_db -query original_seqs.fasta -outfmt 6 -out all-vs-all_dbSize_default.tsv -evalue 1e-5
- first use of dbsize: blastp -db blast_db -query original_seqs.fasta -outfmt 6 -out all-vs-all_dbSize_10000.tsv -evalue 1e-5 -dbsize 10000
- next use of dbsize: blastp -db blast_db -query original_seqs.fasta -outfmt 6 -out all-vs-all_dbSize_33.tsv -evalue 1e-5 -dbsize 33
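To check Question 1 independently of makeblastdb, the FASTA file can be counted directly. This is only a minimal sketch: it assumes a plain protein FASTA file, and fasta_stats is a helper name I made up, not part of BLAST+. I count residues as well as sequences because the blastp help text describes -dbsize as "Effective length of the database", so I am not sure whether it expects a sequence count or a letter count.

```shell
# Count sequences and total residues in a FASTA file.
# fasta_stats is a hypothetical helper name, not a BLAST+ command.
fasta_stats() {
    awk '
        /^>/ { nseq++; next }                          # header line: one more sequence
        { gsub(/[ \t\r]/, ""); nres += length($0) }    # sequence line: add its residues
        END { printf "sequences=%d residues=%d\n", nseq, nres }
    ' "$1"
}

# Usage: fasta_stats original_seqs.fasta
```

If I read the BLAST+ documentation correctly, blastdbcmd -db blast_db -info should report the same two numbers for the formatted database.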
2. Question:
All three outputs show a different e-value for the same pair of sequences. It makes sense that the output for dbsize=10000 differs. But why does dbsize=33 differ from the default run, when the database holds 33 sequences? Or do I misunderstand this blast parameter?
Later I want to add new sequences, and their e-values should be comparable with the existing ones.
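To make the discrepancy concrete, the e-values of the same query/subject pair can be listed side by side across the three output files. A minimal sketch, assuming the default -outfmt 6 column order (e-value in column 11) and non-empty files; compare_evalues is a made-up helper name, and only the first hit line per pair in each file is kept.

```shell
# Print the e-value of each query/subject pair from several -outfmt 6 files side by side.
# compare_evalues is a hypothetical helper name, not a BLAST+ command.
compare_evalues() {
    awk -F '\t' '
        FNR == 1 { nfile++ }                                   # a new input file starts
        {
            key = $1 "\t" $2                                   # query id + subject id
            if (!((key, nfile) in ev)) ev[key, nfile] = $11    # keep the first hit line per pair
            if (!(key in seen)) { seen[key] = 1; order[++n] = key }
        }
        END {
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (f = 1; f <= nfile; f++)                   # one e-value column per file, NA if absent
                    line = line "\t" (((order[i], f) in ev) ? ev[order[i], f] : "NA")
                print line
            }
        }
    ' "$@"
}

# Usage:
# compare_evalues all-vs-all_dbSize_default.tsv all-vs-all_dbSize_10000.tsv all-vs-all_dbSize_33.tsv
```

Each output row is the query id, the subject id, and then one e-value per file in the order given. NA marks a pair that is missing from a file, for example because it fell above the -evalue cutoff in that run.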