Hi all,
I have done a comparison of blastn as implemented in BLAST+ 2.2.25 (latest version) and BLASTALL (legacy BLAST) and observed non-trivial discrepancies in the results. In summary, BLAST+ gives more hits of the query to the subject/database, with lower (better) E-values. BLAST+ also often generates best hits (often with better E-values) to different sequences in the subject/database, compared to BLASTALL.
Also of concern: with BLASTALL blastn, the default and .tsv outputs differ from each other in the number of hits reported. No such discrepancy was seen in the BLAST+ results, regardless of which output format was specified.
If anyone has made similar observations, or has feedback on my observations/analyses, I would much appreciate hearing it.
Below are examples of equivalent command lines I used for the different BLAST versions.
The command lines used for BLAST+, blastn were:
blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -out output.blastn (default output)
blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -outfmt "10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score" -out output.blastn.csv (csv output)
The command lines used for BLASTALL, blastn were:
blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -o output.blastn -a 8 (default output)
blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -m 8 -o output.blastn.tsv -a 2 (tsv output)
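In case it helps anyone reproduce the comparison, here is a minimal Python sketch of how the best hit per query could be compared between two tabular result files. It is not the exact script behind the numbers above. It assumes both files use the standard 12-column tab-separated layout (query ID, subject ID, ..., E-value, bit score), as written by blastall -m 8; for BLAST+ that would mean producing a second output with -outfmt 6 rather than the custom CSV above. The function names best_hits and compare are made up for illustration.

```python
import csv


def best_hits(lines, delimiter="\t"):
    """Map each query ID to (subject ID, E-value, bit score) of its top-scoring hit.

    Assumes the standard 12-column tabular layout: query ID in column 1,
    subject ID in column 2, E-value in column 11, bit score in column 12.
    """
    best = {}
    for row in csv.reader(lines, delimiter=delimiter):
        # Skip blank lines and comment lines (e.g. from commented tabular output).
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Keep only the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def compare(legacy, plus):
    """Return (queries only in legacy, queries only in plus, queries whose best subject differs)."""
    only_legacy = sorted(set(legacy) - set(plus))
    only_plus = sorted(set(plus) - set(legacy))
    different = sorted(
        q for q in set(legacy) & set(plus) if legacy[q][0] != plus[q][0]
    )
    return only_legacy, only_plus, different


# Example usage (the second file name is hypothetical):
#   with open("output.blastn.tsv") as f:
#       legacy = best_hits(f)
#   with open("output.blastplus.tsv") as f:
#       plus = best_hits(f)
#   only_legacy, only_plus, different = compare(legacy, plus)
```

Comparing on bit score rather than on file order is a deliberate choice here, since it does not depend on how each program sorts its hits.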
Thanks!