Hi everyone,
TL;DR: Always set the -max_target_seqs flag explicitly when running BLAST unless you are using the default verbose output; otherwise you might not get all the hits you should.
Example:
Code:
$ blastn -perc_identity 97 -query my_query.fa -db nt -out result.txt
$ blastn -perc_identity 97 -query my_query.fa -db nt -outfmt 6 -out result.csv
$ grep '>' result.txt | wc -l
186
$ wc -l result.csv
143 result.csv
So the default text output and the tabular output (-outfmt 6, saved here as result.csv) report different numbers of hits: 186 versus 143. I was able to 'rescue' the missing hits by adding the -max_target_seqs flag:
Code:
$ blastn -perc_identity 97 -max_target_seqs 500 -query my_query.fa -db nt -outfmt 6 -out result.csv2
$ wc -l result.csv2
186 result.csv2
The discrepancy shows up whether or not you use the -perc_identity flag as I did.
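A side note on comparing the two counts: grep '>' counts subject sequences in the text report, while wc -l counts rows in the tabular file, and a subject with more than one HSP can take up more than one row. A minimal sketch for counting distinct subjects instead, assuming the default -outfmt 6 column order where column 2 is the subject ID (count_subjects is just a name made up for illustration, not part of BLAST):

```shell
# Count distinct subject sequences in BLAST tabular (-outfmt 6) output.
# Assumption: default column layout, so column 2 holds the subject ID (sseqid).
count_subjects() {
    # cut splits on tabs by default; sort -u collapses repeated subject IDs;
    # tr strips the padding some wc implementations add.
    cut -f2 "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical usage on the file from the example above:
# count_subjects result.csv
```

With a single query, this number should be more directly comparable to the grep '>' count than a plain wc -l.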
I confirmed this with the blast-help email helpdesk at NCBI. Their response:
For output formats >4, -max_target_seqs should be explicitly set. In the next release, 2.2.27, you should get a commandline message to that effect
Hope that helps someone, and that it doesn't affect too many different pieces of software and science.
ben
--
Tyson Laboratory, Australian Centre for Ecogenomics