You wonder, then, why the program isn't doing this splitting by itself.
-
There are 4 main steps in blastn:
1. Prepare the hash table with the mask data.
2. Scan the database for hits. The -num_threads option is only useful in this step.
3. Trace back the results in the database.
4. Print the results.
The -num_threads option (multi-threading in step 2) is better than running multiple processes. With multiple processes, each process loads the database and the mask database into RAM.
Our G-BLASTN speeds up the scan step on the GPU and the trace-back step with SSE, and changes the framework into a pipeline so that the steps can overlap.
You can find the source code and release 1.0 on
and
G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST. G-BLASTN can produce exactly the same results as NCBI-BLAST, and it also has very similar user commands.
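The point that threads only help in the scan step can be illustrated with Amdahl's law. The sketch below is not taken from BLAST or G-BLASTN: `amdahl_speedup` is a hypothetical helper, and the 60% scan fraction is an illustrative assumption, not a measurement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Amdahl's law: if a fraction $p of the runtime can be parallelized and
# runs on $n threads, the overall speedup is 1 / ((1 - $p) + $p / $n).
sub amdahl_speedup {
    my ($p, $n) = @_;
    die "fraction must be between 0 and 1\n" if $p < 0 || $p > 1;
    die "thread count must be at least 1\n"  if $n < 1;
    return 1 / ((1 - $p) + $p / $n);
}

# Hypothetical example: assume the scan step is 60% of the total runtime.
my $scan_fraction = 0.60;    # illustrative assumption, not a measurement
for my $threads (1, 2, 4, 8) {
    printf "%d thread(s): %.2fx\n", $threads,
        amdahl_speedup($scan_fraction, $threads);
}
```

Under that assumption the speedup can never exceed 1 / (1 - 0.6) = 2.5x, however many threads are used, which is one reason overlapping the steps in a pipeline can pay off.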
-
I had a speed-up problem with blast+, too. I analysed the CPU usage of the multithreading option in the "old" blast and in blast+, and I saw that the multithreading option was not efficient for my dataset. So I parallelized it with a Perl script to increase the speed. Maybe that's an option to speed up blast+ runs for you.
The original reply from the blast team was:
"The overall total CPU time was about 160 minutes for both runs, but the blastall application did finish in less time than the BLAST+ application. We will work on improving the parallelization of BLAST+ for this case. I've also looked at some test cases against our nt database, but blastall and the BLAST+ application did equally well on the parallelization."
Maybe this helps... If someone is interested in the script, just ask me.
-
It's not the most elegant script, but it works fine for an all-vs-all BLAST. It includes making the databases. You need Bio::SeqIO, Parallel::ForkManager and Time::Local. Hope it helps:
Code:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Parallel::ForkManager;
use Time::Local;
#############################
#USAGE: perl script.pl blastmethod directory_of_fasta_files eval outfmt number_of_cpus
#e.g. perl blastplus_parallel.pl blastn SampleDIR e-10 6 outdir 10
#############################
# author: Ulrike Loeber ([email protected])
#This script takes care of imperfect parallelization of blast+. It splits the files to do smaller jobs on more than one CPU to improve the performance of BLAST+;
my $blast_method = $ARGV[0];
my $seq_dir = $ARGV[1];
my $evalue = $ARGV[2];
my $outfmt = $ARGV[3];
my $outdir = $ARGV[4];
my $cpus = $ARGV[5];
opendir (INDIR, $seq_dir) or die $!;
my @files=grep /\.fasta$/ , readdir (INDIR); #greps every file in the determined directory which end with "fasta"
close INDIR;
#one cpu is used for perl, so the number of cpus left is $seq_dir-1
my $numberOfProcesses=($cpus-1);
my $subsets=$numberOfProcesses; #build as many subsets as free cpus
my $manager = new Parallel::ForkManager( $numberOfProcesses );
foreach my $file(@files){
    my $time=localtime();
    print "#########PROCESSING##########\n $file\t $time\n";
    my $input= Bio::SeqIO-> new( -file => "$seq_dir/$file",
                                 -format => "fasta");
    my $seq;
    my @seq_array;
    while( $seq = $input->next_seq() ) {
        push(@seq_array,$seq);
    }
    my $numberofsequences=@seq_array;
    system "makeblastdb -dbtype nucl -in $seq_dir/$file";
    my $loops=$numberofsequences/$subsets; #is 1/(times) of the number of sequences
    for (my $j=0;$j<$subsets;$j++){ #creates as many files as subsets to build and loops as many times x
        open (OUTFILE , ">$seq_dir/subset_$j\_$file") or die $!; #creates a file which is named like the infile with subset_ in front of it
        for (my $i=$j*$loops;$i<=((($j+1)*$loops)-1);$i++){ #loops through 1/x of the sequences
            $seq=$seq_array[$i];
            my $id=$seq->id();
            my $sequence=$seq->seq();
            print OUTFILE ">$id\n$sequence\n";
        }
        close OUTFILE;
        $manager->start and next;
        system "$blast_method -query $seq_dir/subset_$j\_$file -db $seq_dir/$file -evalue $evalue -outfmt $outfmt -out $outdir.subset_$j\_$file.blast ";
        $manager->finish;
    }
    print "#########END##########\n $file\t $time\n";
}
#cleaning up directory
$manager->wait_all_children;
foreach my $file(@files){
    my $outfile=$file;
    $outfile=~s/fasta/blast/g;
    system "touch $outdir/$outfile"; #creates one outfile per fasta file
    for (my $j=0;$j<$subsets;$j++){
        system "cat $outdir/subset_$j\_$file.blast >>$outdir/$outfile"; #concatenates subfile results to one blast result
        system "rm $outdir/subset_$j\_$file.blast"; #removes subset blast results
        system "rm $outdir/subset_$j\_$file"; #removes data subsets
    }
    system "rm $outdir/$file.nhr";
    system "rm $outdir/$file.nin";
    system "rm $outdir/$file.nsq";
    #print "#########COMPLETE##########\n $file\t $time\n";
}
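For readers who want to see the underlying mechanism without installing Parallel::ForkManager, here is a minimal sketch of the same "one child process per subset, then wait for all" pattern using only core Perl. `run_parallel` is a hypothetical helper, not part of the script above, and unlike ForkManager it does not cap the number of simultaneous children. The demo commands just run the current Perl interpreter (`$^X`) so the sketch works without BLAST installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run each command (an array ref of program + arguments) in its own child
# process and wait for all of them. Returns a hash that maps the command's
# position in the argument list to its exit code.
sub run_parallel {
    my (@commands) = @_;
    my %index_of_pid;
    for my $i (0 .. $#commands) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the requested program.
            my @cmd = @{ $commands[$i] };
            exec { $cmd[0] } @cmd;
            warn "exec of $cmd[0] failed: $!\n";
            exit 127;
        }
        $index_of_pid{$pid} = $i;    # parent remembers which job this is
    }
    my %exit_code;
    while (%index_of_pid) {
        my $pid = wait();
        last if $pid == -1;          # no children left
        next unless exists $index_of_pid{$pid};
        $exit_code{ delete $index_of_pid{$pid} } = $? >> 8;
    }
    return %exit_code;
}

# Demo with two trivial jobs; in practice each job would be a BLAST command.
my %result = run_parallel(
    [ $^X, '-e', 'exit 0' ],
    [ $^X, '-e', 'exit 3' ],
);
print "job $_ exited with code $result{$_}\n" for sort { $a <=> $b } keys %result;
```

Checking the exit codes this way makes a failed subset visible before its results are concatenated. Keep in mind the earlier point in this thread that every process loads its own copy of the database into RAM.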
-
I am using the following script to speed up my tblastn queries, but it shows the following error:
Can't exec "makeblastdb": No such file or directory at blast.pl line 42, <GEN0> line 42132.
#############################
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Parallel::ForkManager;
use Time::Local;
#############################
#USAGE: perl script.pl blastmethod directory_of_fasta_files eval outfmt number_of_cpus
#e.g. perl blastplus_parallel.pl blastn SampleDIR e-10 6 outdir 10
#############################
# author: Ulrike Loeber ([email protected])
#This script takes care of imperfect parallelization of blast+. It splits the files to do smaller jobs on more than one CPU to improve the performance of BLAST+;
my $blast_method = $ARGV[0];
my $seq_dir = $ARGV[1];
my $evalue = $ARGV[2];
my $outfmt = $ARGV[3];
my $outdir = $ARGV[4];
my $cpus = $ARGV[5];
opendir (INDIR, $seq_dir) or die $!;
my @files=grep /\.fasta$/ , readdir (INDIR); #greps every file in the determined directory which end with "fasta"
close INDIR;
#one cpu is used for perl, so the number of cpus left is $seq_dir-1
my $numberOfProcesses=($cpus-1);
my $subsets=$numberOfProcesses; #build as many subsets as free cpus
my $manager = new Parallel::ForkManager( $numberOfProcesses );
foreach my $file(@files){
my $time=localtime();
print "#########PROCESSING##########\n $file\t $time\n";
my $input= Bio::SeqIO-> new( - file => "$seq_dir/$file",
-format => "fasta");
my $seq;
my @seq_array;
while( $seq = $input->next_seq() ) {
push(@seq_array,$seq);
}
my $numberofsequences=@seq_array;
system "makeblastdb -dbtype nucl -in $seq_dir/$file";
my $loops=$numberofsequences/$subsets; #is 1/(times) of the number of sequences
for (my $j=0;$j<$subsets;$j++){ #creates as many files as subsets to build and loops as many times x
open (OUTFILE , ">$seq_dir/subset_$j\_$file") or die $!; #creates a file which is named like the infile with subset_ in front of it
for (my $i=$j*$loops;$i<=((($j+1)*$loops)-1);$i++){ #loops through 1/x of the sequences
$seq=$seq_array[$i];
my $id=$seq->id();
my $sequence=$seq->seq();
print OUTFILE ">$id\n$sequence\n";
}
close OUTFILE;
$manager->start and next;
system "$blast_method -query $seq_dir/subset_$j\_$file -db $seq_dir/$file -evalue $evalue -outfmt $outfmt -out $outdir.subset_$j\_$file.blast ";
$manager->finish;
}
print "#########END##########\n $file\t $time\n";
}
#cleaning up directory
$manager->wait_all_children;
foreach my $file(@files){
my $outfile=$file;
$outfile=~s/fasta/blast/g;
system "touch $outdir/$outfile"; #creates one outfile per fasta file
for (my $j=0;$j<$subsets;$j++){
system "cat $outdir/subset_$j\_$file.blast >>$outdir/$outfile"; #concatenates subfile results to one blast result
system "rm $outdir/subset_$j\_$file.blast"; #removes subset blast results
system "rm $outdir/subset_$j\_$file"; #removes data subsets
}
system "rm $outdir/$file.nhr";
system "rm $outdir/$file.nin";
system "rm $outdir/$file.nsq";
#print "#########COMPLETE##########\n $file\t $time\n";
}
-
Just copying the files to a local directory is not enough if that directory is not in your PATH. Extend your PATH to include the directory in question.
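A quick way to confirm whether Perl will be able to find the BLAST+ binaries is to search the PATH directories from inside Perl, since `system` with a bare program name relies on them. This is a minimal sketch: `find_in_path` is a hypothetical helper, and the tool names are just the ones mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $program if an executable file with that name
# exists in one of the given directories, otherwise return undef.
sub find_in_path {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        next unless defined $dir && length $dir;
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my @path_dirs = File::Spec->path();    # directories taken from $ENV{PATH}
for my $tool (qw(makeblastdb tblastn)) {
    my $found = find_in_path($tool, @path_dirs);
    print defined $found ? "$tool: $found\n" : "$tool: not found in PATH\n";
}
```

If a tool is reported as not found, one option is to extend the PATH for the script itself by setting `$ENV{PATH}` near the top, with the BLAST+ `bin` directory prepended to the existing value.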
-
Actually there are a few prerequisites, like an executable makeblastdb wherever you run the script. Did you add NCBI's BLAST to your PATH, if you are a non-root user? Be aware of the higher memory usage. But it still might be faster. You can contact me if you have any more questions. Bests, Ulrike
-
blast+ programs
I am running the script from the same directory where the blast+ programs are. You can see my path below:
/Downloads/bp272/ncbi-blast-2.2.31+/bin$ perl blast.pl tblastn /home/sekhwalm/Oryza/blast/ e-5 6 /home/sekhwalm/Oryza/blast/ 5
-
speed up tblastn query
Hi, I am using this script to speed up my tblastn queries, but it shows an error:
Can't exec "makeblastdb": No such file or directory at blast.pl line 41, <GEN0> line 42132.
-
Until you fix the PATH, this is not going to work for any BLAST type. Add the directory "/Downloads/bp272/ncbi-blast-2.2.31+/bin" to your PATH, following the instructions I linked in a post above.
If you are serious about learning this then spend a bit of time here understanding some basic unix: http://korflab.ucdavis.edu/Unix_and_...ent.html#part1
-
Thanks...
I have now fixed the PATH issue and BLAST is running. However, after running for some time it shows the following errors:
Warning: [tblastn] Query is Empty!
cat: /home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast: No such file or directory
rm: cannot remove ‘/home/sekhwalm/Downloads/bp272/ncbi-blast-2.2.31+/bin//subset_0_pep.fasta.blast’: No such file or directory
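A possible explanation, based only on reading the script as posted and not confirmed by anyone in this thread: the BLAST command writes its output to `$outdir.subset_...` (with a dot), while the clean-up step reads `$outdir/subset_...` (with a slash), so `cat` and `rm` would look for a file that was never created under that name. Separately, `$loops` is a fractional number whenever the sequence count is not a multiple of the subset count. With 10 sequences and 4 subsets it is 2.5, and the inner loop then appears to skip indices 4 and 9. With fewer sequences than subsets, a subset file can end up empty, which is one way an empty query can arise. The sketch below shows integer chunk boundaries that avoid both effects; `chunk_bounds` is a hypothetical helper, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split $total items into at most $parts contiguous chunks whose sizes
# differ by at most one. Returns a list of [first_index, last_index]
# pairs; chunks that would be empty are omitted.
sub chunk_bounds {
    my ($total, $parts) = @_;
    die "parts must be at least 1\n" if $parts < 1;
    my $base  = int($total / $parts);    # minimum chunk size
    my $extra = $total % $parts;         # first $extra chunks get one more
    my @bounds;
    my $start = 0;
    for my $j (0 .. $parts - 1) {
        my $size = $base + ($j < $extra ? 1 : 0);
        next if $size == 0;              # never produce an empty subset
        push @bounds, [ $start, $start + $size - 1 ];
        $start += $size;
    }
    return @bounds;
}

for my $case ([ 10, 4 ], [ 1, 4 ]) {
    my ($total, $parts) = @$case;
    my @chunks = map { "$_->[0]..$_->[1]" } chunk_bounds($total, $parts);
    print "$total sequences, $parts subsets: @chunks\n";
}
```

For 10 sequences and 4 subsets this prints the ranges 0..2 3..5 6..7 8..9, and for a single sequence it prints only 0..0, so every sequence lands in exactly one non-empty subset. In the script, the subset loop would then iterate over these pairs, and the `-out` path would use the same `$outdir/subset_...` form as the clean-up step.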