**GenoMax** · 04-12-2013, 05:04 AM

Have a look at this as another option: http://www3.imperial.ac.uk/bioinfsup...ing_array_jobs

Even though you have 16 CPUs, how much memory do you have available for each? You may need about 10G per job if you are going to search against "nr".

**rhinoceros** · 04-12-2013, 09:02 AM

Originally posted by GenoMax

Have a look at this as another option: http://www3.imperial.ac.uk/bioinfsup...ing_array_jobs

Even though you have 16 CPUs, how much memory do you have available for each? You may need about 10G per job if you are going to search against "nr".

Thanks, the link is very helpful. The cluster has 256G RAM. So, I suppose a good solution would be to run 16 independent tasks with 16 threads in each.

**GenoMax** · 04-12-2013, 09:17 AM

Originally posted by rhinoceros

Thanks, the link is very helpful. The cluster has 256G RAM. So, I suppose a good solution would be to run 16 independent tasks with 16 threads in each.

What kind of a cluster is this?

Most commodity clusters have nodes with a certain amount of RAM (e.g. on a cluster I access there are blades with dual quad-core Xeon CPUs accessing 72GB of local RAM), and then there are clusters with "shared" memory access (e.g. NUMA). I have not seen clusters of the latter kind in common use of late.

Is your cluster the latter type when you say that you have 256G RAM? Or do you actually have 256G RAM on each node (not completely unlikely nowadays)?

Unless you are the only person using this cluster, you may not be able to spawn that many jobs simultaneously. There will also be some dependence on the type/speed of storage.

**rhinoceros** · 04-12-2013, 09:37 AM

Originally posted by GenoMax

What kind of a cluster is this?

Most commodity clusters have nodes with a certain amount of RAM (e.g. on a cluster I access there are blades with dual quad-core Xeon CPUs accessing 72GB of local RAM), and then there are clusters with "shared" memory access (e.g. NUMA).

Is your cluster the latter type when you say that you have 256G RAM? Or do you actually have 256G RAM on each node (not completely unlikely nowadays)?

I'm not 100% sure, but I think the cluster consists of 16 Dell R620s, i.e. 16 GB RAM in each node.

Code:

cat /proc/meminfo
MemTotal:       264635596 kB
..
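For reference, the MemTotal figure above can be converted from kB to GiB with a small helper. This is only a sketch: `kb_to_gib` is a made-up name, and the number describes whichever machine the `cat` command was run on, not necessarily the worker nodes.

```shell
#!/bin/sh
# Hypothetical helper: convert a MemTotal value in kB (as printed by
# /proc/meminfo) to GiB, rounded to one decimal place.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 / 1024 }'
}

# Value quoted in the post above.
kb_to_gib 264635596

# On a live Linux machine, read the value directly instead.
if [ -r /proc/meminfo ]; then
    kb_to_gib "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
fi
```

Running the same check inside a job that lands on a worker node would show how much RAM that node really has.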

**GenoMax** · 04-12-2013, 09:54 AM

Originally posted by rhinoceros

I'm not 100% sure, but I think the cluster consists of 16 Dell R620s, i.e. 16 GB RAM in each node.

Code:

cat /proc/meminfo
MemTotal:       264635596 kB
..

So you do have a cluster of the first type and the cluster head-node does seem to have 256GB RAM (assuming that is where you ran the cat command).

I am not sure whether your sysadmins allow you to run jobs on the head node.

If the worker nodes have only 16GB RAM each, then you will probably not be able to run more than one job per node (you could, but then things will use swap/tmp and everything will be slow). I suggest experimenting with test jobs that request different amounts of memory to see whether you can squeeze two jobs onto a node.
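The arithmetic behind this can be sketched as follows. The 16 GB per node and roughly 10 GB per job are the figures mentioned in this thread; `jobs_per_node` is a hypothetical helper, and the `qsub` lines are only printed, not submitted, because the script name and the `h_vmem` values that make sense depend on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: how many jobs of a given size fit in a node's RAM
# (integer division, so partial jobs are dropped).
jobs_per_node() {
    node_gb=$1
    job_gb=$2
    echo $(( node_gb / job_gb ))
}

jobs_per_node 16 10   # about 10 GB per job: one job per node
jobs_per_node 16 8    # two jobs fit only if each stays at or below 8 GB

# Dry run: print test submissions with different memory requests.
# blastp-sge.sh is the script name used elsewhere in this thread.
for mem in 6 8 10 12; do
    echo "qsub -l h_vmem=${mem}G -t 1-1 blastp-sge.sh"
done
```

Comparing the run times of such test jobs should show the point at which a job starts swapping.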

**rhinoceros** · 04-13-2013, 07:36 AM

Hello again,

Will the following result in 16 parallel instances of blast with each instance running 16 threads? Original input.fasta has been divided into 16 files named input.1 - input.16.

qsub -t 1-16:1 blastp-sge.sh

Code:

#!/bin/bash
#$ -N blastp
#$ -j y
#$ -cwd
#$ -l h_vmem=2G -pe smp 8
#$ -R y
/path/to/ncbi-blast/2.2.28+/bin/blastp -query input.${SGE_TASK_ID} -db /path/to/db/nr -seg yes -soft_masking true -use_sw_tback -evalue 1e-5 -outfmt "6 qseqid sseqid sgi staxids pident length mismatch gapopen qstart qend sstart send evalue bitscore" -num_threads 16 -out ${SGE_TASK_ID}.tsv

Output would be 1.tsv - 16.tsv, which could be merged easily. I'm having a particularly hard time understanding the '#$ -l h_vmem=2G -pe smp 8' line.

**GenoMax** · 04-15-2013, 03:46 AM

Originally posted by rhinoceros

I'm having a particularly hard time understanding the '#$ -l h_vmem=2G -pe smp 8' line.

The h_vmem parameter has to do with the memory allocation for the job. This page has info about this parameter: http://www.biostat.jhsph.edu/bit/clu...e.html#MemSpec

The "pe" part refers to a parallel environment (if one is set up on your cluster). It is related to the "num_threads" part of your blast jobs, as described here: http://www3.imperial.ac.uk/bioinfsup..._parallel_jobs

You may want to confer with your local SGE admin about the right parameters to set for the queues you have access to.
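One way to keep the slot request and the blast thread count from drifting apart is to read the thread count from NSLOTS, which Grid Engine sets to the number of slots granted by "-pe". The sketch below makes several assumptions: a parallel environment named smp exists (the name is taken from the script above), h_vmem is counted per slot on your installation (common, but check with your admin), and `blast_cmd` is a made-up helper that only prints the command so the script can be tried outside the scheduler.

```shell
#!/bin/sh
#$ -N blastp
#$ -j y
#$ -cwd
#$ -pe smp 8
#$ -l h_vmem=2G

# NSLOTS and SGE_TASK_ID are set by Grid Engine inside a job; fall back
# to 1 so the script can also be run by hand.
threads="${NSLOTS:-1}"
task="${SGE_TASK_ID:-1}"

# Hypothetical helper: build the blastp command line for one array task.
# $1 = task number, $2 = number of threads.
blast_cmd() {
    echo "blastp -query input.$1 -db /path/to/db/nr -evalue 1e-5 -outfmt 6 -num_threads $2 -out $1.tsv"
}

# Print the command instead of running it.
blast_cmd "$task" "$threads"
```

With this layout, changing the number after "-pe smp" is enough to change the thread count as well.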

**rhinoceros** · 04-18-2013, 09:24 AM

Everything is working now. My script blastp.sh is as follows:

Code:

#!/bin/bash
#$ -V
#$ -N blastp
#$ -j y
#$ -cwd
#$ -pe orte 16
/path/to/ncbi-blast/2.2.28+/bin/blastp -query input.${SGE_TASK_ID} -db /path/to/db/nr -lotsOfFlags -outfmt 6 -num_threads 16 -out ${SGE_TASK_ID}.tsv

The input is a fasta file that I have split into 20 parts with fastasplitn (input.1, input.2, .., input.20). I call the script from the same dir as follows: qsub -t 1-20:1 blastp.sh

So in this case I'm running 20 parallel blasts with 16 threads in each (though some of them are actually waiting in the queue). The output is 1.tsv, 2.tsv, .., 20.tsv, which I'll merge with

cat 1.tsv 2.tsv .. 20.tsv > blast_result.tsv

And that's that. I hope others might find this useful..
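If typing out all the file names gets tedious, the merge can be done in a loop that also refuses to continue when a part is missing, for example because a task is still queued. This is only a sketch; `merge_tsv` is a made-up name, and it assumes the parts are called 1.tsv .. N.tsv as above.

```shell
#!/bin/sh
# Hypothetical helper: concatenate 1.tsv .. N.tsv in numeric order.
# $1 = number of parts, $2 = output file.
merge_tsv() {
    n=$1
    out=$2
    : > "$out"                      # create or truncate the output file
    i=1
    while [ "$i" -le "$n" ]; do
        if [ ! -f "$i.tsv" ]; then
            echo "missing $i.tsv" >&2
            return 1
        fi
        cat "$i.tsv" >> "$out"
        i=$((i + 1))
    done
}

# Usage, once all array tasks have finished:
#   merge_tsv 20 blast_result.tsv
```

A plain `cat *.tsv` would also pick up a previous blast_result.tsv in the same directory and would order the parts as 1, 10, 11, .., which is why the loop counts explicitly.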

SGE and ncbi-blast-2.2.28+