  • How to further optimize blast+ on a cluster?

    Hello,

    I'm currently running some rather large BLAST jobs on a cluster we have. I have ~20 files of RNA-seq data, sequenced on Illumina. Each read is ~100 bp, and there are ~20-40 million reads in each file. I'm using blastx (v2.2.28+) to search a BLAST database I created of proteins I'm curious about (~26,000 sequences).

    The cluster contains 11 nodes (including the head node) with 16 cores and ~125 GB of RAM each.

    I first installed mpiBLAST and distributed the job across all the nodes, which was kind of underwhelming: it took ~4 days to finish one file, though it should be noted that I was originally outputting to XML.

    Taking a cue from others on this forum, I decided instead to distribute the job using SGE, output to tabular format, and split each input file into 11 pieces using fastsplitn. Then I run a 16-threaded blastx search, one on each node. (Credit: user rhinoceros from post http://seqanswers.com/forums/showthr...light=mpiblast THANK YOU!!)
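    (In case anyone wants to reproduce the split step without fastsplitn, something roughly like this does the same kind of round-robin split. Just a sketch, assuming plain FASTA input in a hypothetical all_reads.fasta and 11 chunks named input.1 through input.11:)
    Code:
    # Round-robin split of a FASTA file into 11 chunks named input.1 .. input.11
    # (rough stand-in for fastsplitn; multi-line records stay with their headers)
    awk -v n=11 '/^>/{f = "input." ((c++ % n) + 1)} {print > f}' all_reads.fasta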

    That approach worked great; it shortened the runs down to ~12 hours, so I can get two files done a day.

    However, I'm really greedy and impatient and was curious if anyone else had any ideas about optimizing this even further. Perhaps splitting the job up even more and running several jobs per node?

    If there are enterprising individuals out there who want to see what kind of data I'm working with, I'm just examining the RSEQ data that you can download from the Human Microbiome Project: http://www.hmpdacc.org/RSEQ/. I suppose blasting against nr or something similar would provide a useful trial; i.e., any optimization against any database would probably also be helpful in my case.

    For those interested, the scripts I'm using are just the ones originally posted by user rhinoceros, altered for my cluster:
    Code:
    #!/bin/bash
    #$ -N run_2062_CP_DZ_PairTo_2061
    #$ -j y
    #$ -cwd
    #$ -pe smp 11
    #$ -R y
    /opt/blast+/blastx -query input.${SGE_TASK_ID} -db /data/blast_plus/vr_db_nr -outfmt 6 -num_threads 16 -out ${SGE_TASK_ID}.tsv
    submitted to SGE with:
    Code:
    qsub -t 1_11:1 ../blastx.sh
    Thanks!

  • #2
    I'd look into clustering your data before blasting; it sort of sounds like you might have lots of identical sequences there. Assembly before BLAST would probably lead to much more insightful results too. Also, it looks to me like you're running 16-thread blasts on 11 cores per node; it should be "-pe smp 16". What kind of CPUs do the nodes have? 2 x 8-core Xeons? If yes, "-pe smp 8" and 8 threads would probably be the optimal setting, and anything > smp 8 would lead to slowdowns since you're trading cache for something else. I could be wrong. Have you monitored the jobs to see if they really are 16-threaded per node (qstat -u "yourUID")? Also, you probably meant to write "qsub -t 1-11:1"
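    To make that concrete, the script from the first post with the slot request matched to the thread count would look roughly like this (untested here; adjust both numbers together once you know the real core count):
    Code:
    #!/bin/bash
    #$ -N run_2062_CP_DZ_PairTo_2061
    #$ -j y
    #$ -cwd
    #$ -pe smp 16
    #$ -R y
    /opt/blast+/blastx -query input.${SGE_TASK_ID} -db /data/blast_plus/vr_db_nr -outfmt 6 -num_threads 16 -out ${SGE_TASK_ID}.tsv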
    Last edited by rhinoceros; 01-17-2014, 01:00 PM.
    savetherhino.org



    • #3
      A good suggestion, and you are almost certainly correct. However, I'd like to also get quantitative data out of this (i.e. not just if there IS a hit, but how many there are; looking for particularly enriched sequences). I suppose I could remove identical sequences, and also keep a count of how many of each there were... although that might become analytically challenging. For instance, it is very likely that there are very similar sequences that differ by relatively few nucleotides which would not be removed, since they wouldn't be identical. I'd have to recombine that data somehow. I'd have to think about the best way to do that.
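      (For the exact-duplicate counting part, even something this simple might be enough to keep the counts around. A sketch, assuming two-line FASTA records, i.e. one sequence line per header, in a hypothetical all_reads.fasta:)
      Code:
      # Count how many times each exact sequence occurs (most frequent first)
      grep -v '^>' all_reads.fasta | sort | uniq -c | sort -rn > duplicate_counts.txt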

      This dataset is also kind of a trial for me. The real purpose is eventually to have a pipeline to do this analysis for any microbiome/metagenome inputs. So, the faster I can get the blast results the better.



      • #4
        Huh, it cut off the rest of your reply for some reason.

        Thanks for the tip on the script; I like posting the code so people can tell me when I do something silly. That might actually speed it up even more.

        The processors I have are 2 per node:
        model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
        But yes, 2 processors with 8 cores each.

        I'll give your new settings a shot on the next file and let you know how it turns out. Thanks!



        • #5
          Wait, no, I was wrong about that; cpuinfo misled me since it listed multiple processors. It's actually a single 8-core processor on each node, and I think 16 threads via hyperthreading.
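          (A quick way to double-check the topology, assuming lscpu is available on the nodes:)
          Code:
          lscpu | grep -E 'Socket|Core|Thread'
          # e.g. "Socket(s): 1", "Core(s) per socket: 8", "Thread(s) per core: 2"
          # would mean one 8-core CPU with hyperthreading = 16 logical CPUs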



          • #6
            Originally posted by jpearl01:
            A good suggestion, and you are almost certainly correct. However, I'd like to also get quantitative data out of this (i.e. not just if there IS a hit, but how many there are; looking for particularly enriched sequences). I suppose I could remove identical sequences, and also keep a count of how many of each there were... although that might become analytically challenging. For instance, it is very likely that there are very similar sequences that differ by relatively few nucleotides which would not be removed, since they wouldn't be identical. I'd have to recombine that data somehow. I'd have to think about the best way to do that.
            You could do something like that with USEARCH. E.g., with -derep_prefix you can remove identical sequences (also subsequences) and write the cluster size straight into the FASTA header.
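            Roughly like this (flag names shift a bit between USEARCH versions, so check the manual; -sizeout is what writes the cluster size into the header, and all_reads.fasta is just a placeholder name):
            Code:
            # Collapse identical sequences/prefixes and record cluster sizes
            # as ";size=N" annotations in the output FASTA headers
            usearch -derep_prefix all_reads.fasta -output derep.fasta -sizeout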
            Last edited by rhinoceros; 01-17-2014, 01:12 PM.
            savetherhino.org



            • #7
              Originally posted by jpearl01:
              Wait, no, I was wrong about that; cpuinfo misled me since it listed multiple processors. It's actually a single 8-core processor on each node, and I think 16 threads via hyperthreading.
              That probably makes sense. If it were 2 x 4 cores (2 x 8 hyperthreads), -pe smp > 8 would probably throw an error (as usual, I could be wrong; I'm not that expert with the whole SGE thingy).
              Last edited by rhinoceros; 01-17-2014, 01:13 PM.
              savetherhino.org



              • #8
                USEARCH looks quite promising; weird that I hadn't heard about it until today. Then again, this is kind of new territory for me; I haven't really done much with microbiome stuff in the past. Huh, I didn't know he was the same guy who developed MUSCLE. Thanks for the tip! I'll post how these modifications work out.



                • #9
                  Changing -pe smp 11 to 16 seems to affect the slots that are available (i.e., the nodes), but not the thread count. So despite increasing the value to 16, only 11 tasks are being created, one on each node, but the -num_threads option on blastx is distributing each one across the 16 logical processors on its node.
                  Code:
                   qstat
                  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
                  -----------------------------------------------------------------------------------------------------------------
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 1
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 2
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 3
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 4
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 5
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 6
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 7
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 8
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 9
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 10
                     1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 11
                     1484 0.55500 pe_smp_16  josh         qw    01/18/2014 15:16:50                                   16 1-11:1
                  I could change the qsub command to submit more tasks, but I feel like that would just end up with multiple jobs fighting for the same resources on each node.



                  • #10
                    You are correct in that you can increase the number of array tasks to 16 with the -t option, but at this point you are probably saturated on I/O anyway (check iostat/memstat).
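                    (Something along these lines on a busy node will show whether the disks are the limit; iostat is in the sysstat package:)
                    Code:
                    # Extended per-device stats every 5 s; %util pinned near 100 means the disk is the bottleneck
                    iostat -dxm 5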

                    If you have the time, you could try different numbers of array tasks with a small subset of sequences to find an optimal count. It may turn out to be less than the 11 you are using now, or it could end up being the full 16.
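                    (A crude way to run that sweep, timing blastx on a small hypothetical test.fasta with different thread counts; the number of array tasks could be swept the same way through qsub -t:)
                    Code:
                    # Time blastx on a small subset at different thread counts to find the sweet spot
                    for t in 4 8 11 16; do
                        echo "== $t threads =="
                        /usr/bin/time -v /opt/blast+/blastx -query test.fasta -db /data/blast_plus/vr_db_nr \
                            -outfmt 6 -num_threads $t -out /dev/null
                    done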



                    • #11
                      Originally posted by jpearl01:
                      Changing -pe smp 11 to 16 seems to affect the slots that are available (i.e., the nodes), but not the thread count. So despite increasing the value to 16, only 11 tasks are being created, one on each node, but the -num_threads option on blastx is distributing each one across the 16 logical processors on its node.
                      Code:
                      qsub -t 1-100:1 script
                      Means that the script is called 100 times, with the task ID increasing by 1 after each call.

                      Code:
                      -pe smp 16
                      Allocates 16 cores on a single node for one instance of the called script.

                      Code:
                      -pe orte 16
                      Allocates 16 cores for one instance of the called script, but not necessarily on a single node (we don't want this here).

                      Increasing the -pe smp value doesn't affect the number of tasks that are created; it's all about allocating resources for each task. I'd be very surprised if -pe smp 11 somehow allowed BLAST to run 16 parallel threads (num_threads 16; see the last column in the qstat output). What I think is happening is that you have 11 cores on each node alternating between the 16 threads.
                      savetherhino.org



                      • #12
                        Thank you for the clarification! This is a new system we have up and running, so it is taking me some time to get up to speed on job submission. What you are saying makes a lot of sense. I was thinking the 'slots' column in the qstat output meant the available nodes.

                        Unfortunately, changing to -pe smp 16 doesn't seem to be increasing the speed of my output significantly; at least, not noticeably so. CPU utilization is pretty low on all the nodes, with the load rarely getting above ~3 (out of 16 logical CPUs):
                        Code:
                        HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
                        -------------------------------------------------------------------------------
                        global                  -               -     -       -       -       -       -
                        clusterhn               lx26-amd64     16  2.50  126.0G   14.9G   29.8G  190.7M
                        n001                    lx26-amd64     16  1.57  126.0G   11.8G   32.0G   18.9M
                        n002                    lx26-amd64     16  2.63  126.0G   11.7G   32.0G   11.8M
                        n003                    lx26-amd64     16  2.26  126.0G   11.7G   32.0G   11.2M
                        n004                    lx26-amd64     16  2.88  126.0G   11.7G   32.0G   11.8M
                        n005                    lx26-amd64     16  2.67  126.0G   11.7G   32.0G   18.0M
                        n006                    lx26-amd64     16  3.04  126.0G   11.7G   32.0G   11.8M
                        n007                    lx26-amd64     16  2.94  126.0G   11.7G   32.0G   11.3M
                        n008                    lx26-amd64     16  3.55  126.0G   11.8G   32.0G   16.5M
                        n009                    lx26-amd64     16  2.37  126.0G   11.7G   32.0G   11.7M
                        n010                    lx26-amd64     16  2.31  126.0G   11.7G   32.0G   11.0M
                        What I've read so far seems to indicate that this low CPU utilization is expected with BLAST. The bottleneck here appears to be memory usage.

                        Code:
                        top - 13:52:28 up 68 days,  2:21,  5 users,  load average: 2.94, 2.86, 2.71
                        Tasks: 500 total,   4 running, 495 sleeping,   1 stopped,   0 zombie
                        Cpu(s): 10.3%us,  6.3%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
                        Mem:  132107952k total, 131579868k used,   528084k free,   227880k buffers
                        Swap: 31200248k total,   195660k used, 31004588k free, 114107516k cached
                        At least on the head node; the other nodes are showing average memory usage closer to 30%.

                        Also, something odd (to me, possibly because I'm unfamiliar with how SGE distributes processes): when I list the processes, blastx doesn't seem to be using more than one thread (the NLWP column):
                        Code:
                        UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
                        josh     12698 12686 12698 99    1 01:49 ?        20:13:28 /opt/blast+/blastx -query input.10 -db /data/blast_plus/vr_db_nr -outfmt 6 -num_threads 16 -out 10.tsv
                        Have you noticed this before? Perhaps some SGE wrapper process is obscuring this?
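                        (I'll try a per-thread view to see whether that count is real; these should show it regardless of how SGE launched the process, using the PID from the ps output above:)
                        Code:
                        # List all threads (LWPs) of the blastx process
                        ps -Lf -p 12698
                        # Or watch per-thread CPU usage live
                        top -H -p 12698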



                        • #13
                          I just wanted to post a follow-up to this. I did end up using USEARCH (more specifically the UBLAST algorithm in the package), which was awesome: my database search went from ~12 hours to ~1 min. I used a similar strategy to what I was doing with regular BLAST, but I was much more specific about the parameters I was after (i.e., very low e-values, and just a single hit). I did distribute across all the nodes in my cluster with SGE. I did not purchase the 64-bit version of USEARCH, as the speed was fast enough that I no longer felt this part of the analysis was a bottleneck. Thanks for all the help!
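                          (For anyone finding this later, it boiled down to something along these lines; the flags are from the USEARCH v7 docs as I remember them, so double-check against the manual, and the e-value/hit limits are just placeholders for whatever thresholds you actually want:)
                          Code:
                          # One-time step: build a UBLAST-formatted database from the protein set
                          usearch -makeudb_ublast proteins.fasta -output proteins.udb

                          # Translated search of the reads against it, keeping one low-e-value hit
                          # per query and writing BLAST-style tabular output
                          usearch -ublast input.1 -db proteins.udb -evalue 1e-9 -maxhits 1 -blast6out 1.tsv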
