  • jpearl01
    replied
    I just wanted to post a follow-up to this. I did end up using usearch (more specifically the ublast algorithm in the package), which was awesome: my database search went from ~12 hours to ~1 min. I used a similar strategy to what I was doing with regular blast, but I was much more specific about the parameters I was after (i.e. very low e-values and just a single hit). I did distribute across all the nodes in my cluster with SGE. I did not purchase the 64-bit version of usearch, as the speed was fast enough that this part of the analysis no longer felt like a bottleneck. Thanks for all the help!
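
    Something along these lines (not an exact command; the file and database names are placeholders, the e-value is just an example of "very low", and option names can differ between usearch versions, so check the docs):
    Code:
    # build a UDB database from the protein fasta, then ublast the split
    # input against it, keeping a single hit per query (placeholder names)
    usearch -makeudb_ublast proteins.fasta -output proteins.udb
    usearch -ublast input.${SGE_TASK_ID} -db proteins.udb \
            -evalue 1e-9 -maxhits 1 -blast6out ${SGE_TASK_ID}.tsv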



  • jpearl01
    replied
    Thank you for the clarification! This is a new system we have up and running, so it is taking me some time to get up to speed on job submission. What you are saying makes a lot of sense. I was thinking the 'slots' column in the qstat output meant the available nodes.

    Unfortunately, changing to -pe smp 16 doesn't seem to be noticeably increasing the speed of my output. The CPU utilization is pretty low on all the nodes, rarely getting above 3%:
    Code:
    HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    -------------------------------------------------------------------------------
    global                  -               -     -       -       -       -       -
    clusterhn               lx26-amd64     16  2.50  126.0G   14.9G   29.8G  190.7M
    n001                    lx26-amd64     16  1.57  126.0G   11.8G   32.0G   18.9M
    n002                    lx26-amd64     16  2.63  126.0G   11.7G   32.0G   11.8M
    n003                    lx26-amd64     16  2.26  126.0G   11.7G   32.0G   11.2M
    n004                    lx26-amd64     16  2.88  126.0G   11.7G   32.0G   11.8M
    n005                    lx26-amd64     16  2.67  126.0G   11.7G   32.0G   18.0M
    n006                    lx26-amd64     16  3.04  126.0G   11.7G   32.0G   11.8M
    n007                    lx26-amd64     16  2.94  126.0G   11.7G   32.0G   11.3M
    n008                    lx26-amd64     16  3.55  126.0G   11.8G   32.0G   16.5M
    n009                    lx26-amd64     16  2.37  126.0G   11.7G   32.0G   11.7M
    n010                    lx26-amd64     16  2.31  126.0G   11.7G   32.0G   11.0M
    What I've read so far seems to indicate that this low CPU utilization with blast is expected; the bottleneck here appears to be memory usage.

    Code:
    top - 13:52:28 up 68 days,  2:21,  5 users,  load average: 2.94, 2.86, 2.71
    Tasks: 500 total,   4 running, 495 sleeping,   1 stopped,   0 zombie
    Cpu(s): 10.3%us,  6.3%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  132107952k total, 131579868k used,   528084k free,   227880k buffers
    Swap: 31200248k total,   195660k used, 31004588k free, 114107516k cached
    That's on the head node, at least; the other nodes are showing average memory usage closer to 30%.

    Also, something odd (to me, possibly because I'm unfamiliar with how SGE distributes processes): when I list the processes, blastx doesn't seem to be using more than one thread (the NLWP column):
    Code:
    UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
    josh     12698 12686 12698 99    1 01:49 ?        20:13:28 /opt/blast+/blastx -query input.10 -db /data/blast_plus/vr_db_nr -outfmt 6 -num_threads 16 -out 10.tsv
    Have you noticed this before? Perhaps some SGE wrapper process is obscuring it?
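
    (For what it's worth, the listing above looks like ps -eLf output; something like the following shows the live thread count and per-thread CPU for a given process. The PID here is just the one from the listing above.)
    Code:
    # thread count and per-thread CPU usage of the running blastx
    ps -o pid,nlwp,pcpu,args -p 12698
    top -H -p 12698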



  • rhinoceros
    replied
    Originally posted by jpearl01:
    Changing the -pe smp 11 to 16 seems to affect the slots that are available (i.e. the nodes), but not the thread count. So, despite increasing the value to 16, only 11 jobs are being created, one on each node, but the -num_threads option on blastx is distributing each job across the 16 processors on its node.
    Code:
    qsub -t 1-100:1 script
    Means that the script is called 100 times, with the task ID increasing by 1 for each task.

    Code:
    -pe smp 16
    Allocates 16 cores on a single node for each instance of the called script.

    Code:
    -pe orte 16
    Allocates 16 cores for each instance of the called script, but not necessarily on a single node (we don't want this here).

    Increasing the -pe smp value doesn't affect the number of tasks that are created; it's all about allocating resources for each task. I'd be very surprised if -pe smp 11 somehow allowed blast to run 16 truly parallel threads (num_threads 16, as reflected in the qstat output). What I think is happening is that you have 11 cores alternating between the 16 threads.
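
    In other words, a rough sketch of your script with the slot count and the blast thread count matched (NSLOTS is the environment variable SGE sets to the number of slots actually granted, so it follows the -pe smp request automatically):
    Code:
    #!/bin/bash
    #$ -N blastx_array
    #$ -j y
    #$ -cwd
    # reserve 16 slots on a single node for each array task
    #$ -pe smp 16
    #$ -R y
    # let blastx use exactly as many threads as slots were granted
    /opt/blast+/blastx -query input.${SGE_TASK_ID} -db /data/blast_plus/vr_db_nr \
        -outfmt 6 -num_threads $NSLOTS -out ${SGE_TASK_ID}.tsv
    submitted as before with qsub -t 1-11:1 blastx.sh.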



  • GenoMax
    replied
    You are correct that you could increase the number of array tasks to 16 with the -t option, but at this point you are probably saturated on I/O anyway (check iostat/memstat).

    If you have the time, you could try different numbers of array tasks with a small subset of sequences to find an optimal number. It may turn out to be less than the 11 you are using now, or it could end up being the full 16.
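
    For example, on a compute node while a task is running (vmstat standing in here for "memstat", which isn't standard on most distributions):
    Code:
    # extended device utilisation every 5 s; watch %util and await
    iostat -x 5
    # memory, swap and run-queue summary every 5 s
    vmstat 5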



  • jpearl01
    replied
    Changing the -pe smp 11 to 16 seems to affect the slots that are available (i.e. the nodes), but not the thread count. So, despite increasing the value to 16, only 11 jobs are being created, one on each node, but the -num_threads option on blastx is distributing each job across the 16 processors on its node.
    Code:
     qstat
    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
    -----------------------------------------------------------------------------------------------------------------
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 1
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 2
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 3
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 4
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 5
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 6
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 7
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 8
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 9
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 10
       1483 0.55500 pe_smp_16  josh         r     01/18/2014 03:56:25 [email protected]            16 11
       1484 0.55500 pe_smp_16  josh         qw    01/18/2014 15:16:50                                   16 1-11:1
    I could change the qsub command to increase the number of slots being accessed, but I feel like that would just end up having multiple jobs fighting for the same resources.



  • jpearl01
    replied
    usearch looks quite promising, weird that I haven't heard about it until today. Then again, this is kind of new territory for me; I haven't really done much with microbiome stuff in the past. Huh, I didn't know he was the same guy that developed Muscle. Thanks for the tip! I'll post how these modifications work out.



  • rhinoceros
    replied
    Originally posted by jpearl01:
    Wait, no, I was wrong about that; cpuinfo misled me since it listed multiple processors. It's actually a single 8-core processor in each node, and I think 16 threads via hyperthreading.
    That probably makes sense. If it were 2 x 4-core (2 x 8 hyperthreads), -pe smp > 8 would probably throw an error (as usual, I could be wrong; I'm not that expert with the whole SGE thing).
    Last edited by rhinoceros; 01-17-2014, 01:13 PM.



  • rhinoceros
    replied
    Originally posted by jpearl01:
    A good suggestion, and you are almost certainly correct. However, I'd like to also get quantitative data out of this (i.e. not just if there IS a hit, but how many there are; looking for particularly enriched sequences). I suppose I could remove identical sequences, and also keep a count of how many of each there were... although that might become analytically challenging. For instance, it is very likely that there are very similar sequences that differ by relatively few nucleotides which would not be removed, since they wouldn't be identical. I'd have to recombine that data somehow. I'd have to think about the best way to do that.
    You could do something like that with USEARCH, e.g. with -derep_prefix you can remove identical sequences (and identical subsequences) and write the cluster size straight into the fasta header.
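
    Roughly like this (option names vary a bit between usearch versions, so check the manual):
    Code:
    # collapse identical sequences/prefixes; -sizeout appends the cluster
    # size (;size=N;) to each fasta header
    usearch -derep_prefix reads.fasta -fastaout uniques.fasta -sizeout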
    Last edited by rhinoceros; 01-17-2014, 01:12 PM.



  • jpearl01
    replied
    Wait, no, I was wrong about that; cpuinfo misled me since it listed multiple processors. It's actually a single 8-core processor in each node, and I think 16 threads via hyperthreading.
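
    (In case it helps anyone sorting out the same thing, lscpu summarises sockets, cores and threads more clearly than counting entries in /proc/cpuinfo.)
    Code:
    # sockets / cores per socket / threads per core at a glance
    lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'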



  • jpearl01
    replied
    Huh, it cut off the rest of your reply for some reason.

    Thanks for the tip on the script. I like posting the code so people can tell me when I'm doing something silly; fixing that might speed things up even more.

    The processors are 2 per node:
    model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
    But yes, 2 processors with 8 cores each.

    I'll give your new settings a shot on the next file and let you know how it turns out. Thanks!



  • jpearl01
    replied
    A good suggestion, and you are almost certainly correct. However, I'd like to also get quantitative data out of this (i.e. not just if there IS a hit, but how many there are; looking for particularly enriched sequences). I suppose I could remove identical sequences, and also keep a count of how many of each there were... although that might become analytically challenging. For instance, it is very likely that there are very similar sequences that differ by relatively few nucleotides which would not be removed, since they wouldn't be identical. I'd have to recombine that data somehow. I'd have to think about the best way to do that.

    This dataset is also kind of a trial for me. The real purpose is eventually to have a pipeline to do this analysis for any microbiome/metagenome inputs. So, the faster I can get the blast results the better.



  • rhinoceros
    replied
    I'd look into clustering your data before blasting; it sort of sounds like you might have lots of identical sequences there. Assembly before blast would probably lead to much more insightful results too. Also, it looks to me like you're running 16-thread blasts on 11 cores per node; it should be "-pe smp 16". What kind of CPUs do the nodes have? 2 x 8-core Xeons? If yes, "-pe smp 8" and 8 threads would probably be the optimal setting, and anything > smp 8 would lead to slowdowns since you're trading cache for something else. I could be wrong. Have you monitored the jobs to see if they really are 16-threaded per node (qstat -u "yourUID")? Also, you probably meant to write "qsub -t 1-11:1".
    Last edited by rhinoceros; 01-17-2014, 01:00 PM.



  • jpearl01
    started a topic How to further optimize blast+ on a cluster?

    Hello,

    I'm currently running some rather large blast jobs on a cluster we have. I have ~20 files of RNA-seq data, sequenced with Illumina tech. Each sequence is ~100 bp, and there are ~20-40 million reads in each file. I'm using blastx (v 2.2.28+) to search a blast database I created of proteins I'm curious about (~26,000 sequences).

    The cluster contains 11 nodes (including the head node), each with 16 cores and 125 GB of RAM.

    I first installed mpiBLAST and distributed the job across all the nodes, which was kind of underwhelming: it took ~4 days to finish one file, though it should be noted that I originally output to XML.

    Taking a cue from others on this forum, I decided instead to distribute the job using SGE, output to tabular format, and split the input files into 11 using fastsplitn. Then I run a 16-threaded blastx search, one on each node. (Credit: user rhinoceros from post http://seqanswers.com/forums/showthr...light=mpiblast THANK YOU!!)

    This is great; it shortened the runs down to ~12 hours, so I can get two files done a day.

    However, I'm really greedy and impatient, and was curious if anyone else had any ideas about optimizing this even further. Perhaps splitting the job up even more and running several jobs per node?

    If there are enterprising individuals out there who want to see what kind of data I'm working with, I'm just examining the RNA-seq data that you can download from the Human Microbiome Project: http://www.hmpdacc.org/RSEQ/ I suppose blasting against nr or something similar would provide a useful trial, i.e. any optimization against any database would probably also be helpful in my case.

    For those interested, the scripts I'm using are just the scripts posted originally by user rhinoceros, altered for my cluster:
    Code:
    #!/bin/bash
    #$ -N run_2062_CP_DZ_PairTo_2061
    #$ -j y
    #$ -cwd
    #$ -pe smp 11
    #$ -R y
    /opt/blast+/blastx -query input.${SGE_TASK_ID} -db /data/blast_plus/vr_db_nr -outfmt 6 -num_threads 16 -out ${SGE_TASK_ID}.tsv
    submitted to SGE with:
    Code:
    qsub -t 1_11:1 ../blastx.sh
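
    (For anyone wanting to reproduce the split step without fastsplitn, here is a rough awk equivalent, assuming the reads are already in fasta format; it deals records out round-robin into input.1 ... input.11 to match the naming above.)
    Code:
    # hypothetical stand-in for the fastsplitn step
    awk -v n=11 '/^>/ { f = (i++ % n) + 1 } { print > ("input." f) }' reads.fasta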
    Thanks!
