  • krobison
    Senior Member
    • Nov 2007
    • 734

    #16
    Originally posted by yaximik View Post
    The question was: can BWA, samtools and blastx+ actually run with multiple threads when spread across several nodes? If not, this answers the question. If yes, are there any specifics or peculiarities in scheduling resources?
    I believe the answer to your original question is no. Threads from a single binary cannot be spread across multiple nodes unless the program uses a framework (such as OpenMPI) that enables this. To my knowledge, none of the tools you describe are enabled in such a way. Ray & ABySS are two examples of OpenMPI-enabled tools.

    As other posters have noted, what you would need to do for BLAST/BWA/bowtie/samtools etc. is split your job into sub-jobs, run those on the cluster using the cluster scheduling software, and then merge the results at the end.
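The split step described above can be sketched in a few lines of Python. This is a minimal illustration and not taken from any poster's script: the function names `read_fasta` and `split_records` are hypothetical, and records are dealt out round-robin so that chunks end up roughly the same size.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            # Sequence lines may be wrapped; join them back together.
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_chunks):
    """Distribute records round-robin into n_chunks lists (hypothetical helper)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks
```

Each chunk would then be written to its own file and submitted as a separate sub-job. The merge step depends on the output format of the tool being run.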


    • rhinoceros
      Senior Member
      • Apr 2013
      • 372

      #17
      Originally posted by krobison View Post
      I believe the answer to your original question is no. Threads from a single binary cannot be spread across multiple nodes unless the program uses a framework (such as OpenMPI) that enables this. To my knowledge, none of the tools you describe are enabled in such a way. Ray & ABySS are two examples of OpenMPI-enabled tools.

      As other posters have noted, what you would need to do for BLAST/BWA/bowtie/samtools etc. is split your job into sub-jobs, run those on the cluster using the cluster scheduling software, and then merge the results at the end.
      SGE's orte parallel environment can split blast threads over multiple nodes. However, using blast like this is a waste of resources, as most of the time blast isn't heavy on the CPU(s) anyway and InfiniBand is a dog compared to SMP. Thus, it is far faster to run multiple multithreaded instances of blast with split input (not thousands of files, but some 10-50). Our newest cluster has 19 nodes: 17 nodes with 2 x 8 cores and 256 GB RAM each, and 2 nodes with 2 x 16 cores and 756 GB RAM each. In this setup, I've been blasting predicted proteins from metagenomes (hundreds of thousands to more than a million peptides) against nr in very reasonable times with a script similar to what I mentioned earlier. I don't know if it's 'the best' way to do it, but I know it works.

      P.S. I'm not sure blast XML outputs can be concatenated like the tabular output. Additionally, the XML output files grow ridiculously large very fast. Another thing to consider: run your blasts from a directory that doesn't get backed up on a regular basis (usually something like /jobs/youruid).
      Last edited by rhinoceros; 05-01-2013, 06:59 AM.
      savetherhino.org
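The approach rhinoceros describes (one multithreaded blast per input chunk, tabular output, concatenate at the end) might look roughly like the sketch below. It is not the script referred to in the post. The helper names, the chunk file names and the choice of `blastp` are assumptions for illustration; `-query`, `-db`, `-out`, `-outfmt 6` and `-num_threads` are standard BLAST+ options.

```python
import shutil


def blast_command(chunk_path, out_path, threads, db="nr", program="blastp"):
    """Build the argument list for one multithreaded BLAST+ run on one chunk.

    The program and database defaults are assumptions; adjust to your setup.
    """
    return [
        program,
        "-query", chunk_path,
        "-db", db,
        "-outfmt", "6",              # tabular output, one hit per line
        "-num_threads", str(threads),
        "-out", out_path,
    ]


def merge_tabular(part_paths, merged_path):
    """Concatenate per-chunk tabular outputs in the order given."""
    with open(merged_path, "wb") as merged:
        for part in part_paths:
            with open(part, "rb") as handle:
                shutil.copyfileobj(handle, merged)
```

Plain concatenation is only appropriate for line-oriented formats such as tabular output. As noted in the post, XML output may not merge this way.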


      • yaximik
        Senior Member
        • Apr 2011
        • 199

        #18
        Originally posted by krobison View Post
        I believe the answer to your original question is no. Threads from a single binary cannot be spread across multiple nodes unless the program uses a framework (such as OpenMPI) that enables this. To my knowledge, none of the tools you describe are enabled in such a way. Ray & ABySS are two examples of OpenMPI-enabled tools.
        I guess I was confused by the FAQ entry from open-mpi.org.

        While uptime in the example is certainly not specifically OpenMPI-enabled, it is also not multithreaded the way, say, blastx is. So even if multiple instances of a program can be launched with mpirun, attempting to use multithreading on top of that creates a lot of mess.


        • rhinoceros
          Senior Member
          • Apr 2013
          • 372

          #19
          Originally posted by yaximik View Post
          I guess I was confused by the FAQ entry from open-mpi.org.

          While uptime in the example is certainly not specifically OpenMPI-enabled, it is also not multithreaded the way, say, blastx is. So even if multiple instances of a program can be launched with mpirun, attempting to use multithreading on top of that creates a lot of mess.
          I'm not 100% sure, but I think your script simply starts the same multithreaded job multiple times. BLAST is not MPI-compatible.
          savetherhino.org


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #20
            Originally posted by rhinoceros View Post
            I'm not 100% sure, but I think your script simply starts the same multithreaded job multiple times. BLAST is not MPI-compatible.
            I'm ~99% sure you're correct. Starting the same non-MPI-aware process on multiple nodes of a grid or cluster will just cause problems. The various instances won't talk to each other, which is what yaximik is hoping will happen.
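A toy simulation may make the point above concrete. It does not call MPI or BLAST at all. It only models the assumption that a program without MPI support ignores which instance it is and processes the whole input, whereas properly split sub-jobs each see only their own slice. All names here are hypothetical.

```python
def unaware_instance(rank, n_instances, queries):
    """Model of a non-MPI-aware program: it ignores its rank and does everything."""
    return list(queries)


def partitioned_instance(rank, n_instances, queries):
    """Model of a sub-job that was handed only its own slice of the input."""
    return queries[rank::n_instances]


def total_work(worker, n_instances, queries):
    """Count how many queries are processed across all instances."""
    return sum(len(worker(rank, n_instances, queries))
               for rank in range(n_instances))


queries = [f"read_{i}" for i in range(12)]
duplicated = total_work(unaware_instance, 4, queries)      # every instance repeats all 12
partitioned = total_work(partitioned_instance, 4, queries)  # the 12 are shared out once
```

In the first case four instances do four times the work and, if they share an output path, also overwrite each other's results. In the second case the work is divided once, which is what splitting the input by hand achieves.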


            • yaximik
              Senior Member
              • Apr 2011
              • 199

              #21
              Originally posted by dpryan View Post
              I'm ~99% sure you're correct. Starting the same non-MPI-aware process on multiple nodes of a grid or cluster will just cause problems. The various instances won't talk to each other, which is what yaximik is hoping will happen.
              Like I said, it was not an entirely groundless hope, as from consultations with the OpenMPI user community it did not sound unreasonable. But in addition to OpenMPI, one has to know more about the application itself, which is why I asked some questions here. Indeed, with GenoMax's links and rhinoceros's example script I launched multithreaded blastx using the smp interface, and it seems to be running fine so far. I used the smp* 8-12 option and called it as a 1-100 array job with a maximum of 36 parallel instances, each with as many threads as were allocated per node (from the range), as our grid has nodes with different architectures. I see instances are launched on different nodes with different numbers of threads. Now the most important issue is to figure out how much time I need per particular dataset, so that a long job is not killed before it completes.
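For readers trying to reproduce this setup, here is a minimal sketch of how each array task could work out which input chunk and how many threads to use. It assumes a Grid Engine scheduler, which sets `SGE_TASK_ID` for array tasks and `NSLOTS` to the number of slots granted from the requested range. The chunk naming scheme and the function name are hypothetical, not taken from the script used in the thread.

```python
import os


def task_settings(environ=os.environ, default_threads=1):
    """Return (chunk_file, threads) for the current array task.

    Assumes Grid Engine conventions: SGE_TASK_ID is the array index
    (the string "undefined" outside array jobs) and NSLOTS is the number
    of slots actually granted on this node.
    """
    raw_id = environ.get("SGE_TASK_ID", "")
    task_id = int(raw_id) if raw_id.isdigit() else 1
    threads = int(environ.get("NSLOTS", default_threads))
    # Hypothetical naming scheme: chunk_001.fasta ... chunk_100.fasta
    chunk_file = f"chunk_{task_id:03d}.fasta"
    return chunk_file, threads
```

A submission along the lines described in the post would request the array range, the concurrency cap and the slot range at `qsub` time (for example `-t 1-100`, `-tc 36` and `-pe smp 8-12`, where the parallel environment name is site-specific), and the job script would pass the returned thread count to blastx via `-num_threads`.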
