  • blast+ on a grid

    Hi,

    After reading a lot of manual pages I am still uncertain how to properly schedule a multithreaded BLAST run with SGE, so please help. I want to run blastx with 16 threads, but I cannot find how to request resources for these 16 threads in a qsub script. The manual gives Open MPI examples, followed by either mpiexec or mpirun, but blast+ is not said to be compiled for Open MPI, is it? Another available option would be '-pe smp 16', but doesn't that request one node with 16 cores, which may not exist? Other qsub arguments described for grids elsewhere mention slots=n, cores=n, low* n, high n, threaded n, orte n, orte_fillup; all of these return errors on our grid.
    Another option would be running blastx with GNU parallel, but again it is not clear how to request the number of threads/cores/CPUs in that case.

  • #2
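    On your head node type 'man qsub' to get the man page you need. It's been a while since I've used SGE. I think you need a line like this

    #$ -pe smp 16

    in your submit script. The -pe option takes the name of a parallel environment followed by the slot count; 'smp' is only a common name, and 'qconf -spl' lists the ones defined on your cluster.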

    I think you'll get much better performance by splitting your query sequences into 16 separate fasta files and submitting an array job.
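
A minimal sketch of the splitting step, assuming a plain multi-FASTA query file. The function names and the `query.N.fasta` naming are made up for illustration; chunks are numbered from 1 so that an array job submitted with `qsub -t 1-16` could pick its input from `$SGE_TASK_ID`.

```python
from pathlib import Path


def split_fasta(fasta_text, n_chunks):
    """Distribute FASTA records round-robin over n_chunks lists."""
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record.
            records.append([line])
        elif records and line.strip():
            # Sequence lines belong to the most recent header.
            records[-1].append(line)
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append("\n".join(rec))
    return chunks


def write_chunks(chunks, out_dir, prefix="query"):
    """Write each chunk to <prefix>.<N>.fasta (hypothetical naming), N from 1."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = out_dir / f"{prefix}.{i}.fasta"
        path.write_text("".join(rec + "\n" for rec in chunk))
        paths.append(path)
    return paths
```

Round-robin assignment keeps the chunks roughly equal in record count, but not necessarily in run time if sequence lengths vary a lot.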



    • #3
      If you have 16-core machines on the cluster, use -pe smp 16, and call BLAST+ with 16 threads. If you only have 4-core machines on the cluster, use -pe smp 4, and call BLAST+ with 4 threads.
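
To keep the requested slots and the BLAST+ thread count in step, one option is to generate the submit script from a single number. This is only a sketch: the helper name is made up, 'smp' is assumed to be the parallel environment name on your cluster, and the query, database and output names are placeholders. `$NSLOTS` is the variable SGE sets to the number of slots granted to the job.

```python
def sge_blast_script(threads, query, db, out, pe_name="smp"):
    """Return the text of an SGE submit script for a threaded blastx run."""
    lines = [
        "#!/bin/bash",
        "#$ -cwd",
        # Request `threads` slots in the (assumed) parallel environment.
        f"#$ -pe {pe_name} {threads}",
        # Use the slot count SGE granted rather than repeating the number.
        f"blastx -query {query} -db {db} -out {out} -num_threads $NSLOTS",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sge_blast_script(16, "query.fasta", "nr", "query.blastx.out"), end="")
```

Save the printed text to a file and submit it with qsub as usual.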

      As Mike points out, when your query file is made up of lots of sequences, splitting it into separate FASTA files and running separate BLAST processes on different cluster nodes makes sense (you can then combine their output files).
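
Combining the outputs can be a plain concatenation if each process writes tabular output (`-outfmt 6`). A rough sketch, assuming the per-chunk results are named `query.N.tsv` (a made-up convention) and should be merged in numeric order:

```python
import re
from pathlib import Path


def merge_outputs(out_dir, merged_path, pattern="query.*.tsv"):
    """Concatenate per-chunk tabular BLAST outputs in numeric chunk order."""

    def chunk_number(path):
        # Extract N from names like query.N.tsv; unmatched names sort first.
        match = re.search(r"\.(\d+)\.tsv$", path.name)
        return int(match.group(1)) if match else -1

    parts = sorted(Path(out_dir).glob(pattern), key=chunk_number)
    with open(merged_path, "w") as merged:
        for part in parts:
            merged.write(part.read_text())
    return [part.name for part in parts]
```

Sorting on the number rather than the file name avoids `query.10.tsv` landing before `query.2.tsv`. XML or pairwise text output cannot be merged this simply.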

      BLAST+ itself has no built-in capability for spreading a search across nodes like this. Something like MPI-BLAST does, but it is based on the legacy C BLAST suite and is quite old now.
      Last edited by maubp; 03-15-2013, 03:28 AM. Reason: Fixing touch screen typos
