  • Blast threads always drop to 1

    I've seen this issue with every version of blast+ that I've used recently. When I run jobs on a multi-core machine, I specify -num_threads XX to speed things up. Invariably, no matter how many threads I specify, after a short time only one thread seems to be active on the machine. I've compiled the BLAST binaries myself with both gcc and the Intel icc compiler. When I start the job, top shows blastn/p/x using, say, 800% CPU if I specify 8 threads. After a few minutes this drops to 100%. The job finishes faster than a single-threaded job, but not by much. Is this normal behavior?
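To pin down when the drop happens, one option is to log the CPU usage of the running BLAST process at fixed intervals instead of watching top. The sketch below assumes Linux: it reads the process's cumulative user and system time (fields 14 and 15 of /proc/PID/stat, in clock ticks, summed over all threads) twice and converts the difference into a percentage, so 8 busy threads read as roughly 800, as in top. The function name cpu_percent is hypothetical.

```shell
#!/bin/sh
# cpu_percent PID SECONDS  (hypothetical helper, Linux-only)
# Prints the CPU usage of PID, summed over all its threads, averaged over
# the next SECONDS seconds. Assumes the command name (field 2 of
# /proc/PID/stat) contains no spaces, which holds for blastn/blastp/blastx.
cpu_percent() {
    hz=$(getconf CLK_TCK)                              # clock ticks per second
    t0=$(awk '{ print $14 + $15 }' "/proc/$1/stat")    # utime + stime, in ticks
    sleep "$2"
    t1=$(awk '{ print $14 + $15 }' "/proc/$1/stat")
    # ticks used / ticks available on one core, as a percentage
    awk -v d="$((t1 - t0))" -v hz="$hz" -v s="$2" \
        'BEGIN { printf "%.0f\n", 100 * d / (hz * s) }'
}
```

For example, `while kill -0 "$pid" 2>/dev/null; do cpu_percent "$pid" 30; done` prints one reading every 30 seconds until the process exits, which shows exactly when usage falls from about 800 to about 100.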

    Thanks!

  • #2
    One thought that sprang to mind is whether BLAST is I/O-bound rather than CPU-bound. That could happen if you had a very large database and slow disks. vmstat/iostat might help you determine this.
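A minimal sketch of the vmstat check, assuming the Linux procps vmstat, whose header includes a wa column (percentage of CPU time spent waiting for I/O). The hypothetical mean_iowait function finds that column by its header name and averages it, skipping the first sample, which vmstat reports as an average since boot. A consistently high value while BLAST runs would point to I/O rather than CPU as the limit; what counts as "high" is a judgment call.

```shell
#!/bin/sh
# mean_iowait  (hypothetical helper)
# Reads `vmstat` output on stdin and prints the mean of the "wa" column.
# The first data line (averages since boot) is skipped.
mean_iowait() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }  # header line
        col && $1 ~ /^[0-9]+$/ {            # data lines start with a number
            if (seen++) { sum += $col; n++ }
        }
        END { if (n) printf "%.1f\n", sum / n; else print "no samples" }
    '
}
```

Usage: `vmstat 5 13 | mean_iowait` averages one minute of 5-second samples taken while the search is running.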

    • #3
      Here's the official response from NCBI: only some of the code is multithreaded.


      "BLAST search has three distinct stages: word matching with database scan, ungapped alignment, and gapped alignment with traceback.

      As I understand it, only the word-match stage is multithreaded. So what you described makes sense and is correct."
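One way to see why the overall speed-up stays modest is Amdahl's law, a general model that NCBI's reply does not mention. If a fraction p of the single-threaded run time is spent in the multithreaded stage and N threads are used, the best-case speed-up is bounded as below. The value p = 0.5 is purely hypothetical, chosen for illustration.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
S(8)\big|_{p = 0.5} = \frac{1}{0.5 + 0.0625} \approx 1.78,
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p} = 2 \quad (p = 0.5).
```

Under that assumption, a search that spends half its time in single-threaded alignment can never run more than twice as fast however many threads are requested, which fits the "faster, but not by much" result in the first post.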

      • #4
        That does indeed make sense. It sounds like the single-threaded alignment stages could become a bottleneck if your search returns a lot of matches that need aligning.

        • #5
          blast+ vs older blastall

          I've seen the same behavior with blast+ compared to the older blastall. Section 4.5 of the NCBI BLAST+ user manual shows a performance improvement over blastall for queries of 10 kb to 10 Mb, but for shorter queries my experience is that blast+ is much, much slower. When I run blastx with 50 DNA queries of average length 1,135 bp against a protein database of 475,000 sequences (161M total letters) on 8 CPUs, blastall 2.2.18 finishes the run in just under 5 hours at 764% CPU usage. The same blastx run with blast+ 2.2.23 is still going after 16 hours, has finished only 14000 queries, and shows 243% CPU usage on 8 processors. Needless to say, I won't be encouraging anyone with shorter queries to use blast+ until this problem is fixed.
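The CPU figures above can be turned into a rough parallel efficiency: the reported CPU% divided by 100 × the number of CPUs, expressed as a percentage. The small helper below (the name parallel_efficiency is hypothetical) does that arithmetic with awk, using the two readings from the post.

```shell
#!/bin/sh
# parallel_efficiency CPU_PERCENT NUM_CPUS  (hypothetical helper)
# Percentage of the available cores actually kept busy:
#   CPU% / (100 * NUM_CPUS) * 100  ==  CPU% / NUM_CPUS
parallel_efficiency() {
    awk -v cpu="$1" -v n="$2" 'BEGIN { printf "%.1f\n", cpu / n }'
}

parallel_efficiency 764 8   # blastall 2.2.18 reading from the post: prints 95.5
parallel_efficiency 243 8   # blast+ 2.2.23 reading from the post: prints 30.4
```

By this measure blastall kept about 95% of the 8 cores busy, while blast+ used only about 30% of them at the time the reading was taken.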

          • #6
            Well, if all else fails, here's a simple little script to parallelize any application. I use it whenever I have to run blat or maq, which don't support multithreading. It counts how many jobs are currently running and, whenever there are fewer than the limit (24 in the script below), starts a new one. All you need to do is split your reads into many pieces.


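            #!/bin/bash
            MAX_JOBS=24    # maximum number of blast jobs running at once

            for x in *.fa
            do
                # wait while MAX_JOBS background jobs started by this script are still running
                while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]
                do
                    sleep 5
                done

                blast "$x" ... &    # "..." = the rest of your blast arguments (elided in the original post)
            done

            wait    # don't exit until every background job has finished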
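The script expects the reads to be already cut into many .fa files. One way to do that, sketched here under the assumption that the input is a plain (uncompressed) FASTA file, is to deal whole records round-robin into a fixed number of chunk files with awk, so no sequence is ever split across files. The function name split_fasta and the PREFIX_k.fa naming are hypothetical.

```shell
#!/bin/sh
# split_fasta INPUT.fa N_CHUNKS PREFIX  (hypothetical helper)
# Writes PREFIX_1.fa ... PREFIX_N.fa, dealing whole FASTA records
# (a ">" header line plus the sequence lines after it) round-robin into the chunks.
# Any lines before the first header are skipped.
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/  { chunk = (count++ % n) + 1 }            # new record: move to the next chunk
        chunk { print > (prefix "_" chunk ".fa") }     # header and sequence lines go to that chunk
    ' "$1"
}
```

For example, `mkdir -p chunks && split_fasta reads.fa 24 chunks/part` writes 24 chunks into their own directory; running the loop from inside chunks/ keeps the original reads.fa out of the `*.fa` glob.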
