
  • Why no multithreading for BWA sampe/samse?

    BWA's align mode (aln, which generates the indices of reads on chromosomes) offers multithreading as an option, but there doesn't seem to be an equivalent for generating the actual alignments (samse/sampe).



    Is SAM generation generally I/O-bound (waiting on disk)? Or is multithreading just something on the to-do list?

  • #2
    To implement multithreading, we need a lock-free hash table; otherwise the hash table will be locked frequently, and I guess a lot of CPU time will be spent on locking. More importantly, samse is much faster than aln; sampe is also faster, especially for >50bp reads. Multithreading them would not greatly improve wall-clock speed. Aln is the speed bottleneck, so it gets multithreaded.



    • #3
      THANKS! That clears things up.



      • #4
        I've spent today doing performance testing on Illumina reads (36 bp per read).

        I am seeing the following performance:

        aln: 2900 reads per second per CPU core
        sampe: 3300 reads per second

        So with four cores, aln is roughly 3.5 times faster than sampe (4 × 2900 = 11,600 vs. 3300 reads per second). Are you seeing different performance?

        With these numbers, performance is limited by sampe, and implementing multithreading would be a big win.
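
The arithmetic behind this comparison can be sketched as follows. This is only an illustration: it takes the two throughput figures from the post at face value (both counted per read) and assumes that aln scales perfectly linearly across cores, which the thread does not confirm. The data set size and all names are hypothetical.

```python
def stage_seconds(n_reads, reads_per_sec_per_core, cores=1):
    """Wall-clock seconds for one stage, assuming perfect linear scaling."""
    return n_reads / (reads_per_sec_per_core * cores)


N_READS = 10_000_000  # hypothetical data set size
ALN_RATE = 2900       # reads per second per core, as reported in the post
SAMPE_RATE = 3300     # reads per second, single-threaded, as reported in the post

aln_time = stage_seconds(N_READS, ALN_RATE, cores=4)
sampe_time = stage_seconds(N_READS, SAMPE_RATE)
ratio = sampe_time / aln_time

print(f"aln on 4 cores: {aln_time / 60:.1f} min")
print(f"sampe on 1 core: {sampe_time / 60:.1f} min")
print(f"sampe takes {ratio:.2f}x as long as aln")
```

Under these assumptions the single-threaded stage dominates the wall-clock time as soon as the other stage is spread over several cores.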



        • #5
          I guess the sampe figure is 3300 read pairs per second, which would make it about twice as fast as aln in terms of reads per CPU core. In addition, you will find sampe is even faster for 70bp reads, which are becoming available to many labs. A 36bp read has many candidate locations, and bwa considers all of them in pairing; a 70bp read has far fewer occurrences. That is also why bwa does not work well for 25bp SOLiD reads: sampe will be much slower.

          I have known about this issue from the very beginning, but implementing a thread-safe/lock-free hash table is not that easy. Thanks anyway.

          EDIT: In case someone is curious what this hash table is for: the bottleneck in pairing is converting suffix array (SA) coordinates to chromosomal coordinates, especially for a highly repetitive read. Bwa uses a hash table to cache large SA intervals, so that a large interval that has already been converted to chromosome positions will not be converted again. This hash table is global, which makes multithreading harder.
          Last edited by lh3; 02-19-2010, 08:23 PM.
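
The caching idea described in the EDIT can be illustrated with a toy sketch. This is not BWA's code: it assumes a naive, fully stored suffix array over a tiny reference (so the conversion is a trivial slice, not the expensive step it is in practice), and it guards a global dictionary with one coarse lock, which is exactly the kind of frequent locking the earlier reply worries about. The class name, the `min_size` threshold, and the workload are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_suffix_array(text):
    """Naive suffix array construction; only suitable for a toy reference."""
    return sorted(range(len(text)), key=lambda i: text[i:])


class IntervalCache:
    """Shared cache mapping an SA interval (lo, hi) to sorted text positions."""

    def __init__(self, suffix_array, min_size=2):
        self.sa = suffix_array
        self.min_size = min_size       # hypothetical threshold for a "large" interval
        self._cache = {}
        self._lock = threading.Lock()  # single coarse lock: the contention point
        self.conversions = 0           # number of intervals actually converted

    def positions(self, lo, hi):
        key = (lo, hi)
        is_large = hi - lo + 1 >= self.min_size
        # Holding the lock during conversion avoids duplicate work,
        # but it also serializes every thread that needs the table.
        with self._lock:
            if is_large and key in self._cache:
                return self._cache[key]
            result = sorted(self.sa[lo:hi + 1])
            self.conversions += 1
            if is_large:
                self._cache[key] = result
            return result


reference = "ACGTACGTACGT"
cache = IntervalCache(build_suffix_array(reference))

# Hypothetical workload: many reads hit the same repetitive interval.
intervals = [(0, 2)] * 8 + [(3, 3)] * 2
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda iv: cache.positions(*iv), intervals))

print(results[0], cache.conversions)
```

In this sketch the repeated large interval is converted only once, while every lookup still has to pass through the same lock. A lock-free or sharded table would reduce that serialization at the cost of a more involved implementation.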



          • #6
            The sampe figure is per read, not per pair. Are you seeing different numbers in your experience?

            Also, because sampe requires 3.5GB of RAM, it's not possible to run more than one instance on an 8GB machine where other things are going on.

            I do understand that there are challenges in implementation and that read lengths are probably going to continue increasing.
