  • id0
    Senior Member
    • Sep 2012
    • 130

    Why is the samtools multi-threaded argument undocumented?

    There are a bunch of discussion threads about the samtools multi-threading argument (-@), but I can't find any official documentation for it. Does anyone know why that is? Is it safe to use? Does it work for only a subset of samtools commands?

    On a related note, are there other undocumented arguments that might be useful?
  • sdriscoll
    I like code
    • Sep 2009
    • 436

    #2
    Weird - I never knew about that option. Maybe that's what was absorbed from the psamtools project that disappeared. It appears to be an option for 'view' and 'sort'. I just tried it with 'view' and samtools does in fact show > 100% cpu use and it converted sam to bam much faster than without so...I guess it's legit. I tried it with 'sort' as well and it worked fine - much faster.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
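
A minimal sketch of the two invocations described above, assuming a samtools 1.x-style command line in which both 'view' and 'sort' accept -@ and -o (older 0.1.x releases used a different 'sort' syntax). The file names and the THREADS variable are placeholders, not taken from the post. The script only prints the commands, so it is safe to run without samtools installed.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print threaded samtools commands instead of executing them.
# File names and the THREADS variable are placeholders, not from the thread.

threads="${THREADS:-4}"

# SAM -> BAM conversion using -@ for extra threads.
view_cmd() {  # args: threads input.sam output.bam
    printf 'samtools view -@ %s -b -o %s %s\n' "$1" "$3" "$2"
}

# Sort using -@ for extra threads (samtools 1.x "-o" syntax assumed).
sort_cmd() {  # args: threads input.bam output.bam
    printf 'samtools sort -@ %s -o %s %s\n' "$1" "$3" "$2"
}

view_cmd "$threads" in.sam unsorted.bam
sort_cmd "$threads" unsorted.bam sorted.bam
```

To actually run the commands, pipe the script's output to a shell, and compare wall-clock time against a run with THREADS=1 to see whether the speed-up holds on your data.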

    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      There is an open pull request to rename this to -p, since the '@' sign causes trouble for some scripting languages like Perl.

      (This was opened a while ago so it may be too late in terms of breaking existing scripts)

      • jkbonfield
        Senior Member
        • Jul 2008
        • 146

        #4
        There are at least two different multi-threading implementations in Samtools: Heng Li's and Nils Homer's. The latter appears to be more efficient as it multi-threads decoding too, but it's less clear how to limit it to, say, 500% CPU (as it'll use the same number of threads for encoding as for decoding). Nils' version takes a -n parameter between "samtools" and the subcommand, e.g. samtools -n 4 flagstat a.bam.

        Samtools is undergoing a lot of changes at the moment though with the migration to htslib, and in time the multi-threading issues will be addressed too.

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by jkbonfield:

          Samtools is undergoing a lot of changes at the moment though with the migration to htslib, and in time the multi-threading issues will be addressed too.
          Is there an ETA for this?

          • jkbonfield
            Senior Member
            • Jul 2008
            • 146

            #6
            Not that I know of. I believe the first thing planned is a new official release of the reorganised code base, so some unknown length of time after that for threading investigations.

            We also plan to add CRAM support, but right now I'm still working on that as part of Staden io_lib's sam/bam/cram code.

            • jkbonfield
              Senior Member
              • Jul 2008
              • 146

              #7
              It's not samtools, but maybe still of interest. I've released a new version of the Staden Package's io_lib (aka libstaden-read in some Linux distributions), which contains a multi-threaded BAM/CRAM/SAM converter called scramble.

              The threading is still a work in progress and it should therefore be considered as experimental. It works well for BAM (comparable to Nils Homer's implementation), less well for CRAM (maximum maybe 6x speed up, but varies depending on data set), and is as yet only single-threaded for SAM I/O although the SAM reading and writing is far faster than in Samtools.

              Staden Package: /io_lib/1.13.2 files

              The code hasn't been tested on Windows yet. Older io_lib releases work there, but the use of pthreads has almost certainly broken that. I still need to back-port it to Windows, or at least to the MinGW/Msys environment.

              I'm NOT planning on adding things like sorting, and the basic experiments at merging and pileup are really just demonstration / testing tools for myself. Fundamentally this library was written first and foremost to provide an I/O layer for Gap5 (and gap4, xgap, etc. before that). Some of the code will make its way into Samtools later on though, specifically the CRAM implementation.
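
A rough sketch of how a threaded conversion with the scramble tool mentioned above might be invoked. The -t (threads), -I (input format) and -O (output format) flags are an assumption about scramble's command line, not something stated in this post, so check them against `scramble -h` for your io_lib version. File names are placeholders. As written, the script only prints the command.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a scramble command line instead of executing it.
# ASSUMPTION: scramble accepts -t N, -I fmt and -O fmt; verify with "scramble -h".

scramble_cmd() {  # args: threads in_format out_format input output
    printf 'scramble -t %s -I %s -O %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder file names; SAM input, BAM output, 4 threads.
scramble_cmd 4 sam bam in.sam out.bam
```

Since the post describes SAM I/O as single-threaded, any gain from the thread count here would be expected on the BAM side only.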
