  • Comparison of programming languages for NGS tools

    We have just published a paper in BMC Bioinformatics comparing Go, Java, and C++ for implementing our sequencing tool elPrep.

    elPrep is available at https://github.com/exascience/elprep

    Title: A comparison of three programming languages for a full-fledged next-generation sequencing tool

    Authors: Pascal Costanza, Charlotte Herzeel, Wilfried Verachtert

    URL: https://doi.org/10.1186/s12859-019-2903-5

    Background:
    elPrep is an established multi-threaded framework for preparing SAM and BAM files in sequencing pipelines. To achieve good performance, its software architecture makes only a single pass through a SAM/BAM file for multiple preparation steps, and keeps sequencing data as much as possible in main memory. Similar to other SAM/BAM tools, management of heap memory is a complex task in elPrep, and it became a serious productivity bottleneck in its original implementation language during recent further development of elPrep. We therefore investigated three alternative programming languages: Go and Java using a concurrent, parallel garbage collector on the one hand, and C++17 using reference counting on the other hand for handling large amounts of heap objects. We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use.

    Results:
    The Go implementation performs best, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher. The C++17 benchmarks run significantly slower than both Go and Java, while using somewhat more memory than the Go runs. Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case.

    Conclusions:
    Based on our benchmark results, we selected Go as our new implementation language for elPrep, and recommend considering Go as a good candidate for developing other bioinformatics tools for processing SAM/BAM data as well.
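As a rough illustration of the single-pass idea described in the Background, the sketch below applies several preparation steps to each record while the input is traversed only once, and keeps the surviving records in memory. It is not elPrep's actual code: the `Alignment` type, the `Filter` signature, and the function names `keepMapped`, `keepNonDuplicates`, `parseLine`, and `singlePass` are all hypothetical, and only the first three SAM columns are parsed. The flag bits 0x4 (unmapped) and 0x400 (duplicate) follow the SAM specification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Alignment is a hypothetical, heavily simplified stand-in for one SAM record.
type Alignment struct {
	QName string
	Flag  uint16
	RName string
}

// Filter reports whether a record should be kept (assumed signature).
type Filter func(*Alignment) bool

// SAM flag bits: 0x4 marks an unmapped read, 0x400 a PCR or optical duplicate.
const (
	flagUnmapped  = 0x4
	flagDuplicate = 0x400
)

func keepMapped(a *Alignment) bool        { return a.Flag&flagUnmapped == 0 }
func keepNonDuplicates(a *Alignment) bool { return a.Flag&flagDuplicate == 0 }

// parseLine reads only the first three tab-separated SAM columns.
func parseLine(line string) (*Alignment, error) {
	fields := strings.Split(line, "\t")
	if len(fields) < 3 {
		return nil, fmt.Errorf("expected at least 3 fields, got %d", len(fields))
	}
	flag, err := strconv.ParseUint(fields[1], 10, 16)
	if err != nil {
		return nil, err
	}
	return &Alignment{QName: fields[0], Flag: uint16(flag), RName: fields[2]}, nil
}

// singlePass applies every filter to each record as it is read, so the
// input is traversed once no matter how many steps are configured.
func singlePass(lines []string, filters ...Filter) ([]*Alignment, error) {
	var kept []*Alignment
	for _, line := range lines {
		if strings.HasPrefix(line, "@") {
			continue // skip header lines
		}
		a, err := parseLine(line)
		if err != nil {
			return nil, err
		}
		keep := true
		for _, f := range filters {
			if !f(a) {
				keep = false
				break
			}
		}
		if keep {
			kept = append(kept, a) // surviving records stay in memory
		}
	}
	return kept, nil
}

func main() {
	lines := []string{
		"@HD\tVN:1.6",
		"r1\t0\tchr1",
		"r2\t4\t*",
		"r3\t1024\tchr1",
		"r4\t16\tchr2",
	}
	kept, err := singlePass(lines, keepMapped, keepNonDuplicates)
	if err != nil {
		panic(err)
	}
	for _, a := range kept {
		fmt.Println(a.QName, a.RName)
	}
}
```

Each kept record is a separate heap object here, which hints at why managing a large heap of such objects (by garbage collection or by reference counting) matters for this kind of tool.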

  • #2
    We have just published a small follow-up paper to this in August: Comparing Ease of Programming in C++, Go, and Java for Implementing a Next-Generation Sequencing Tool

    In this article, we discuss the difficulty of achieving the best performance in each language in terms of programming language constructs and standard library support. While benchmarks are easy to measure and evaluate objectively, ease of programming is harder to assess. However, because we expect elPrep to be regularly modified and extended, it is an equally important aspect. We illustrate representative examples of challenges in all three languages, and explain why we think that Go is also a reasonable choice in this light.

    This is a more subjective paper, which is why it is categorized as an "Article Commentary", but we believe this is still an interesting perspective for a wider audience.

    • #3
      I thought these papers were really interesting perspectives on a difficult subject. Did you get people commenting on how you could have optimized their favorite language a bit more?
      Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

      • #4
        @SNPsaurus: We work at a lab that also has C++ experts, and we have significant expertise in Java as well. We were very careful to ensure that, for each language, we used its respective strengths in terms of performance.

        The source code for all three versions is publicly available. There was a public discussion at https://www.reddit.com/r/programming..._and_java_for/ where the C++ and Java versions of our code were criticized, but we believe we were able to address all the points that were raised.

        I hope this helps.

        • #5
          That had exactly the kind of language nitpicking I was hoping to see!
