  • davispeter
    Junior Member
    • May 2013
    • 2

    DNA assembly on GPU

    I was looking for DNA assemblers that run on GPUs. I found only this paper, http://www.cs.gmu.edu/~tr-admin/pape...-TR-2011-1.pdf (GPU-Euler), but I was not satisfied with the concept and the results it presents. It works by finding the whole Euler tour without any graph transformation or error correction, yet it reports results comparable to well-established assemblers such as Euler-SR, e.g. a maximum length of 40,000 and an N50 of 8,000. No other assembler works by finding the whole Euler tour, so how can this paper report such good results? Has anyone read or worked with this paper?
  • samanta
    Senior Member
    • Feb 2010
    • 108

    #2
    I would say GPUs are a no-go for genome assembly. We looked at various options for doing genome assembly on GPUs last year and could not make the algorithms scale well. Genome assembly programs need very large memory bandwidth, and it is not possible to scale them well on GPUs, whose greatest benefit is access to many 'parallel' processors. Late last year, I visited BGI's booth at the HPC conference (Salt Lake City) and saw a number of GPU solutions presented for various bioinformatics problems, but the genome assembly program did not seem to give any performance boost. At present our group is working on implementing a genome assembler on an FPGA, where we can get the performance boost.

    I will forward your question to BGI's Ruibang, who can probably shed more light on the current status.
    http://homolog.us


    • davispeter
      Junior Member
      • May 2013
      • 2

      #3
      Hmm, thanks for the reply. But what about the Euler approach? In the paper I mentioned, the Euler approach is implemented in parallel. Does that mean the Euler approach is not good?


      • lh3
        Senior Member
        • Feb 2008
        • 686

        #4
        According to the tech report, 90% of the total time goes to "I/O". If I understand correctly, this "I/O" phase, unusually, includes k-mer counting and is done purely on the CPU. K-mer counting is one of the slowest and most memory-hungry steps in the construction of a de Bruijn graph. If we cannot parallelize this step on the GPU, we will not get much speedup.
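
For readers unfamiliar with the step, k-mer counting can be sketched as below. This is a minimal single-threaded illustration, not the tech report's code; the function names `count_kmers` and `solid_kmers`, the toy reads, and the `min_count` filter are assumptions made for the example. Real counters also merge each k-mer with its reverse complement, which is omitted here.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times (a simple error filter)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Toy, hypothetical reads; real inputs have millions of reads.
reads = ["ACGTAC", "CGTACG", "GTACGT"]
counts = count_kmers(reads, k=4)
print(counts["GTAC"])          # 3
print(solid_kmers(counts, 3))  # {'GTAC'}
```

The table holds one entry per distinct k-mer, which is where the memory pressure comes from on real data.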

        In addition, the reported assembly speed is slower than what I would expect from velvet. I think velvet can usually get the results in a minute or so given 20X error-free data for a ~2Mbp genome. That is on par with GPU-Euler.

        In all, I think the tech report does not prove that a GPU-based de Bruijn assembler is much better than CPU-based ones.


        • Aqua
          Junior Member
          • Jan 2013
          • 2

          #5
          Thanks Samanta and lh3. I'm not ruling out the possibility of implementing an ultra-fast assembler on GPU, but the reduction nature of the genome assembly problem keeps it from scaling well on GPUs. On the latest GPU model, the nVidia GTX Titan, which has 2600+ cores but only ~300GB/s of memory bandwidth, every core gets only ~100MB/s, not to mention that the optimum can only be achieved with coalesced memory access, which is almost impossible to achieve whether you use DBG, String Graph or Greedy. Another problem is that a GPU has only a limited amount of on-board memory (3GB-12GB); swapping between host memory and GPU memory is possible but ultimately slow.
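
The per-core figure above is simple division, sketched here with the round numbers from the post. The helper name `per_core_bandwidth_mb_s` is made up for the example, and it assumes decimal units (1 GB = 1000 MB) and a perfectly even split, which real access patterns will not reach.

```python
def per_core_bandwidth_mb_s(total_gb_s, cores):
    """Evenly divide aggregate memory bandwidth (GB/s) among cores, in MB/s."""
    return total_gb_s * 1000 / cores


# Round figures quoted in the post: ~300 GB/s shared by ~2600 cores.
share = per_core_bandwidth_mb_s(300, 2600)
print(f"{share:.0f} MB/s per core")  # 115 MB/s per core
```

That lands in the same ballpark as the ~100MB/s quoted, and it is an upper bound that assumes every access is coalesced.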

          In contrast, alignment is mainly a "mapping" problem in the MapReduce scheme, which makes it suitable for GPUs and other HPC accelerators like FPGA and MIC. Plenty of work has been done: "SOAP3-dp" (http://arxiv.org/abs/1302.5507) and "CUSHAW2-GPU" (http://cushaw2.sourceforge.net/homepage.htm#latest) have achieved more than 10x acceleration over CPU aligners and, most importantly, much higher sensitivity and accuracy in opening large gaps, thanks to the much greater computational power available.

          BTW, frankly speaking, CPU assemblers, say SOAPdenovo2 and ALLPATHS-LG, still have a lot of room for improvement. Samanta has a very good discussion of the hash functions used in assemblers (http://homolog.us/blogs). One question: why do we have to use standard, general-purpose hash functions in assemblers? The only property assemblers require of a hash function is evenness, so why should we care so much about the avalanche test?
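
To make the evenness point concrete, here is a small sketch of a deliberately simple k-mer "hash": pack the bases two bits each and take the value modulo the bucket count. The names `encode_kmer` and `bucket_loads`, the bucket count of 64, and the input (every possible 6-mer once) are assumptions for illustration, not something taken from any particular assembler.

```python
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def bucket_loads(kmers, n_buckets):
    """Count how many k-mers land in each bucket under value % n_buckets."""
    loads = [0] * n_buckets
    for kmer in kmers:
        loads[encode_kmer(kmer) % n_buckets] += 1
    return loads


# Hypothetical input: every possible 6-mer exactly once (4**6 = 4096 k-mers).
all_6mers = ["".join(p) for p in product("ACGT", repeat=6)]
loads = bucket_loads(all_6mers, 64)
print(min(loads), max(loads))  # 64 64
```

This function has no avalanche behaviour at all (with 64 buckets only the last three bases matter), yet it is perfectly even on this input. On real reads, where some suffixes are far more common than others, the same function may be much less even, so treat this as a toy that frames the question and not as an answer to it.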


          • samanta
            Senior Member
            • Feb 2010
            • 108

            #6
            Here are the links.

            A few days back, a reader asked us on Twitter whether de Bruijn graph-based assemblers could save and reload de Bruijn graphs from one another. The short Twitter answer was no. The long answer follows here.


            We have been going through various web-based resources on high-quality hash functions and made a startling discovery. None of the good websites was maintained by members of computer science departments at top universities, or even second-rank universities. Based on our highly anecdotal evidence, computer science professors stopped thinking about hash functions many decades back. That seemed puzzling, because in the world where it matters, research on hash functions still attracts big money.
            http://homolog.us


            • lh3
              Senior Member
              • Feb 2008
              • 686

              #7
              I should clarify that I am not ruling out the possibility of a good GPU assembler, either. I am just saying that we are not there yet. I also agree that GPU-based aligners are impressive work.


              • mchaisso
                Member
                • Apr 2008
                • 84

                #8
                Originally posted by davispeter
                Hmm, thanks for the reply. But what about the Euler approach? In the paper I mentioned, the Euler approach is implemented in parallel. Does that mean the Euler approach is not good?
                Unfortunately, under the rote definition of an Eulerian tour, finding a full Eulerian tour is meaningless when there are errors in the sequences and repeats longer than k. De Bruijn-based assemblers output contigs that represent unambiguous (and sequencing-error-free) paths taken on the traversal of the de Bruijn graph. The original power of the de Bruijn approach was an efficient encoding of overlaps of very short (30nt) reads.
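
A toy sketch of the repeat problem, under stated assumptions: the two 10-base "genomes" below are invented for the example, k = 3 is far smaller than anything used in practice, sequencing errors are not modelled, and the helper names (`kmers_of`, `build_graph`, `contigs`) are hypothetical. Both sequences produce exactly the same k-mers, so a full tour through the graph cannot tell them apart, while the unambiguous paths are the pieces both agree on.

```python
from collections import defaultdict


def kmers_of(seq, k):
    """All k-mers of a sequence, as a set (multiplicities are ignored)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_graph(kmers):
    """Nodes are (k-1)-mers; each distinct k-mer is one edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer in sorted(set(kmers)):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def contigs(kmers):
    """Spell out maximal non-branching paths (isolated cycles are skipped)."""
    graph = build_graph(kmers)
    indeg = defaultdict(int)
    for succs in graph.values():
        for succ in succs:
            indeg[succ] += 1

    def is_simple(node):
        # Exactly one edge in and one edge out: no ambiguity at this node.
        return indeg[node] == 1 and len(graph.get(node, [])) == 1

    result = []
    for node in list(graph):
        if is_simple(node):
            continue  # paths start only at branching or source nodes
        for succ in graph[node]:
            path = node + succ[-1]
            while is_simple(succ):
                succ = graph[succ][0]
                path += succ[-1]
            result.append(path)
    return sorted(result)


# Two different sequences in which the repeat "TG" occurs three times.
genome_a = "TTGCTGATGG"
genome_b = "TTGATGCTGG"
print(kmers_of(genome_a, 3) == kmers_of(genome_b, 3))  # True
print(contigs(kmers_of(genome_a, 3)))  # ['TGATG', 'TGCTG', 'TGG', 'TTG']
```

Every contig printed is a substring of both sequences, whereas either full tour commits to an order of the two loops that the k-mers do not support.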

                -mark


                • ctseto
                  Member
                  • Oct 2013
                  • 44

                  #9
                  Haven't seen any as of late 2018, and I've been looking since getting back into de novo assembly...
                  It's always possible that, as someone just getting back in, I've missed something.

                  Edit: I see megahit can use a GPU for graph construction
                  Last edited by ctseto; 12-06-2018, 07:14 AM.
