  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    Speed up sequence alignments using your video card!

    Schatz, et al have just published a paper in BioMed Central Bioinformatics describing a novel sequence alignment algorithm designed and optimized to run on commonly available graphics processors.

    Why use a graphics processor? My understanding is that processing data for 3D renderings involves specialized parallel processors that can do certain types of calculations extremely fast. This means that if one has optimized code to take advantage of this architecture, it is possible to perform calculations faster than on a standard CPU of the same price. That is the extent to which I can explain the advantages of a graphics processor, as a biologist.

    They demonstrate up to a 10-fold improvement in alignment speed compared to a standard CPU. The current fastest commercial graphics adapter, the nVidia 8800 GTX, was able to run the project 3.79x faster than a single-core 3 GHz Xeon processor (which costs the same).

    A link to the paper PDF can be found here: http://www.biomedcentral.com/content...2105-8-474.pdf

    A link to the project's SourceForge homepage can be found here: http://mummergpu.sourceforge.net
  • apfejes
    Senior Member
    • Feb 2008
    • 236

    #2
    I had a conversation with someone about this, back in 2004. I don't think there was any question, then or now, that using a processor that's specially designed for vector processing would be a lot faster than using a general purpose CPU for vector calculations. Even the article points to several instances of earlier use of GPUs for processing.

    Still, the most interesting part of the article to me is that the improvement vs. processing on a CPU decreases as the sequence length becomes longer. This is probably an artefact of the necessity of caching small chunks of their suffix tree to the GPU at a time. The larger the suffix tree, the more time you need to spend pre-caching suffix tree elements. (Just a guess.. someone tell me if I'm wrong.) That tells me that there's *probably* a dramatically better algorithm out there for this application than a suffix tree.
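For readers who have not met suffix-based indexing, here is a minimal Python sketch of the general idea: index the reference once, then look up each query without rescanning the reference. It is an illustration under assumptions, not the paper's method. It swaps the suffix tree for a simpler sorted suffix array, and the names `build_suffix_array` and `find_occurrences` are made up for this example.

```python
from bisect import bisect_left


def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction that compares whole suffixes; fine for a toy only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, query):
    """Return the sorted start positions where query occurs exactly in text."""
    # Truncating sorted suffixes to len(query) keeps them in sorted order,
    # so a binary search finds the first candidate.
    prefixes = [text[i:i + len(query)] for i in suffix_array]
    pos = bisect_left(prefixes, query)
    hits = []
    while pos < len(prefixes) and prefixes[pos] == query:
        hits.append(suffix_array[pos])
        pos += 1
    return sorted(hits)


reference = "GATTACAGATTACA"  # hypothetical toy reference
sa = build_suffix_array(reference)
print(find_occurrences(reference, sa, "ATTA"))  # [1, 8]
```

The lookup jumps around the index rather than reading it in order, which is why the way an index is laid out in memory can matter so much for speed.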

    In the end, I'm just surprised to see that they managed to get a speed up at all. Sequence alignments are a non-vector application, so the use of a vector processor seems non-intuitive. If this were a molecular simulation, on the other hand... but then again, I believe that's been done before, as well.
    The more you know, the more you know you don't know. —Aristotle


    • mschatz
      Junior Member
      • Apr 2008
      • 3

      #3
      Maybe I can answer your questions for you. GPUs aren't exactly vector processors, and have a lot more flexibility than those. Instead think of them as single-board mini-grids containing many lightweight processors that all run the same program at the same time (SIMD, not vector architecture). The processors are optimized for the number crunching needed for rendering 3D graphics, but the programs they run can perform arbitrary computations using regular programming statements like loops and conditionals. This means that if you have a problem that requires the same computation for many different inputs, you can probably use a GPU to speed up your application.
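To make that programming model concrete, here is a minimal sketch in plain Python (not CUDA, and not MUMmerGPU code). Every "thread" runs the same small program on a different input, and that program uses ordinary loops and conditionals. The names `gc_count_kernel` and `launch` are hypothetical, and the loop in `launch` only stands in for what a real GPU would run concurrently.

```python
def gc_count_kernel(thread_id, reads, out):
    """Work done by one 'thread': count G/C bases in the read it owns."""
    read = reads[thread_id]
    count = 0
    for base in read:        # ordinary loop
        if base in "GC":     # ordinary conditional
            count += 1
    out[thread_id] = count   # each thread writes only its own slot


def launch(kernel, n_threads, *args):
    """Stand-in for a GPU launch; a real device runs these side by side."""
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


reads = ["GATTACA", "CCGG", "ATAT"]  # hypothetical toy reads
gc_counts = [0] * len(reads)
launch(gc_count_kernel, len(reads), reads, gc_counts)
print(gc_counts)  # [2, 4, 0]
```

Because no thread depends on another thread's result, the work can be spread over as many processors as the hardware offers.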

      Current GPUs only cost ~$500, but have up to 256 processors! As such, they are becoming really attractive platforms for high-throughput computation in many different fields (including molecular dynamics, meteorology, finance, cryptography, ...). Some applications that perform a lot of number crunching have achieved 100x speedup over the CPU. In contrast, MUMmerGPU performs very little number crunching, but is very data intensive. As such, the processors on the GPU can't run at full speed, and have to wait for data to move around on the board. Even so, MUMmerGPU gets ~10x speedup on the 8800 GTX with 128 processors for short reads. Over the last couple of months we reworked how the data is organized, and we have managed to double that speed. Check the MUMmerGPU SourceForge page for a new release soon.

      As for apfejes' comment about decreasing performance with longer reads, this is an artifact of how we organize the suffix tree on the board. The GPU has a very small cache, so we put the tree on the board in a very specific way to try to get as much use of the cache as possible (see the paper for all the gory details). It wasn't until recently that we fully understood the problem, but the way that we place the tree on the board is sub-optimal for longer reads. Again we are actively working on this and the next release should have much more consistent performance.

      If you have any more questions, feel free to post here or email me directly.

      Thanks for your interest,

      Michael Schatz


      • apfejes
        Senior Member
        • Feb 2008
        • 236

        #4
        Thanks for the reply - that was really helpful. I look forward to reading about the future releases!

        Anthony


        • Chipper
          Senior Member
          • Mar 2008
          • 323

          #5
          Michael,

          thanks for the explanation. What are the minimum requirements for the graphics card, and what is most important: memory size, bus, speed, or number of processors? Does it work in SLI with two cards? Also, have you done any speed comparisons to other short read aligners?


          • pfh
            Junior Member
            • May 2008
            • 7

            #6
            Sequence alignment is vectorizable, and there are various SIMD implementations. There is a brute force sequence aligner in the FASTA package that uses SIMD, for example.

            If you want to align multiple sequences, it's even easier. I've been working on a brute force aligner of short reads to a reference that runs on Cell processors such as the PlayStation 3, available here: http://savannah.nongnu.org/projects/myrialign/
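As a rough illustration of why brute-force alignment parallelizes so easily, here is a minimal Python sketch. It is not the code from myrialign or the FASTA package. It scores a read against every offset of the reference by counting mismatches, and it assumes ungapped alignment only. The names `best_hit` and `align_reads` are hypothetical.

```python
def best_hit(reference, read):
    """Return (offset, mismatches) for the offset with the fewest mismatches.

    Ties go to the leftmost offset. Returns None if the read is longer
    than the reference.
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        # Each position is compared independently: this is the part a
        # SIMD unit can do for many bases at once.
        mismatches = sum(a != b for a, b in zip(window, read))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def align_reads(reference, reads):
    """Align every read independently; reads never interact."""
    return [best_hit(reference, read) for read in reads]


reference = "ACGTTGCAAGGCT"          # hypothetical toy reference
reads = ["TTGC", "AGGA", "ACGT"]     # hypothetical toy reads
print(align_reads(reference, reads))  # [(3, 0), (8, 1), (0, 0)]
```

Both loops are independent across iterations, so the work can be split by offset, by read, or both.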

            I am impressed that they've managed to do MUMmer on a GPU. As far as I know, it uses quite a different algorithm from the usual dynamic programming sequence alignment.


            • Chipper
              Senior Member
              • Mar 2008
              • 323

              #7

              Any work still going on in this field or are the bowtie-type aligners on cpu superior?


              • Cole Trapnell
                Senior Member
                • Nov 2008
                • 213

                #8
                Originally posted by Chipper:
                Any work still going on in this field or are the bowtie-type aligners on cpu superior?
                Mike and I submitted a second paper on MUMmerGPU a couple of months back, but it's still under review. The paper contains a new GPGPU algorithm for translating suffix tree node coordinates into reference coordinates. It also contains a very detailed exploration about how seemingly orthogonal design decisions interact because of the peculiarities of the GPU architecture. The new paper is more targeted to the GPGPU community than to bioinformaticians.

                Mike, Ben Langmead, and I have actually spent some time thinking about putting Bowtie on the GPU, but we're worried about the relatively long latency of the GPU's memory bus. The architecture is organized so that sucking down big streams of data (e.g. large textures) is fast, but other than the initial loading of the reads, that's not the access pattern of Burrows-Wheeler search. Bowtie's performance essentially comes down to waiting for small chunks of data to come in from the memory bus (i.e. cache misses). Since recent nVidia GPUs have a global memory latency that is substantially longer than that of your typical x86 cache miss, I worry that you'd wipe out all your gains from massively parallel processing in the longer per-read processing time.
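To show the access pattern being described, here is a minimal Python sketch of Burrows-Wheeler backward search. It is a toy, not Bowtie's implementation: it stores full occurrence tables with no sampling or compression, handles exact matches only, and the names `bwt_index` and `count_matches` are hypothetical.

```python
def bwt_index(text):
    """Build a toy index: (bwt, first_col_start, occ)."""
    text += "$"  # sentinel; sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    bwt = "".join(rot[-1] for rot in rotations)
    # first_col_start[c]: how many characters in the text sort before c.
    first_col_start, total = {}, 0
    for c in sorted(set(bwt)):
        first_col_start[c] = total
        total += bwt.count(c)
    # occ[c][i]: how many times c appears in bwt[:i].
    occ = {c: [0] for c in first_col_start}
    for ch in bwt:
        for c in occ:
            occ[c].append(occ[c][-1] + (ch == c))
    return bwt, first_col_start, occ


def count_matches(index, query):
    """Count exact occurrences of query, consuming it right to left."""
    bwt, first_col_start, occ = index
    lo, hi = 0, len(bwt)  # half-open range of matching sorted rotations
    for c in reversed(query):
        if c not in first_col_start:
            return 0
        # Two small table reads at positions that depend on the previous
        # step, so the next memory address cannot be known in advance.
        lo = first_col_start[c] + occ[c][lo]
        hi = first_col_start[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = bwt_index("GATTACAGATTACA")  # hypothetical toy reference
print(count_matches(index, "ATTA"))  # 2
```

Each query character costs only a couple of lookups, but every lookup lands at a data-dependent position in the table. On a genome-sized index those are the small scattered reads that leave the processor waiting on memory.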

                That said, suffix tree traversal was supposed to be a bad fit for GPGPU for the same reasons, and the MUMmerGPU search kernel was substantially faster on the GPU than on the CPU. I doubt the three of us will get to putting Bowtie on the GPU, but if there's some brave soul out there willing to give it a try... nVidia makes cards now that have big enough memories to store the Bowtie index of the human genome.


                • Chipper
                  Senior Member
                  • Mar 2008
                  • 323

                  #9
                  Thanks, Cole. It would be fun, though, to see if a set-up like http://fastra.ua.ac.be/en/index.html or http://www.asrock.com/news/pop/X58/index.htm could be used for sequence analysis.


                  • Berlinq
                    Junior Member
                    • Dec 2009
                    • 7

                    #10
                    Unfortunately, CUDA will not work with the Xen kernel, which is used by RHEL5, for instance.
