  • Performance improvements

    Hi,

    I'm looking for an open-source bioinformatics tool whose performance I could work on improving. I was thinking of cluster parallelization, but I'm open to other suggestions: multi-core utilization, threads, etc. As a newcomer to bioinformatics, I've seen that many tools already have some kind of parallelization, but I don't know whether there is anything still worth improving, or which projects would benefit most from it. Actually, I'm a little bit lost.

    So, any suggestion is greatly appreciated.

    Thanks!

  • #2
    I believe it's quite the opposite: the level of parallelism in most tools is quite low (and it has only started increasing in the last year).
    BTW, take a look at GNU parallel:

    http://savannah.gnu.org/projects/parallel

    d
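The general idea behind a job-level tool like the one linked above, running many independent commands at once, can be sketched in Python. This is only an illustration, not a description of how that tool works internally: `run_job` and `run_all` are hypothetical names, and the child command (a Python one-liner that upper-cases its argument) is a stand-in for a real per-sample program.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(arg: str) -> str:
    """Run one independent child process and return its stripped stdout."""
    # Stand-in command (an assumption): a child Python process that
    # upper-cases its single argument. A real job would call an aligner etc.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", arg]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_all(args, max_workers=4):
    """Run one child process per argument, at most max_workers at a time."""
    # The threads only wait on child processes, so the children run
    # concurrently; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, args))
```

Calling `run_all(["acgt", "ttga"], max_workers=2)` launches two child processes side by side. This kind of parallelism needs no change to the tool's source code, which is why it is usually the first thing to try.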


    • #3
      Thanks dawe,

      Actually, I was thinking more at the source level: an MPI cluster approach along the lines of mpiBLAST (http://www.mpiblast.org/), or OpenMP for shared memory.


      • #4
        I think improvements are (or should be) very welcome, although patches are not easily merged into mainline code, especially when there is no team behind an application, only a single developer.
        I wrote a patch myself to add OpenMP support to clover, but it was never accepted :-(
        Also, consider that most of the code is written by biology experts, and it can be hard to parallelize, mainly for two reasons:
        1- poorly commented code
        2- obscure blocks that may need refactoring first

        BTW, there are examples of optimization for NGS: take a look at mummer-gpu (by some of the same authors as bowtie and tophat).

        d


        • #5
          Well, if you are looking for a problem, I would say that a parallelization of multiple sequence realignment of the reads (either CPU- or GPU-based) would be an opportunity. The idea of going back and realigning reads to form a consensus around an indel is relatively new compared to the regular read-to-reference mapping problem, which is already saturated. Of course, multiple sequence alignment is an old problem, so the algorithms to do it are out there; only the parallelization is needed.

          Right now the two leading programs are GATK and SRMA (Short Read Micro re-Aligner, a post-alignment micro re-aligner for next-generation high-throughput sequencing data).


          • #6
            In the area of alignment, there are not so many parallelization problems. For alignment, you can just split your read set. Realignment can be done region by region on an indexed BAM. You need scripts to automate the process, but this is of little academic interest.
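The two splitting strategies mentioned here could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a production pipeline: `fastq_records`, `split_reads` and `tile_regions` are hypothetical helper names, the input is assumed to be plain four-line FASTQ, and the contig names and lengths in the usage note are made up. The `contig:start-end` strings use 1-based inclusive coordinates.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, ...]  # the four lines of one FASTQ record


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of FASTQ lines into four-line records."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_reads(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Yield successive chunks of reads; each chunk can be aligned on its own."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def tile_regions(contig_lengths: Dict[str, int], window: int) -> List[str]:
    """Cut each contig into fixed-size windows, one region string per job."""
    if window <= 0:
        raise ValueError("window must be positive")
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append(f"{contig}:{start}-{end}")
    return regions
```

For example, `tile_regions({"chr1": 250, "chr2": 100}, 100)` gives four regions, the last window of `chr1` being the shorter `chr1:201-250`. Reads that span a window boundary would probably need extra care, for instance overlapping windows, which this sketch does not attempt.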

            In addition, if you write your programs for big sequencing centers, you should not use MPI where it can be avoided, as it can for alignment. For assembly, MPI is a reasonable solution, as you can hardly split the read set. GPU is even worse: you cannot expect us to put an expensive GPU in every computer purely for the sake of one or two programs.

            I see SSE as a more reasonable route to parallelization, although it is algorithm-dependent (this is also true for GPU). That said, if you are mainly interested in research rather than practical applications, you may also try MPI/GPU; just be aware that few people will use your product.
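SSE gets its speed from applying one instruction to several packed values at once. Python cannot issue SSE instructions directly, so the sketch below is only an analogy for that data-level parallelism: bases are packed two bits each into a single integer, and one XOR then compares every position in one operation. The names `pack` and `count_mismatches` and the 2-bit encoding are assumptions made for illustration, and only A, C, G and T are handled.

```python
# 2-bit encoding per base (an assumption; ambiguous bases such as N are not covered).
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODES[base]
    return value


def count_mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # One XOR compares all positions at once; a 2-bit pair is nonzero
    # exactly where the two bases differ.
    diff = pack(a) ^ pack(b)
    # Mask 0b0101...01 keeps the low bit of each pair.
    low_bits = (4 ** len(a) - 1) // 3
    # Fold each pair's high bit onto its low bit, then count the set bits.
    mismatches = (diff | (diff >> 1)) & low_bits
    return bin(mismatches).count("1")
```

Whether a given algorithm can be restructured this way is exactly the "algorithm-dependent" part: the trick works here because every position is compared independently of the others.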
