Performance improvements


  • Performance improvements

    Hi,

    I am looking for an open-source bioinformatics tool whose performance I could improve. I was thinking of cluster parallelization, but I'm open to suggestions: multi-core utilization, threads, etc. As a newcomer to bioinformatics, I've seen that many of the tools already have some kind of parallelization, but I don't know whether there's anything worth improving, or which projects would benefit most from it... actually, I'm a little bit lost.

    So, any suggestion is greatly appreciated

    Thanks!

  • #2
    I believe it's quite the opposite... the level of parallelism is quite low (and it has only started to increase in the last year)...
    BTW, take a look at this:

    http://savannah.gnu.org/projects/parallel

    d



    • #3
      Thanks dawe,

      Actually, I was thinking more at the source level: an MPI cluster approach along the lines of mpiblast (http://www.mpiblast.org/), or OpenMP for shared memory.



      • #4
        I think improvements are (or should be) very welcome, although patches are not easily merged into mainline code, especially when there is no team behind an application, only a single developer...
        I wrote a patch myself to add OpenMP support to clover, but it was never accepted :-(
        Also, consider that most of the code is written by biology experts, and it can be hard to parallelize, mainly for two reasons:
        1- poorly commented code
        2- obscure blocks which may need code refactoring...

        BTW, there are examples of optimization for NGS; take a look at mummer-gpu (by the same authors as bowtie and tophat).

        d



        • #5
          Well, if you are looking for a problem, I would say that parallelizing multiple-sequence realignment of reads (either CPU- or GPU-based) would be an opportunity. The idea of going back and realigning reads to form a consensus around an indel is relatively new compared to the regular read-to-reference mapping problem, which is already saturated. Of course, multiple sequence alignment is an old problem, so the algorithms already exist; only the parallelization is needed.

          Right now the two leading programs are GATK and SRMA:

          http://www.broadinstitute.org/gsa/wi..._around_indels

          http://sourceforge.net/apps/mediawik...itle=Main_Page



          • #6
            In the area of alignment, there are not many real parallelization problems. For alignment, you can just split your read set. Realignment can be done region by region on an indexed BAM. You need scripts to automate the process, but this is of little academic interest.
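
As a rough illustration of the "region by region" idea, here is a minimal Python sketch that cuts contigs into fixed-size windows and hands each window to an independent worker. Everything in it is an assumption made for illustration: the contig lengths are toy values, `make_regions`, `process_region` and `run_by_region` are hypothetical names, and the worker is a placeholder where a real script would launch the realignment tool on that region. Threads are used only because each worker would mostly wait on an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List


def make_regions(contig_lengths: Dict[str, int], window: int) -> Iterator[str]:
    """Yield 1-based, inclusive region strings such as 'chr1:1-1000'."""
    if window <= 0:
        raise ValueError("window must be positive")
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            yield f"{contig}:{start}-{end}"


def process_region(region: str) -> str:
    """Hypothetical placeholder for the per-region work.

    A real worker would run the realignment tool on this region of the
    indexed BAM (for example via subprocess.run). No real command line
    is assumed here.
    """
    return f"done {region}"


def run_by_region(
    contig_lengths: Dict[str, int],
    window: int,
    worker: Callable[[str], str] = process_region,
    jobs: int = 4,
) -> List[str]:
    """Run the worker on every region, at most `jobs` at a time."""
    regions = list(make_regions(contig_lengths, window))
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in the same order as the input regions.
        return list(pool.map(worker, regions))


if __name__ == "__main__":
    toy_contigs = {"chr1": 2500, "chr2": 1000}  # toy lengths, not real data
    for line in run_by_region(toy_contigs, window=1000):
        print(line)
```

The sketch ignores reads that span a window boundary; a real script would probably need overlapping windows or some other way to handle them. Splitting a read set for alignment follows the same pattern, with chunks of reads in place of regions.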

            In addition, if you write programs for big sequencing centers, you should not use MPI when it can be avoided, as it can for alignment. For assembly, MPI is a reasonable solution, as you can hardly split the read set. GPU is even worse: you cannot expect us to put an expensive GPU in every computer purely for the sake of one or two programs.

            I see SSE as a more reasonable approach to parallelization, although it is algorithm-dependent (this is also true for GPU). That said, if you are mainly interested in research rather than practical applications, you may also try MPI/GPU; just be aware that few people will use your product.
