  • Opinions needed: Phi vs GPU in bioinformatics

    I would like to hear expert opinions on the pros and cons of Nvidia GPU-based and Xeon Phi coprocessor-based architectures for bioinformatics applications. I realize that not all programs can take advantage of parallelization, and that many would need to be redesigned with significant programming effort. Still, if I have a choice of acquiring a dedicated server built on either of these platforms, which would be the better investment in terms of computing efficiency and future prospects?

  • #2
    Originally posted by yaximik
    I would like to hear expert opinions on the pros and cons of Nvidia GPU-based and Xeon Phi coprocessor-based architectures for bioinformatics applications. I realize that not all programs can take advantage of parallelization, and that many would need to be redesigned with significant programming effort. Still, if I have a choice of acquiring a dedicated server built on either of these platforms, which would be the better investment in terms of computing efficiency and future prospects?
    We discussed this a little while back, and I think it really comes down to which application you're trying to accelerate and whether anyone has invested the effort to make that task use GPU or Phi resources. One issue is that quite a few GPU-accelerated projects haven't been particularly well maintained. Admittedly, the Phi can (supposedly) run x86 code without modification, but the performance boost is kind of an unknown for us.



    • #3
      I asked the question in general terms to get a broad range of opinions, although I realize the answer depends heavily on particular applications and needs. For example, right now I am running blastx from the BLAST+ package on my dataset. On a grid, using on average 400-500 threads, this has been running nonstop for 2 months already and has so far processed about half of the dataset. So this is obviously one candidate for more parallelization. Old-fashioned de novo assembly is another, as the available assemblers that use de Bruijn graphs have so far produced dismal results, although I cannot claim I have explored all options.
      But my question was generic: can the advantages and disadvantages of the two platforms be compared? I found some generic comparisons elsewhere, but without the specifics that are characteristic of bioinformatics tasks, so I thought it might be more productive to seek answers here.



      • #4
        Originally posted by yaximik
        For example, right now I am running blastx from the BLAST+ package on my dataset. On a grid, using on average 400-500 threads, this has been running nonstop for 2 months already and has so far processed about half of the dataset.
        I'm curious, what are you blasting, and against what? 2 months seems an awfully long time to blast something. Also, why blastx? Wouldn't it be a lot faster to first predict proteins with your algorithm of choice (I like FragGeneScan) and then blastp against a protein db? That would also make more sense biologically, since afaik you can't use multiple genetic codes at once with blastx. Have you parallelized your blast properly? The num_threads option alone is a very poor solution. As a benchmark, blastp of some 2.5 million proteins against nr took me about 2 days on our cluster (I think 18 nodes with 16 Xeon cores and 512 GB RAM each, plus 2 nodes with 32 Xeon cores and 768 GB RAM each); however, I wasn't the only one using it. I parallelized the blasts by splitting the input sequences and then calling an array of blasts in SGE with 8 threads in each blastp instance (at most I think I had maybe 300 simultaneous threads going).
        Last edited by rhinoceros; 06-29-2013, 06:07 AM.
        savetherhino.org
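
The splitting step described in the post above could be sketched roughly as follows. This is only an illustration, not the poster's actual script: the function names (`read_fasta`, `split_round_robin`, `write_chunks`) and the `<prefix>.<task_id>.fasta` naming scheme are assumptions made for this example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks):
    """Deal records into n_chunks lists so chunk sizes differ by at most one."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def write_chunks(chunks, prefix):
    """Write each non-empty chunk to <prefix>.<task_id>.fasta (hypothetical naming).

    Task ids start at 1 so they can line up with 1-based array task indices.
    """
    paths = []
    for task_id, chunk in enumerate(chunks, start=1):
        if not chunk:
            continue
        path = f"{prefix}.{task_id}.fasta"
        with open(path, "w") as handle:
            for header, seq in chunk:
                handle.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each array task would then pick up the chunk whose number matches its own task index and run one blast instance on it with a fixed thread count, as described in the post. Round-robin dealing keeps the chunks close to equal in record count, although it does not balance total sequence length.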



        • #5
          Perhaps this blastx is for a metagenomics project? In that case, have you tried assembling the reads or finding long ORFs and removing redundant proteins, or using established analysis methods/pipelines?

          I also wonder why you consider de novo assemblies "dismal" and how you think using GPU/Phi might improve the current situation.
          Last edited by lh3; 06-29-2013, 10:38 AM.



          • #6
            I now have something like 200 million MiSeq reads in a dozen or so files, each of which is blastx'ed individually in 6 frames against nr. I split each file into 500 chunks and run an SGE array using 8-12 threads for each chunk. On average it takes 4-6 days to complete one array job of 500 chunks, and I can get 300-500 threads allocated on the grid for each array job. But this is just one iteration, so it is going to be a very long haul.

            I did not know about the FragGeneScan option, so I just used blastx. Is it better? The major issue is that I cannot use any reference. I tried using the human genome, but about 80% of the dataset was filtered out for lack of a significant match. Since it is an archeological specimen, a lot of the sequences are expected to be bacterial/fungal contamination, but that is manageable.

            I tried de novo assembly with a few tools such as Ray and got a longest contig of about 40 kb plus a lot of shorter contigs, yet blastn'ing or blastx'ing them did not really work, as the program crashed after about a week. That is too long to wait for such a result, and splitting datasets with long contigs is much more problematic. So I resorted to analyzing individual reads, with the idea of first analyzing the metagenomic content of each individual run in the dataset. Then I can remove the obviously contaminating sequences (bacterial/fungal) and see what I can do with the rest.



            • #7
              With blastx, you select a single genetic code (default = 1, I think), so, for example, UGA will signal termination of translation. However, in many genetic codes UGA = Trp. So, especially in metagenomic studies (and in everything related to mitochondria), you should always predict proteins first with an algorithm that takes this kind of thing into account, and only then run blast.
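
To make the UGA point concrete, here is a small sketch that translates the same DNA under NCBI translation table 1 (standard) and table 4, where TGA encodes Trp. The two amino acid strings follow the NCBI base order T, C, A, G; the helper names `build_codon_map` and `translate` are made up for this example, and the input sequence is a toy.

```python
BASES = "TCAG"

# Amino acids in NCBI order: first base varies slowest, third base fastest.
TABLE_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_map(aa_string):
    """Map each of the 64 codons to its amino acid ('*' marks a stop)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, aa_string))


def translate(dna, codon_map):
    """Translate frame 1 of a DNA/RNA string; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(codon_map.get(dna[i:i + 3], "X"))
    return "".join(protein)


standard = build_codon_map(TABLE_1)
table4 = build_codon_map(TABLE_4)

print(translate("ATGTGAGGC", standard))  # M*G  (TGA read as stop)
print(translate("ATGTGAGGC", table4))    # MWG  (TGA read as Trp)
```

A search run with a single fixed code truncates, at every TGA, any protein that actually comes from an organism using a code like table 4.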

              Did you dereplicate your reads prior to blasting? This might/probably would reduce their number significantly.
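
A minimal sketch of exact-match dereplication is shown below, assuming the reads are already available as (header, sequence) pairs. The function name `dereplicate` is hypothetical. It only collapses identical strings; it does not merge reverse complements, prefixes, or near-identical reads, which dedicated tools may handle.

```python
def dereplicate(records):
    """Collapse exact duplicate sequences (case-insensitive).

    Returns a list of (first_header, sequence, count) tuples in the order
    in which each distinct sequence was first seen.
    """
    first_header = {}
    counts = {}
    for header, seq in records:
        key = seq.upper()
        if key not in counts:
            first_header[key] = header
            counts[key] = 0
        counts[key] += 1
    return [(first_header[key], key, counts[key]) for key in counts]


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "GGCC"), ("r4", "ACGT")]
for header, seq, count in dereplicate(reads):
    print(header, seq, count)
# r1 ACGT 3
# r3 GGCC 1
```

Keeping the counts makes it possible to run blast once per distinct sequence and later weight each hit by how many reads it represents.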
              Last edited by rhinoceros; 06-29-2013, 04:36 PM.
              savetherhino.org
