
**Lee Sam** · 06-28-2013, 11:20 AM

Originally posted by yaximik

I am interested in expert opinions on the pros and cons of Nvidia GPU-based and Xeon Phi coprocessor-based architectures for bioinformatics applications. I realize that not all programs out there can take advantage of parallelization, and that many would need to be redesigned with significant programming effort. Still, if I have a choice of acquiring a dedicated server built on either of these platforms, which would be the better investment in terms of computing efficiency and future prospects?

We discussed this a little while back, and I think it's really a question of which application you're trying to accelerate and whether anyone has invested the effort to make that task use GPU or Phi resources. One of the issues is that quite a few GPU-accelerated projects haven't been particularly well maintained. Admittedly the Phi can run x86 code without modification (supposedly), but the performance boost is kind of an unknown for us.

**yaximik** · 06-29-2013, 04:59 AM

I posted the question in general terms to get a broad opinion, although I realize that the answer depends heavily on particular applications and needs. For example, right now I am running blastx from the BLAST+ package on my dataset. On a grid, using 400-500 threads on average, it has been running nonstop for 2 months already and has processed about half of the dataset so far. So this is obviously one candidate for more parallelization. Old-fashioned de novo assembly is another, as the available assemblers that use de Bruijn graphs have so far produced dismal results, although I cannot claim to have explored all options.
But my question was meant in a generic sense: can the advantages and disadvantages of the two platforms be compared? I found some generic comparisons elsewhere, but without the specifics that are characteristic of bioinformatics tasks, so I thought it might be more productive to seek answers here.

**rhinoceros** · 06-29-2013, 05:27 AM

Originally posted by yaximik

For example, right now I am running blastx from the BLAST+ package on my dataset. On a grid, using 400-500 threads on average, it has been running nonstop for 2 months already and has processed about half of the dataset so far.

I'm curious, what are you blasting, and against what? 2 months seems an awfully long time to blast something. Also, why blastx? Wouldn't it be a lot faster to first predict proteins with an algorithm of your choosing (I like FragGeneScan) and then blastp against a protein db? That would also make more sense biologically, since afaik you can't use multiple genetic codes at once with blastx. Have you parallelized your blast properly? The num_threads option alone is a very poor solution. As a benchmark, blastp of some 2.5 million proteins against nr took me about 2 days on our cluster (I think 18 nodes with 16 Xeon cores and 512 GB RAM each, plus 2 nodes with 32 Xeon cores and 768 GB RAM each); however, I wasn't the only one using it. I parallelized the blasts by splitting the input sequences and then calling an array of blasts in SGE with 8 threads in each blastp instance (at most I think I had maybe 300 simultaneous threads going).
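To illustrate the "split the input, then run an array of blasts" approach, here is a minimal Python sketch of the splitting step. It is not the script I actually used; the function names (`parse_fasta`, `split_round_robin`, `write_chunks`) and the chunk file naming are made up for the example, and it assumes plain FASTA input that fits in memory.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks):
    """Deal records into n_chunks lists so chunk sizes differ by at most one."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def write_chunks(chunks, prefix):
    """Write each chunk to '<prefix>.<i>.fa' (hypothetical naming), numbered from 1."""
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = f"{prefix}.{i}.fa"
        with open(path, "w") as fh:
            for header, seq in chunk:
                fh.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Numbering the chunk files from 1 means each task of an SGE array job can pick its own input from the task ID, with one blast instance per chunk and a modest thread count each.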

**lh3** · 06-29-2013, 10:35 AM

Perhaps this blastx is for a metagenomics project? In that case, have you tried assembling the reads or finding long ORFs and then removing redundant proteins, or using established analysis methods/pipelines?

I also wonder why you consider the de novo assemblies "dismal" and how you think using GPU/Phi might improve the current situation.

**yaximik** · 06-29-2013, 12:46 PM

I have something like 200 million MiSeq reads now, in a dozen or so files that are blastx'ed individually, in 6 frames each, against nr. I split each file into 500 chunks and run an SGE array using 8-12 threads for each chunk. On average it takes 4-6 days to complete one array job of 500 chunks, and I can get 300-500 threads allocated on the grid for each array job. But this is just one iteration, so it is going to be a very long haul in the long run.

I did not know about the FragGeneScan option, so I just use blastx. Is it better? The major issue is that I cannot use any reference. I tried to use the human genome, but about 80% of the dataset got filtered out for lack of a significant match. Since it is an archaeological specimen, a lot of sequences are expected to be bacterial/fungal contamination, but that is manageable.

I tried to get a de novo assembly using a few tools like Ray and got a longest contig of about 40 kb and a lot of shorter contigs, yet blastn'ing or blastx'ing them did not really work, as the program crashed after about a week. That is too long to wait for such a result, and splitting datasets with long contigs is much more problematic. So I resorted to analysing individual reads, with the idea of first analysing the metagenomic content of each individual run from the dataset. Then I can remove the obviously contaminating sequences (bacterial/fungal) and see what I can do with the rest.

**rhinoceros** · 06-29-2013, 01:28 PM

With blastx, you select a single genetic code (default = 1, I think), so for example UGA will signal termination of translation. However, in many genetic codes, UGA = Trp. So especially in metagenomic studies (and anything related to mitochondria), you should always predict proteins first with an algorithm that takes this kind of thing into account, and only then do the blasts.
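A small Python sketch of why the genetic code matters. It builds NCBI translation table 1 (standard) from its compact amino-acid string, and derives table 4 (Mycoplasma/Spiroplasma and some mitochondria) by changing only TGA from stop to Trp, which is the one difference relevant here. The `translate` helper and the toy ORF are made up for the illustration; this is not how blastx works internally.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE_1 = {"".join(codon): aa
          for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Table 4 differs from table 1 in this sketch only by TGA (UGA) = Trp.
CODE_4 = dict(CODE_1, TGA="W")


def translate(dna, table):
    """Translate frame 1 of dna until the first stop codon (hypothetical helper)."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGTGGTGAAAATAA"  # toy ORF: ATG TGG TGA AAA TAA
print(translate(orf, CODE_1))  # MW   (TGA read as stop)
print(translate(orf, CODE_4))  # MWWK (TGA read as Trp)
```

With the wrong table the protein is truncated at every in-frame UGA, so hits against nr come out shorter and weaker than they should.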

Did you dereplicate your reads prior to blasting? This might/probably would reduce their number significantly.
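By dereplication I mean something along these lines: a minimal Python sketch that collapses exact duplicate reads and keeps a count for each. The `dereplicate` function is a made-up example, not a specific tool; it assumes reads are already parsed into (header, sequence) pairs, holds everything in memory, and only catches exact matches (no reverse complements, no near-duplicates).

```python
def dereplicate(records):
    """Collapse exact duplicate sequences.

    Returns (first_header, sequence, count) tuples in order of first appearance.
    """
    seen = {}  # sequence -> [first header, number of copies]
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            seen[key][1] += 1
        else:
            seen[key] = [header, 1]
    return [(header, seq, count) for seq, (header, count) in seen.items()]


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTTT"), ("r4", "ACGT")]
for header, seq, count in dereplicate(reads):
    print(header, seq, count)
```

Only the unique sequences then need to be blasted, and the counts let you put abundances back on the hits afterwards.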


Opinions needed: Phi vs GPU in bioinformatics
