I would like to hear expert opinions on the pros and cons of Nvidia GPU-based and Xeon Phi coprocessor-based architectures for bioinformatics applications. I realize that not every program can take advantage of parallelization, and that many would need to be redesigned with significant programming effort. Still, if I have the choice of acquiring a dedicated server built on either of these platforms, which would be the better investment in terms of computing efficiency and future prospects?
-
We discussed this a little while back, and I think it really comes down to which application you're trying to accelerate, and whether anyone has invested the effort to make that task use GPU or Phi resources. One of the issues is that quite a few GPU-accelerated projects haven't been particularly well maintained. Admittedly the Phi can (supposedly) run x86 code without modification, but the performance boost is still an unknown for us.
-
I posted the question in general terms to get a broad range of opinions, although I realize that the answer depends heavily on the particular applications and needs. For example, right now I am running blastx from the BLAST+ package on my dataset. On a grid, using 400-500 threads on average, it has been running nonstop for 2 months already and has processed about half of the dataset so far. So this is obviously one candidate for more parallelization. Old-fashioned de novo assembly is another one, as the available assemblers that use de Bruijn graphs have so far produced dismal results, although I cannot claim to have explored all the options.
But my question was meant in a generic sense: can the advantages and disadvantages of the two platforms be compared? I found some generic comparisons elsewhere, but without the specifics that are characteristic of bioinformatics tasks, so I thought it might be more productive to ask here.
-
Originally posted by yaximik: "For example, right now I am running blastx from the Blast+ package on my dataset. On a grid utilizing on average 400-500 threads this has been running nonstop 2 months already and processed so far about 1/2 of the dataset."
I'm curious, what are you blasting, and against what? Two months seems an awfully long time to blast something. Also, why blastx? Wouldn't it be a lot faster to first predict proteins with your algorithm of choice (I like FragGeneScan) and then blastp against a protein db? That would also make more sense biologically, since afaik you can't use multiple genetic codes at once with blastx. Have you parallelized your blast properly? The num_threads option alone is a very poor solution. As a benchmark, blastp of some 2.5 million proteins against nr took me about 2 days on our cluster (I think 18 nodes with 16 Xeon cores and 512 GB RAM each, plus 2 nodes with 32 Xeon cores and 768 GB RAM each), and I wasn't the only one using it. I parallelized the blasts by splitting the input sequences and then submitting an array of blasts to SGE with 8 threads per blastp instance (at most I think I had maybe 300 simultaneous threads going).
Last edited by rhinoceros; 06-29-2013, 06:07 AM.
savetherhino.org
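For readers who want to try the split-and-array approach described above, here is a minimal sketch of the splitting step in plain Python. It is not the script used in this thread: the function names, the `chunk` file prefix and the round-robin distribution are all assumptions made for illustration.

```python
from contextlib import ExitStack


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def write_chunks(fasta_path, n_chunks, prefix="chunk"):
    """Distribute records round-robin over n_chunks files.

    Files are numbered from 1 (hypothetical naming: chunk.1.fa, chunk.2.fa, ...)
    so that a chunk number can be matched to an array task number.
    Records are streamed, so the whole input is never held in memory.
    """
    paths = [f"{prefix}.{i}.fa" for i in range(1, n_chunks + 1)]
    with ExitStack() as stack:
        src = stack.enter_context(open(fasta_path))
        outs = [stack.enter_context(open(p, "w")) for p in paths]
        for i, (header, seq) in enumerate(read_fasta(src)):
            outs[i % n_chunks].write(f"{header}\n{seq}\n")
    return paths
```

Each task of the array job would then pick up the chunk whose number matches its task ID and run one blast instance on it with a modest thread count, as described in the post. Round-robin assignment is one simple way to keep chunk sizes even when record lengths vary along the file.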
-
Perhaps this blastx is for a metagenomics project? In that case, have you tried to assemble reads/find long ORFs and remove redundant proteins, or to use established analysis methods/pipelines?
I also wonder why you consider the de novo assemblies "dismal" and how you think using a GPU/Phi might improve the current situation.
Last edited by lh3; 06-29-2013, 10:38 AM.
-
I have something like 200 million MiSeq reads now, in a dozen or so files, each of which is blastx'ed individually in 6 frames against nr. I split each file into 500 chunks and run an SGE array job using 8-12 threads per chunk. On average it takes 4-6 days to complete one array job of 500 chunks, and I can get 300-500 threads allocated on the grid for each array job. But this is just one iteration, so it is going to be a very long haul.
I did not know about the FragGeneScan option, so I just used blastx. Is it better? The major issue is that I cannot use any reference. I tried using the human genome, but about 80% of the dataset was filtered out for lack of a significant match. Since this is an archaeological specimen, a lot of the sequences are expected to be bacterial/fungal contamination, but that is manageable.
I tried de novo assembly with a few tools such as Ray and got a longest contig of about 40 kb plus a lot of shorter contigs, but running blastn or blastx on them did not really work: after about a week the program crashed. That is too long to wait for such a result, and splitting datasets with long contigs is much more problematic. So I resorted to analyzing individual reads, with the idea of first analyzing the metagenomic content of each individual run in the dataset. Then I can remove the obviously contaminating sequences (bacterial/fungal) and see what I can do with the rest.
-
With blastx, you select a single genetic code (default = 1, I think), so, for example, UGA will signal termination of translation. However, in many genetic codes UGA = Trp. So especially in metagenomic studies (and anything related to mitochondria), you should always predict proteins first with an algorithm that takes this kind of thing into account, and only then run blasts.
Did you dereplicate your reads prior to blasting? This might, and probably would, reduce their number significantly.
Last edited by rhinoceros; 06-29-2013, 04:36 PM.
savetherhino.org
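To make the UGA point concrete, here is a small self-contained sketch that translates the same DNA under NCBI translation table 1 (standard) and table 4 (mold/protozoan mitochondrial and Mycoplasma/Spiroplasma), where TGA encodes Trp. The helper names are made up for this illustration, and start-codon differences between the tables are ignored.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of NCBI translation table 1, in TCAG codon order.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(amino_acids):
    """Map each of the 64 codons (TCAG order) to its amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


TABLE_1 = codon_table(STANDARD)
# Table 4 differs from table 1 in its coding assignments only at TGA (Trp).
TABLE_4 = dict(TABLE_1, TGA="W")


def translate(dna, table):
    """Translate frame 1 of dna, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTGAGCC", TABLE_1))  # M   (TGA read as stop)
print(translate("ATGTGAGCC", TABLE_4))  # MWA (TGA read as Trp)
```

A translated search run with the wrong table would truncate such a protein at every in-frame TGA, which is why predicting proteins with a code-aware tool first can matter for mixed samples.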
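As a rough idea of what dereplication means here, the sketch below collapses reads with identical sequences and keeps a count for each. It assumes exact, case-insensitive matching only (no reverse complements, no prefix or near-identical matches); dedicated tools do more, and the function name is made up for this example.

```python
def dereplicate(records):
    """Collapse exact duplicate sequences.

    records: iterable of (header, sequence) pairs.
    Returns a list of (first_header, sequence, count) in first-seen order,
    with sequences upper-cased.
    """
    seen = {}
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            seen[key][1] += 1
        else:
            seen[key] = [header, 1]
    return [(header, seq, count) for seq, (header, count) in seen.items()]


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTGA")]
print(dereplicate(reads))  # [('r1', 'ACGT', 2), ('r3', 'TTGA', 1)]
```

Only the unique sequences would need to be searched, and the counts let the hits be mapped back to read abundances afterwards.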