**Bukowski** · 03-15-2019, 04:28 AM

I wouldn't have thought so. Assembly needs all the reads together, so splitting the work across a cluster without a shared/distributed memory model doesn't fit the assembly paradigm. That is why most people use a big box with lots of RAM.

See:

https://ieeexplore.ieee.org/document/6165266

https://www.hpc.informatik.uni-mainz.de/de-novo-genome-assembly/

**sanderson83** · 03-26-2019, 05:42 AM

Hi Bukowski,

Thanks for your reply.

If we were to cluster the machines and apply a shared/distributed memory model, would I be likely to see an increase in processing speed from the extra memory and cores?

Sorry if this is a naive question but I need to find a way of increasing throughput if at all possible. Appreciate the advice.

**Bukowski** · 03-26-2019, 09:09 AM

I may have misinterpreted your original request, but the short answer is no. It sounds like your best bet is to keep doing things in an embarrassingly parallel manner, which is what you're currently doing.

If you build a cluster, you get a job scheduler, and the best thing about that is that you no longer have to manage the jobs manually: when one finishes on a machine, the scheduler just starts the next one in the queue. That's the benefit to you of building a cluster from your machines.
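As a rough illustration of that queueing behaviour, here is a minimal Python sketch that simulates a first-come-first-served queue. It is not how any particular scheduler is implemented; `simulate_queue` and the run times are hypothetical and only show how the next job goes to whichever machine frees up first.

```python
import heapq


def simulate_queue(job_hours, n_machines):
    """Simulate a first-come-first-served queue over identical machines.

    Returns (makespan, assignments), where each assignment is
    (job index, machine index, start time, end time).
    """
    # Each heap entry is (time the machine becomes free, machine index).
    free_at = [(0.0, m) for m in range(n_machines)]
    heapq.heapify(free_at)

    assignments = []
    for job, hours in enumerate(job_hours):
        # The next queued job goes to whichever machine frees up first.
        start, machine = heapq.heappop(free_at)
        end = start + hours
        assignments.append((job, machine, start, end))
        heapq.heappush(free_at, (end, machine))

    makespan = max((end for _, _, _, end in assignments), default=0.0)
    return makespan, assignments


# Hypothetical run times (hours) for five assemblies on two machines.
makespan, plan = simulate_queue([10, 4, 6, 3, 5], n_machines=2)
for job, machine, start, end in plan:
    print(f"job {job} on machine {machine}: {start:g}h -> {end:g}h")
print(f"all jobs done after {makespan:g}h")
```

Note that no individual job gets shorter in this model. The queue only removes the idle gaps between jobs, so throughput still scales with the number of machines.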

I also didn't spot that you were using Trinity, so I'm going to assume you're doing transcriptome assemblies. Trinity is already using the machine's resources efficiently, so the run time you see is just the run time. Provided it's not maxing out the memory, it matters not a jot if your CPU utilisation is high. All you care about in terms of performance is that it's not swapping out to disk.
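One way to keep an eye on swapping, assuming the machines run Linux, is to compare the `SwapTotal` and `SwapFree` fields of `/proc/meminfo` during a run. The sketch below is only an illustration: `swap_used_kb` is a hypothetical helper and the sample text is made up, not taken from a real machine.

```python
def swap_used_kb(meminfo_text):
    """Return the amount of swap in use (kB) given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Made-up sample in the /proc/meminfo layout (values in kB).
sample = (
    "MemTotal:       263921236 kB\n"
    "SwapTotal:        8388604 kB\n"
    "SwapFree:         8126460 kB\n"
)
print(f"swap in use: {swap_used_kb(sample)} kB")
```

On a real machine you would pass in the contents of `/proc/meminfo` instead of the sample string. A small, steady value is usually harmless; a value that keeps climbing while the assembly runs is the warning sign.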

Your process is CPU bound, not memory bound. The only thing a cluster with a shared memory architecture would give you is more RAM, and that doesn't solve your apparent issue, which has nothing to do with RAM.

https://github.com/trinityrnaseq/tri...g-Requirements suggests you need 256GB of RAM in a machine, but I don't know what organism you're working on or how many reads you have per sample.

You might want to look at end-of-run profiling:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Runtime-Profiling

This might give you more of an idea where the bottleneck is.

**sanderson83** · 03-27-2019, 06:16 AM

Perfect.

Thanks for the comprehensive and helpful response. Stops me wasting any more time looking into this.

Thanks,
Sanderson.

Denovo assembly system resources