
**jtietjen** · 08-11-2012, 04:53 PM

I realized after posting that people might point out that other threads already cover parallelization of specific NGS analysis algorithms. I decided to leave my thread very open-ended because, in the end, the system I have in mind should work for any and all current analysis and data processing methods.

**xied75** · 08-13-2012, 02:25 AM

NGS analysis is mostly text processing (it doesn't matter whether the data is binary or compressed), so I/O is the bottleneck, whether in-house or over the Internet.

With SETI (or maybe Folding@Home), by contrast, a small data file keeps the CPU busy for a while.

Cloud (Amazon or whatever) is a business model: the provider buys a large number of white-box servers and rents them out in one-hour units. It does not use fancy hardware, and it does not upgrade until the previous investment has been recouped.

So today's situation is like this:
1. From a 4 TB hard drive, you can only get 100 MB/s sequential read.
2. You might have a PB-sized array in-house, but you only have a 1 Gb/s Internet connection to the world.
3. This won't change for some years.
4. LHC's infrastructure is the extreme limit for now; anything they can't do or afford, no one can.
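To put the numbers in the list above in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), a fully saturated link, and no protocol or filesystem overhead; the helper name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Idealized time to move size_bytes at a constant rate (no overhead)."""
    return size_bytes / rate_bytes_per_s

MB = 1e6   # decimal megabyte (assumption: vendors' decimal units)
TB = 1e12
PB = 1e15

# Point 1: reading a full 4 TB drive at 100 MB/s sequential.
disk_read = transfer_seconds(4 * TB, 100 * MB)

# Point 2: pushing 1 PB through a 1 Gb/s uplink (1e9 bits/s = 1.25e8 bytes/s).
uplink = transfer_seconds(1 * PB, 1e9 / 8)

print(f"4 TB at 100 MB/s: {disk_read / 3600:.1f} hours")   # 11.1 hours
print(f"1 PB over 1 Gb/s: {uplink / 86400:.1f} days")      # 92.6 days
```

Even under these optimistic assumptions, moving data dominates: a single drive takes about half a day to read end to end, and a petabyte takes roughly three months to leave the building.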

**ymc** · 08-13-2012, 08:50 AM

1. This can change now if you have $$$

2. For eight SSDs in RAID0, you can get 2500MB/s sequential read
3. InfiniBand for 300Gbps network

**xied75** · 08-13-2012, 12:24 PM

Originally posted by ymc:

2. For eight SSDs in RAID0, you can get 2500MB/s sequential read

No, that's not my point. I would rather say you can get 2500 MB/s random read (maybe; I don't have these to play with).

Originally posted by ymc:

3. InfiniBand for 300Gbps network

No again: I was talking about the Internet connection. The thread is asking about the cloud (unless private cloud is also included in the discussion).

**kevyin** · 09-12-2012, 06:23 PM

There are links here on deploying Galaxy on a cluster (and other things):

http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Production%20Server?action=show&redirect=Admin%2FConfig%2FPerformance

We have this deployed on our cluster, and jobs are basically distributed to the cluster nodes by Sun Grid Engine.

It's up to the tools themselves to do MPI/threading etc.
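As a rough illustration of that split of responsibilities, the sketch below builds one Sun Grid Engine `qsub` command per sample: the scheduler decides which node runs each job, and the tool handles its own threading. This is not how Galaxy itself submits jobs. The tool name `my_aligner`, its `--threads` flag, the sample names, and the parallel environment name `smp` are all hypothetical (parallel environment names are site-specific).

```python
import shlex

def build_qsub_command(job_name, tool_command, slots=1, parallel_env="smp"):
    """Build a qsub argument list; nothing is submitted here.

    -N sets the job name, -cwd runs in the current directory, and
    -b y treats the command as a binary instead of a job script.
    """
    cmd = ["qsub", "-N", job_name, "-cwd", "-b", "y"]
    if slots > 1:
        # Reserve several slots on one node; "smp" is an assumed PE name.
        cmd += ["-pe", parallel_env, str(slots)]
    return cmd + list(tool_command)

# Hypothetical samples and tool: one independent job per sample.
samples = ["sampleA", "sampleB"]
for sample in samples:
    tool = ["my_aligner", "--threads", "4", f"{sample}.fastq"]
    cmd = build_qsub_command(f"align_{sample}", tool, slots=4)
    print(shlex.join(cmd))
```

The scheduler only spreads whole jobs across nodes; the `--threads 4` inside each job is the tool's business, which is why the slot request has to match it.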

In a cloud setting, NGS data can get quite large, so storage may be an issue.

Extreme parallelization for NGS analysis