**westerman** · 06-12-2009, 12:08 PM

No good feedback here but I concur:

If anyone else has had luck making SOLiD more automated, manageable and scalable I'd love to hear what you are doing.

We have only one SOLiD and thus do not have the problems that griznog has. Nevertheless, I find the lack of automation irritating, as well as the lack of scalability.

**OneManArmy** · 07-19-2009, 04:17 PM

Unfortunately not much feedback here either, but I am interested in how you connect these machines together.

Originally posted by griznog

We currently have all our machines write directly to a central NFS server (a high performance clustered storage solution) rather than copying data after primary analysis completes.

What is the network speed of the connection between your SOLiDs and this central NFS server? The images are acquired in Windows, so do yours write to a Samba share on the onboard cluster which maps to an NFS mount?

Unfortunately, at the moment the network speeds available at our site make dumping the images directly to our data centre via NFS infeasible.

**griznog** · 07-19-2009, 04:51 PM

Originally posted by OneManArmy

What is the network speed of the connection between your SOLiDs and this central NFS server? The images are acquired in Windows, so do yours write to a Samba share on the onboard cluster which maps to an NFS mount?

Unfortunately, at the moment the network speeds available at our site make dumping the images directly to our data centre via NFS infeasible.

Each SOLiD has a 1 Gbps uplink to an aggregate switch, which then has a 10 Gbps connection to storage (via about 3 switch hops and one router hop). It's not ideal for latency, but performance seems reasonable, and in simple benchmarks the central storage was at least as good as the head node storage for single clients and vastly better for multiple clients. Note that we were only using this for results. I have used it for images on one instrument, when a failure in the MD1000 left us without an images directory for a few days, but I don't consider that a good test of central image storage because of the short duration of usage.
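As a rough back-of-the-envelope check of the topology described above, the sketch below estimates how many 1 Gbps instrument uplinks can share the 10 Gbps storage link before it is oversubscribed, and how long a transfer would take at line rate. The 1 TB data size is a hypothetical figure for illustration, not a measured run size, and real throughput will be lower once protocol overhead and the switch and router hops are taken into account.

```python
def max_clients_without_oversubscription(uplink_gbps: float, trunk_gbps: float) -> int:
    """Number of uplinks that can run at full rate over a shared trunk."""
    return int(trunk_gbps // uplink_gbps)


def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Idealized transfer time at line rate (decimal units, no overhead)."""
    bits = size_tb * 1e12 * 8           # terabytes -> bits
    seconds = bits / (link_gbps * 1e9)  # Gbps -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Figures from the post: 1 Gbps per instrument, 10 Gbps to storage.
    print(max_clients_without_oversubscription(1, 10))
    # Hypothetical 1 TB of data over a single 1 Gbps uplink.
    print(round(transfer_hours(1.0, 1.0), 2))
```

In this idealized model the per-instrument uplink, not the link to storage, is the bottleneck until about ten instruments write at full rate simultaneously.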

Since posting this thread we've had some very good interaction with ABI, and the roadmap for v3.5 of the instrument software appears to address many of our issues, so once we upgrade to 3.5 later this year we'll revert to the export model rather than using central NFS. Given the roadmap shown to us, I would withhold recommending central NFS storage until we've seen how well the new software handles exporting.

griznog

**nilshomer** · 07-19-2009, 05:39 PM

Originally posted by griznog

Each SOLiD has a 1 Gbps uplink to an aggregate switch, which then has a 10 Gbps connection to storage (via about 3 switch hops and one router hop). It's not ideal for latency, but performance seems reasonable, and in simple benchmarks the central storage was at least as good as the head node storage for single clients and vastly better for multiple clients. Note that we were only using this for results. I have used it for images on one instrument, when a failure in the MD1000 left us without an images directory for a few days, but I don't consider that a good test of central image storage because of the short duration of usage.

Since posting this thread we've had some very good interaction with ABI, and the roadmap for v3.5 of the instrument software appears to address many of our issues, so once we upgrade to 3.5 later this year we'll revert to the export model rather than using central NFS. Given the roadmap shown to us, I would withhold recommending central NFS storage until we've seen how well the new software handles exporting.

griznog

We rely on copying the primary data (after color calling) over to NFS volumes, which allows us to have lots of cheap storage. The most current runs are then stored on a fast distributed file system (Lustre) while alignment, variant calling, structural variant detection, and all other downstream analysis are completed. We then copy all the results and intermediate files that need to be archived back to the NFS servers. A lot of this is "human automated": a human has to initiate the transfer, the secondary analysis, and the final archiving.

I would love to hear any successes with using some type of workflow system (Kepler etc.) in automating not only SOLiDs but also other NGS technology, since the big problem for us is having a mix of technologies (and workflows/applications) that are constantly being developed/updated.
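The copy, analyze, archive sequence described above could in principle be scripted so that a human only has to start it once per run. The following is a minimal sketch, not a description of any existing setup: the directory layout (`primary/` and `archive/` under the NFS root), the function name `process_run`, and the `analyze` callback are all hypothetical, and a real pipeline would add error handling, checksums, and job submission to a scheduler.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_run(run_name: str, nfs_root: Path, scratch_root: Path,
                analyze: Callable[[Path], Path]) -> Path:
    """Stage a run from NFS to fast scratch, analyze it, archive results to NFS.

    `analyze` is a caller-supplied function (hypothetical) that takes the
    staged run directory and returns the directory holding its results.
    """
    source = nfs_root / "primary" / run_name
    staged = scratch_root / run_name

    # Step 1: copy primary data from cheap NFS storage to fast scratch.
    shutil.copytree(source, staged, dirs_exist_ok=True)

    # Step 2: run secondary analysis (alignment, variant calling, ...).
    results = analyze(staged)

    # Step 3: copy the results that need archiving back to NFS.
    archive = nfs_root / "archive" / run_name
    shutil.copytree(results, archive, dirs_exist_ok=True)

    # Step 4: free the scratch space for the next run.
    shutil.rmtree(staged)
    return archive
```

Keeping the analysis step as a callback means the same staging and archiving code can wrap different technologies or workflows, which is the part that tends to change most often.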

**pssclabs** · 10-08-2009, 08:31 PM

This is somewhat related to the above. I am with PSSC Labs (www.pssclabs.com). We are working to develop a SOLiD Offline Cluster. All of the information provided above is great. It gives me a much better understanding of the computing needs of the cluster than any of my discussions with AB.

I had a few questions. Do any of you have experience running any AB developed application over Infiniband or other high speed network interconnects?

Is there a maximum number of cores beyond which the AB software no longer scales, or beyond which the performance gain from adding more nodes is negligible?

Thank you

**westerman** · 10-13-2009, 01:03 PM

Originally posted by pssclabs

Is there a maximum number of cores where the AB software will no longer scale?

There are a handful of ABI software packages out there -- e.g., Mapping, SNP calling, Transcriptome -- which often stand alone although they may be sharing programs.

If we consider the first program -- Mapping -- then there is a maximum number of cores. Basically the mapping program is broken down into 6 sub-programs:

1) Map the read file to each chromosome. The natural core limit on this is the number of chromosomes.

2) Collect the map information into one overall file -- limit of 1 core.

3) Do a per-chromosome re-mapping for the optimal matches.

4-6) Gather back the mapping into one overall file with statistics and an index.

Overall rather inefficient. Some of the other ABI programs do seem to take into account the number of cores. Also one could see a way to split the read file into parts and map those parts against the chromosomes.
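To make the core limit concrete, here is a minimal sketch under simplifying assumptions: every task is assumed to take the same time (real chromosomes differ in size), tasks run in waves across the available cores, and the numbers used (24 chromosomes, 100 seconds per task, 4 read chunks) are hypothetical. The function names are illustrative and not part of any ABI software.

```python
from math import ceil


def wall_time(task_count: int, task_seconds: float, cores: int) -> float:
    """Idealized wall time for equal-sized independent tasks run in waves."""
    if task_count < 1 or cores < 1:
        raise ValueError("task_count and cores must be at least 1")
    waves = ceil(task_count / cores)
    return waves * task_seconds


def mapping_tasks(n_chromosomes: int, n_read_chunks: int = 1) -> int:
    """Independent mapping tasks when the read file is split into chunks."""
    return n_chromosomes * n_read_chunks


if __name__ == "__main__":
    # One task per chromosome: cores beyond 24 sit idle.
    print(wall_time(mapping_tasks(24), 100.0, 24))
    print(wall_time(mapping_tasks(24), 100.0, 64))
    # Splitting reads into 4 chunks gives 96 smaller tasks of 25 s each.
    print(wall_time(mapping_tasks(24, 4), 25.0, 64))
```

With one task per chromosome, 24 cores and 64 cores give the same wall time in this model. Splitting the reads into 4 chunks yields 96 tasks, which 64 cores finish in two waves, so the extra cores are no longer wasted.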

New AB software due out "soon". Maybe it will be more efficient.

**KevinLam** · 11-26-2009, 01:52 AM

Interesting info, especially the NFS bit!

How about cost-effective solutions for analysis?
I am trying to build an offline cluster with the minimum specs needed to do the analysis. I suspect not all labs have the budget for a cluster that just collects dust once they are done with the analysis.

What's the lowest-spec machine that a SOLiD user has managed to get away with?
Has anyone done any benchmarking?

**westerman** · 11-30-2009, 09:08 AM

Originally posted by KevinLam

Interesting info, especially the NFS bit!

How about cost-effective solutions for analysis?
I am trying to build an offline cluster with the minimum specs needed to do the analysis. I suspect not all labs have the budget for a cluster that just collects dust once they are done with the analysis.

What's the lowest-spec machine that a SOLiD user has managed to get away with?
Has anyone done any benchmarking?

I doubt anyone will bother benchmarking the lowest-spec machine, since such a task would be boring and, IMHO, not much use. Basically, just grab an x86-64 based computer with 12 GB of memory and 500 GB of disk space. About $2500 from Dell. That would work. Might be slow. Might run out of disk space eventually. But if you want low-ball then the above should be OK.

Or if you want high-ball then share $100,000+ machines with other people. This is what we do.

Seriously, you really should set a budget and then buy within that. That is generally the best bet when purchasing computer equipment.

**KevinLam** · 11-30-2009, 11:52 PM

Originally posted by westerman

I doubt anyone will bother benchmarking the lowest-spec machine, since such a task would be boring and, IMHO, not much use. Basically, just grab an x86-64 based computer with 12 GB of memory and 500 GB of disk space. About $2500 from Dell. That would work. Might be slow. Might run out of disk space eventually. But if you want low-ball then the above should be OK.

Or if you want high-ball then share $100,000+ machines with other people. This is what we do.

Seriously, you really should set a budget and then buy within that. That is generally the best bet when purchasing computer equipment.

Actually, I think benchmarking cost-effective machines can be very exciting!
Often, when you have a super HPC, you think less about algorithmic speedups.

Anyway, I managed to find this desktop benchmark for de novo assembly by CLCBIO:

http://www.clcngs.com/2009/11/new-benchmarks-of-our-upcoming-de-novo-assembler/

SOLiD from an IT perspective
