  • griznog
    Junior Member
    • Apr 2009
    • 2

    SOLiD from an IT perspective

    Hello,

    I'm a system admin for a handful of SOLiDs and I'm curious to know what other people are doing as far as IT related issues with SOLiD. We've tried a few novel things in our environment with some success, but it'd be nice to hear from others who are managing these.

    We currently have all our machines write directly to a central NFS server (a high-performance clustered storage solution) rather than copying data after primary analysis completes. A side effect is that our instruments can access results for any run they have ever performed (with the exception that run data in the instrument database was not preserved during the v2 to v3 upgrade); whether that is good or bad remains to be determined. This also lets us reclaim the results disk space and enlarge the images volume on the instrument, which is nice.

    Secondary analysis on the instrument has been a challenge. We've tried to move it off the instrument using Global SETS, with little success. We've experimented with increasing the core count of an instrument by adding nodes to it (via a VLAN to our data center), and that seems promising: a 35x2 run against the full human genome completes in ~5 days with 104 cores. All real analysis has been done on our existing compute farm and infrastructure using Corona Lite.
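As a rough capacity-planning aid, the figure above (a 35x2 human-genome run taking ~5 days on 104 cores) can be converted to core-hours. The Python sketch below does that arithmetic. The `ideal_wall_days` projection assumes perfectly linear scaling, which is optimistic (the mapping pipeline described later in this thread does not scale linearly), so treat its result as a best-case lower bound on wall time.

```python
def core_hours(wall_days: float, cores: int) -> float:
    """Core-hours consumed by a job that runs for wall_days on the given number of cores."""
    return wall_days * 24 * cores


def ideal_wall_days(total_core_hours: float, cores: int) -> float:
    """Wall-clock days for the same work on a different core count.

    Assumes perfectly linear scaling, so this is a best-case (lower-bound) estimate.
    """
    return total_core_hours / (cores * 24)


if __name__ == "__main__":
    # Figures from the post: ~5 days on 104 cores for a 35x2 run against the full human genome.
    total = core_hours(5, 104)
    print(f"{total:,.0f} core-hours")  # 12,480 core-hours
    for cores in (52, 104, 208):
        print(f"{cores:>4} cores: >= {ideal_wall_days(total, cores):.1f} days")
```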

    We've considered using the VLAN approach to move all compute nodes off instrument to help address heat issues in the lab where these reside.

    Any feedback would be appreciated. We are doing things in a non-standard way in an attempt to make the instruments more manageable. It'd be nice if an instrument could notify an external service when primary analysis was complete, for instance. If anyone else has had luck making SOLiD more automated, manageable and scalable I'd love to hear what you are doing.
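Until the instrument software can do this itself, one workaround is an external poller that watches the shared results volume and fires a notification when a run looks finished. The Python sketch below is hypothetical: it assumes each run gets its own directory under a results root and that a marker file (here called `PRIMARY_ANALYSIS_DONE`, an invented name) appears when primary analysis completes. Substitute whatever artifact actually signals completion on your instruments, and plug your own notification (e-mail, HTTP call, job submission) into `notify`.

```python
import time
from pathlib import Path
from typing import Callable, Set

# Hypothetical marker file name; the SOLiD software is not known to write this.
MARKER_NAME = "PRIMARY_ANALYSIS_DONE"


def find_completed_runs(results_root: Path) -> Set[str]:
    """Names of run directories under results_root that contain the marker file."""
    return {
        run_dir.name
        for run_dir in results_root.iterdir()
        if run_dir.is_dir() and (run_dir / MARKER_NAME).exists()
    }


def poll_once(results_root: Path, seen: Set[str], notify: Callable[[str], None]) -> Set[str]:
    """Call notify() once for each newly completed run; return the updated set of seen runs."""
    completed = find_completed_runs(results_root)
    for run_name in sorted(completed - seen):
        notify(run_name)
    return seen | completed


def watch(results_root: Path, notify: Callable[[str], None], interval_s: float = 60.0) -> None:
    """Poll results_root forever, notifying once per completed run."""
    seen: Set[str] = set()
    while True:
        seen = poll_once(results_root, seen, notify)
        time.sleep(interval_s)
```

Polling is crude, but it works on NFS, where inotify-style change notifications generally do not see changes made by other NFS clients.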

    Thanks,

    griznog
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    No good feedback here but I concur:

    If anyone else has had luck making SOLiD more automated, manageable and scalable I'd love to hear what you are doing.
    We have only one SOLiD and thus do not have the problems that griznog has. Nevertheless, I find the lack of automation irritating, as well as the lack of scalability.


    • OneManArmy
      Member
      • Jul 2009
      • 13

      #3
      Unfortunately not much feedback here either, but I am interested in how you connect these machines together.

      Originally posted by griznog
      We currently have all our machines write directly to a central NFS server (a high-performance clustered storage solution) rather than copying data after primary analysis completes.
      What is the speed of the network connection you use to connect your SOLiDs to this central NFS server? The images are acquired in Windows, so do yours write to a Samba share on the onboard cluster which maps to an NFS mount?

      Unfortunately, at the moment the network speeds available at our site make dumping the images directly to our data centre via NFS infeasible.


      • griznog
        Junior Member
        • Apr 2009
        • 2

        #4
        Originally posted by OneManArmy
        What is the speed of the network connection you use to connect your SOLiDs to this central NFS server? The images are acquired in Windows, so do yours write to a Samba share on the onboard cluster which maps to an NFS mount?

        Unfortunately, at the moment the network speeds available at our site make dumping the images directly to our data centre via NFS infeasible.
        Each SOLiD has a 1 Gbps uplink to an aggregation switch, which in turn has a 10 Gbps connection to storage (via about three switch hops and one router hop). It's not ideal for latency, but performance seems reasonable, and in simple benchmarks the central storage was at least as good as the head node storage for a single client and vastly better for multiple clients. Note that we were only using this for results. I did use it for images on one instrument when a failure in the MD1000 left us without an images directory for a few days, but I don't consider that a good test of central image storage because of the short duration of usage.
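For a rough sense of headroom in this layout, the Python arithmetic below computes how many instruments can write at full line rate before the 10 Gbps link to storage saturates, and how long a given volume of data takes to cross a single 1 Gbps uplink. The 1 TB volume and the 70% efficiency factor are illustrative assumptions, not measured SOLiD figures; real throughput also depends on NFS overhead and the latency of the extra hops.

```python
def max_writers_at_line_rate(uplink_gbps: float, aggregate_gbps: float) -> int:
    """How many clients can each run at full uplink speed before the shared link saturates."""
    return int(aggregate_gbps // uplink_gbps)


def transfer_hours(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_bytes over a link of link_gbps, scaled by an efficiency factor in (0, 1]."""
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


if __name__ == "__main__":
    print(max_writers_at_line_rate(1, 10))            # 10
    one_tb = 1e12                                     # illustrative volume, not a SOLiD figure
    print(f"{transfer_hours(one_tb, 1):.2f} h")       # 2.22 h at line rate
    print(f"{transfer_hours(one_tb, 1, 0.7):.2f} h")  # 3.17 h at an assumed 70% efficiency
```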

        Since posting this thread we've had some very good interaction with ABI, and the roadmap for v3.5 of the instrument software appears to address many of our issues. Once we upgrade to 3.5 later this year, we'll revert to the export model rather than using central NFS. Given the roadmap shown to us, I would hold off on recommending central NFS storage until we've seen how well the new software handles exporting.

        griznog


        • nilshomer
          Nils Homer
          • Nov 2008
          • 1283

          #5
          Originally posted by griznog
          Each SOLiD has a 1 Gbps uplink to an aggregation switch, which in turn has a 10 Gbps connection to storage (via about three switch hops and one router hop). It's not ideal for latency, but performance seems reasonable, and in simple benchmarks the central storage was at least as good as the head node storage for a single client and vastly better for multiple clients. Note that we were only using this for results. I did use it for images on one instrument when a failure in the MD1000 left us without an images directory for a few days, but I don't consider that a good test of central image storage because of the short duration of usage.

          Since posting this thread we've had some very good interaction with ABI, and the roadmap for v3.5 of the instrument software appears to address many of our issues. Once we upgrade to 3.5 later this year, we'll revert to the export model rather than using central NFS. Given the roadmap shown to us, I would hold off on recommending central NFS storage until we've seen how well the new software handles exporting.

          griznog
          We copy the primary data (after color calling) over to NFS volumes, which lets us have lots of cheap storage. The most recent runs are then stored on a fast distributed file system (Lustre) while alignment, variant calling, structural variant detection, and all other downstream analysis are completed. We then copy all the results and intermediate files that need to be archived back to the NFS servers. Much of this is only semi-automated: a human has to initiate the transfer, the secondary analysis, and the final archiving.
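The human-initiated steps described above map naturally onto an ordered list of stages with recorded progress, which is the core of what a workflow system automates. The Python sketch below is a minimal illustration, not Kepler or any real workflow engine: the stage names are hypothetical, and each stage's action (an rsync, a cluster job submission, and so on) is supplied by the caller. A run resumes at the first stage not yet marked complete, so a failed transfer can simply be retried.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names mirroring the NFS -> Lustre -> analysis -> NFS flow described above.
STAGES = ["copy_to_nfs", "stage_to_lustre", "secondary_analysis", "archive_to_nfs"]


@dataclass
class Run:
    name: str
    completed: List[str] = field(default_factory=list)


def advance(run: Run, actions: Dict[str, Callable[[str], None]]) -> None:
    """Execute every stage not yet completed, in order.

    An action signals failure by raising; the exception propagates and the
    run's completed list stops at the last successful stage, so calling
    advance() again retries from the failed stage.
    """
    for stage in STAGES:
        if stage in run.completed:
            continue
        actions[stage](run.name)
        run.completed.append(stage)
```

Persisting `completed` (for example, to a small file per run) would let the runner survive restarts; that part is omitted to keep the sketch short.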

          I would love to hear about any successes using a workflow system (Kepler, etc.) to automate not only SOLiDs but also other NGS technologies, since the big problem for us is having a mix of technologies (and workflows/applications) that are constantly being developed and updated.


          • pssclabs
            Junior Member
            • Sep 2009
            • 6

            #6
            This is somewhat related to the above. I am with PSSC Labs (www.pssclabs.com). We are working to develop a SOLiD Offline Cluster. All of the information provided above is great. It gives me a much better understanding of the computing needs of the cluster than any of my discussions with AB.

            I had a few questions. Do any of you have experience running any AB-developed application over InfiniBand or other high-speed network interconnects?

            Is there a maximum number of cores beyond which the AB software no longer scales, or beyond which the performance gain from adding more nodes is negligible?

            Thank you


            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              Originally posted by pssclabs

              Is there a maximum number of cores where the AB software will no longer scale?
              There are a handful of ABI software packages out there -- e.g., Mapping, SNP calling, Transcriptome -- which often stand alone, although they may share programs.

              If we consider the first program -- Mapping -- then there is a maximum number of cores. Basically the mapping program is broken down into 6 sub-programs:

              1) Map the read file to each chromosome. The natural core limit on this is the number of chromosomes.

              2) Collect the map information into one overall file -- limit of 1 core.

              3) Do a per-chromosome re-mapping for the optimal matches.

              4-6) Gather back the mapping into one overall file with statistics and an index.

              Overall, this is rather inefficient. Some of the other ABI programs do seem to take the number of cores into account. One could also split the read file into parts and map those parts against the chromosomes.
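To make the core limit concrete: the usable parallelism of each stage is its number of independent work units, capped by the cores available. The Python sketch below models the four stages listed above plus the read-splitting idea. Treating stages 4-6 as serial and using 24 reference sequences (chromosomes 1-22, X and Y) in the example are assumptions for illustration.

```python
from typing import Dict


def stage_widths(n_chromosomes: int, read_chunks: int = 1) -> Dict[str, int]:
    """Independent work units per mapping stage, following the breakdown in the post.

    read_chunks > 1 models splitting the read file so that every
    (chunk, chromosome) pair can be mapped independently in stage 1.
    """
    return {
        "1 map reads per chromosome": n_chromosomes * read_chunks,
        "2 collect into one file": 1,
        "3 per-chromosome re-mapping": n_chromosomes,
        "4-6 gather, stats, index": 1,  # assumption: treated as serial
    }


def usable_cores(cores: int, widths: Dict[str, int]) -> Dict[str, int]:
    """Cores each stage can actually keep busy: min(available cores, work units)."""
    return {stage: min(cores, width) for stage, width in widths.items()}


if __name__ == "__main__":
    for chunks in (1, 4):
        print(chunks, usable_cores(104, stage_widths(24, chunks)))
```

Splitting reads only widens stage 1; stage 3 and the serial stages are unchanged, so by Amdahl's law the serial collect/gather steps eventually dominate no matter how many cores are added.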

              New AB software is due out "soon". Maybe it will be more efficient.


              • KevinLam
                Senior Member
                • Nov 2009
                • 204

                #8
                Interesting info! Especially the NFS bit.

                How about cost-effective solutions to analysis?
                I am trying to build an offline cluster with the minimum specs to do the analysis. I am thinking not all labs would have the budget for a cluster computer that just collects dust when they are done with the analysis.

                What's the lowest-spec machine that a SOLiD user has managed to get away with?
                Has anyone done any benchmarking?
                http://kevin-gattaca.blogspot.com/


                • westerman
                  Rick Westerman
                  • Jun 2008
                  • 1104

                  #9
                  Originally posted by KevinLam
                  Interesting info! Especially the NFS bit.

                  How about cost-effective solutions to analysis?
                  I am trying to build an offline cluster with the minimum specs to do the analysis. I am thinking not all labs would have the budget for a cluster computer that just collects dust when they are done with the analysis.

                  What's the lowest-spec machine that a SOLiD user has managed to get away with?
                  Has anyone done any benchmarking?
                  I doubt anyone will bother benchmarking the lowest-spec machine, since such a task would be boring and, IMHO, not of much use. Basically, just grab an x86-64 computer with 12 GB of memory and 500 GB of disk space. About $2500 from Dell. That would work. Might be slow. Might run out of disk space eventually. But if you want low-ball, then the above should be OK.

                  Or if you want high-ball then share $100,000+ machines with other people. This is what we do.

                  Seriously, you really should set a budget and then buy within that. That is generally the best bet when purchasing computer equipment.


                  • KevinLam
                    Senior Member
                    • Nov 2009
                    • 204

                    #10
                    Originally posted by westerman
                    I doubt anyone will bother benchmarking the lowest-spec machine, since such a task would be boring and, IMHO, not of much use. Basically, just grab an x86-64 computer with 12 GB of memory and 500 GB of disk space. About $2500 from Dell. That would work. Might be slow. Might run out of disk space eventually. But if you want low-ball, then the above should be OK.

                    Or if you want high-ball then share $100,000+ machines with other people. This is what we do.

                    Seriously, you really should set a budget and then buy within that. That is generally the best bet when purchasing computer equipment.
                    Actually, I think benchmarking cost-effective machines can be very exciting!
                    Often, when you have a powerful HPC system, you think less about algorithmic speedups.

                    Anyway, I managed to find this desktop benchmark for de novo assembly by CLC bio.
                    http://kevin-gattaca.blogspot.com/
