  • mkeehan
    Member
    • Feb 2010
    • 13

    recommendations on disk storage infrastructure

    Hi
    We are planning the IT for our sequencing project. We are expecting to sequence half a dozen dairy bulls (Bos taurus) at 30x coverage a year for the next few years. I plan to thoroughly investigate the read mapping and variant calling process so as not to lose too many SNPs for our gene discovery work. So we'll be consuming a couple of terabytes in reads and alignments per year. I can rent relatively expensive, high-quality Fibre Channel SAN disks for our compute cluster from the folks in IT, or I can try to buy much more cost-effective SATA disks in a network-attached storage box, e.g. a teeny tiny Isilon (it is NZ...), or even build it myself with a Supermicro 4U storage chassis, 24 1 TB SATA drives and a Linux CD. I expect to use either MosaikAligner or Bowtie a lot.

    Does anyone have any useful recommendations or experiences on:

    a) the value of FibreChannel/SAN disk for I/O performance?
    b) Is NFS-based storage for gzipped FASTQ read files good enough with 1 Gbit networking, i.e. compute node to SAN-attached I/O node, or compute node to NAS node?
    c) Are 7200 rpm, 1 or 2 TB SATA drives a cost-effective way to store reads for alignments?

    If I don't have to rent expensive SAN disk we can sequence more animals!

    Any thoughts would be appreciated.
  • adamdeluca
    Member
    • Jul 2010
    • 95

    #2
    Originally posted by mkeehan:
    a) the value of FibreChannel/SAN disk for I/O performance?
    Zero, if you first move files to a local disk before analysis.
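
One way to act on the "move files to a local disk first" advice is a small staging step at the start of each job. This is only a minimal sketch in Python, assuming the shared storage is mounted on the compute node and the node has local scratch space; the function name `stage_to_local` and any paths you pass to it are hypothetical, not something from this thread.

```python
import shutil
from pathlib import Path


def stage_to_local(src, scratch_dir):
    """Copy one input file from shared storage to local scratch.

    Returns the path of the local copy. If a copy of the same size is
    already present it is reused, so re-running a job does not pull the
    file across the network a second time.
    """
    src = Path(src)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    dest = scratch_dir / src.name
    if dest.exists() and dest.stat().st_size == src.stat().st_size:
        return dest
    # copy2 copies the data and also preserves the modification time.
    shutil.copy2(src, dest)
    return dest
```

The aligner is then pointed at the returned local path, so the shared disk is read once per file rather than throughout the run. The size check is a cheap shortcut, not an integrity check.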
    Originally posted by mkeehan:
    b) is NFS-based storage for gzipped FASTQ read files good enough with 1 Gbit networking, i.e. compute node to SAN-attached I/O node or compute node to NAS node?
    Yes, that will work.
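
A rough way to sanity-check the 1 Gbit answer for your own file sizes is to estimate how long one file takes to cross the link. The sketch below is back-of-the-envelope arithmetic only: the `efficiency` value of 0.8 and the 20 GB file size are assumptions made for illustration, not measurements from this thread.

```python
def transfer_seconds(size_bytes, link_gbit_per_s=1.0, efficiency=0.8):
    """Estimate the seconds needed to move size_bytes over a network link.

    efficiency is an assumed fraction of the raw line rate that the
    file protocol actually delivers; measure your own network rather
    than trusting the default.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # 1 Gbit/s is 125 MB/s at line rate (8 bits per byte).
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return size_bytes / bytes_per_second


# Hypothetical example: one 20 GB gzipped FASTQ file over a 1 Gbit link.
seconds = transfer_seconds(20e9)
print(f"{seconds / 60:.1f} minutes")
```

If that per-file time is small next to how long the alignment itself runs, the network is unlikely to be the bottleneck.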
    Originally posted by mkeehan:
    c) Are 7200 rpm, 1 or 2 TB SATA drives a cost-effective way to store reads for alignments?
    Sure, but be careful with the 2 TB drives: they often have large block sizes, and formatting needs to be done differently for optimal performance.
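
The "large block sizes" remark most likely refers to drives with 4096-byte physical sectors; that reading is an assumption, since the post does not name them. On such drives a partition should start on a 4096-byte boundary. The arithmetic can be sketched as follows, with start sectors counted in 512-byte logical sectors as partitioning tools usually report them. The function name `is_aligned` is hypothetical.

```python
def is_aligned(start_sector, logical_sector_bytes=512, physical_sector_bytes=4096):
    """Return True if a partition that starts at start_sector begins
    exactly on a physical-sector boundary."""
    start_byte = start_sector * logical_sector_bytes
    return start_byte % physical_sector_bytes == 0


print(is_aligned(63))    # False: the old DOS-style default start sector
print(is_aligned(2048))  # True: a 1 MiB offset
```

Check the sector sizes your own drives report before relying on the defaults used here.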

    Now a word of caution: a big part of the expense of the SAN you currently use is in maintenance and backups, not the hard disks. The data you intend to store on this machine is incredibly valuable, so make sure you have a plan for backups and disaster recovery.
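
Part of any backup plan is checking that the backup copy actually matches the primary. A minimal sketch of one way to do that with checksums, assuming both copies are reachable as directories from one machine; the function names `sha256_of` and `mismatched_files` are hypothetical, and this is an illustration rather than a complete backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_bytes=1 << 20):
    """Hash a file in 1 MiB chunks so large read or alignment files
    never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(primary_dir, backup_dir):
    """Return relative paths that are missing from the backup or whose
    contents differ from the primary copy."""
    primary_dir, backup_dir = Path(primary_dir), Path(backup_dir)
    problems = []
    for src in sorted(primary_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(copy) != sha256_of(src):
            problems.append(str(rel))
    return problems
```

An empty list means every file in the primary directory has an identical copy in the backup. It says nothing about extra files that exist only in the backup.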


    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      Hi,

      just going through this myself.

      Keep in mind:

      * 1 TB SATA is apparently more efficient than 2 TB for RAID striping.

      * Spatially separated backups are crucial (additional NAS servers come in quite cheap, have RAID, cost < 1000 euros and don't need a fancy cooled server room).

      * 1 Gbit should be good enough. 10 Gbit seems to be a bit tricky, with bad Linux drivers, no standards etc. Apparently most 1 Gbit networks can potentially handle ca. 3 Gbit.

      * I think security is more crucial than performance for these applications, especially if you're happy with Bowtie.

      * If you want to use faster SAS hard disks, maybe put them in your analysis server rather than the storage solution.

      Hope that helps.
