Computer Hardware: CPU vs. Memory

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Markiyan
    replied
    Storage performance notes.

    PS:
    SSDs HATE random writes in small blocks (they want to write in 64/128/256 KB blocks), but are OK with random reads.
    Hard disks struggle with random reads/writes, and also with simultaneous reading/writing from multiple threads.
    RAID5 is TERRIBLE for random writing (4-5 times slower than RAID10).
    My suggestion for a high-performance system
    (each group on a separate physical HDD or RAID array):
    [System+Software]
    [SWAP]
    [SCRATCH]
    [Input data]
    [Output data]
    or at least:
    [System+Software]
    [SWAP+SCRATCH]
    [all data]
    Remember that having input and output data on the same RAID0 array is always slower than having them on separate disks w/o RAID (in a memory-constrained situation).
    If working with tons of small files (phd_dir), order them by inode number and then read them sequentially - this will be a lot faster (10-20X) than random I/O on the same HDD.
    Use symlinks to facilitate data processing/organisation.
    If you want to use RAID, use RAID10 on a GOOD controller (Adaptec) with at least 0.25-1 GB of onboard cache and its own I/O CPU. The performance gains with cheap onboard controllers (w/o cache) are often negative... so use separate disks if you can't afford proper RAID.
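The inode-ordering trick above can be sketched in shell. The directory name `data/` and the toy files are just for illustration:

```shell
# Toy setup so the pipeline below has something to read.
mkdir -p data
echo "alpha" > data/a
echo "beta"  > data/b

# List each file with its inode number, sort numerically by inode,
# then read the files in that on-disk order instead of directory order.
ls -i data | sort -n | awk '{print "data/" $2}' | xargs cat
```

On a spinning disk this keeps the head moving mostly forward instead of seeking randomly between files; on an SSD it matters much less.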

  • jwfoley
    replied
    Originally posted by KevinLam View Post
    AGREED.
    I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
    but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
    As others noted, this is probably done at the OS level (and the OS is probably Linux if you're building a server this powerful, so it should be easy), but an SSD is way slower than RAM.

    Just remember that even the least efficient software is designed to work on someone's machine, so there's an upper limit on how much RAM you'll ever need to be able to run a program. That limit might be on the order of tens of gigabytes (I've heard 40-50 for certain well-known pipelines). But I don't think there's a reason to complement that with SSDs, because they're definitely not going to buy you any additional speed as virtual memory.


  • Torst
    replied
    Originally posted by KevinLam View Post
    Well SSDs vary in speeds as well and while you have a point about SATA HDD RAID.
    You can easily have SATA SSD RAID.
    4 x SSD would have ~ 1.2 GB/s by your numbers
    Yes you can have SSD RAID of course, and there are plenty of people with Enterprise budgets to do so - but I can't afford it!

    The other issue is that 4xSSD RAID0 = 1.2 GB/s = 9.6 Gbit/sec. Even SATA3 is only 6.0 Gbit/sec, so you have to start investing in more expensive interconnects like 10GigE, multiple FC, etc. And have a PCIe bus and CPU<->BUS connection that can cope too!

    My point is that it is still a long way from RAM throughput and latency (SSD = micro/milliseconds, RAM = nanoseconds).
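The figures above check out with quick shell arithmetic (using the ~300 MB/s per-SSD estimate quoted earlier in the thread):

```shell
echo $(( 4 * 300 ))    # 4 SSDs in RAID0: 1200 MB/s aggregate
echo $(( 1200 * 8 ))   # 9600 Mbit/s - well above SATA3's 6000 Mbit/s link
```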


  • KevinLam
    replied
    Originally posted by Torst View Post
    An SSD is only marginally faster than a HDD when compared to RAM. A good RAID array of HDDs still beats a single SSD too (for throughput, not latency though).

    HDD ~ 75 MB/s
    SSD ~ 300 MB/s
    RAM ~ 10000 MB/s (!)



    Just use your SSD as your virtual memory / swap disk?

    Some software is now being intelligently written to exploit RAM/HDD tradeoff, for example this read mapper: Syzygy
    Well SSDs vary in speeds as well and while you have a point about SATA HDD RAID.
    You can easily have SATA SSD RAID.
    4 x SSD would have ~ 1200 MB/s by your numbers
    only 8.33x slower than RAM!

    BTW your URL is not formatted properly - it went to some weird site:
    http://www.nicta.com.au/research/res...Q0MDAzJmFsbD0x
    Last edited by KevinLam; 08-27-2010, 12:27 AM.


  • Torst
    replied
    Originally posted by KevinLam View Post
    AGREED.
    I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
    An SSD is only marginally faster than a HDD when compared to RAM. A good RAID array of HDDs still beats a single SSD too (for throughput, not latency though).

    HDD ~ 75 MB/s
    SSD ~ 300 MB/s
    RAM ~ 10000 MB/s (!)

    but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
    Just use your SSD as your virtual memory / swap disk?

    Some software is now being intelligently written to exploit RAM/HDD tradeoff, for example this read mapper: Syzygy
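A minimal sketch of the swap suggestion, assuming the SSD partition is /dev/sdb1 (a hypothetical device name - adjust for your system; needs root):

```shell
# Format the SSD partition as swap and enable it.
mkswap /dev/sdb1
swapon -p 10 /dev/sdb1   # give it higher priority than any HDD swap
swapon -s                # list active swap devices to verify
```

The kernel then pages to the SSD automatically, which is why individual programs don't need a "write to disk instead of RAM" option.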


  • KevinLam
    replied
    Originally posted by jwfoley View Post
    Definitely more memory. A lot of people are writing terrible code that wastes tons of memory, and it's better to run programs slowly than not be able to run them at all.
    AGREED.
    I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
    but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.


  • adamdeluca
    replied
    Look at the motherboard, because what you want is expandability. Boards for the AMD 6100 typically come with 1, 2 or 4 processor sockets and have 8, 16 or 32 memory slots respectively. Processor slots need to be populated with identical cpus (not all need to be filled), and memory slots should be populated in groups of 4 identical sticks.

    Your 24GB option is likely a 1P board with 4x4GB + 4x2GB, and thus would fill all of your CPU and memory slots (no expandability without throwing away components).

    The 16GB configuration would likely be a 2P board with 8x2GB or 4x4GB, and thus would leave 8 or 12 open memory slots (room to grow).


  • Torst
    replied
    Originally posted by DZhang View Post
    Hi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq. I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.
    De novo assembly needs more RAM, while re-sequencing (read mapping) and RNA-seq (read mapping + analysis) require less RAM and more CPU.

    Frankly, the difference between 16GB and 24GB of RAM is not that much, and won't help de novo assembly much. More important is the RAM PER CORE: your choices are 1 GB/core (x16 cores) or 3 GB/core (x8 cores).

    I assume you are working on large genomes for which you have references, like human or mouse? In that case I think you will be doing much more read mapping than de novo, so one would think more cores is better, but 1 GB/core is a bit low for mapping to large genomes, so you may have idle CPUs anyway! So the 24GB RAM would probably be my choice in the end.

    The issue of a fast disk subsystem is a crucial one, and it usually gets ignored. A good RAID controller or smart use of Linux md software RAID with multiple 7200rpm spindles should be enough on your tight budget. But remember: if your disks are slow, you can't get data into RAM fast, and processes spend a lot of time waiting on I/O - especially with so many cores competing for disk I/O! More RAM helps here too, for disk cache etc.
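A sketch of the Linux md software RAID mentioned above, assuming four spare disks /dev/sdb through /dev/sde and a mount point /data (both hypothetical; run as root, and double-check the device names - this destroys their contents):

```shell
# Build a 4-disk RAID10 array with md software RAID.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.ext4 /dev/md0     # put a filesystem on the array
mount /dev/md0 /data   # hypothetical mount point
cat /proc/mdstat       # check array status / resync progress
```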

    As an aside, does your institute or partner institute have access to a HPC facility where you can get some CPU allocation?


  • dawe
    replied
    Originally posted by DZhang View Post
    Hi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq.

    I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.

    Thank you,
    Douglas
    More memory... but I would check that the disk I/O is fast and efficient.


  • mnkyboy
    replied
    You might be better off using AWS (the cloud).


  • drio
    replied
    I'd also suggest adding an SSD to use as scratch space. Then you only need cheap 1TB drives
    to store your data (SATA-II 7200rpm should be fine).
    Let us know what machine(s) you end up getting.


  • DZhang
    replied
    Thank you all for the great suggestions and comments. Ideally I should build two machines - one for de novo assembly and the other for mapping. Due to the limited budget, I will choose somewhere in between. It is a great suggestion to choose a mainboard with upgrade potential!


  • kmcarr
    replied
    You say "main application" but then list three different applications that have very different requirements. To be fair, resequencing and RNA-Seq share a lot of requirements, a primary one being mapping reads to a reference. Mappers do not require a ton of memory but can be sped up (in a nearly linear fashion) by adding CPUs. As john_mu mentions, sequencers are going to be spitting out more reads, but if your pipeline involves mapping those reads to a reference, more memory won't do you much good at all, while doubling the # of CPUs sure will.

    On the other hand de novo assembly is a memory pig, and most algorithms are not highly threaded, meaning additional cpus will not provide much benefit for this application.

    You really need to define your requirements better. What specific programs do you think you'll be using? What are their resource requirements to perform projects sized similarly to yours?


  • maubp
    replied
    Also look at the max memory the motherboard can hold, because you'll probably want to add more memory later (e.g. using consumables budget or next year's money).


  • john_mu
    replied
    Yes, more memory. Sequencers are only going to spit out more reads.

    Also, more memory usually means programs can potentially run faster.
