Storage performance notes.
PS:
SSDs HATE random writes in small blocks (they want to write in 64-128-256 KB blocks), but are OK with random reads.
Hard disks struggle with random reads/writes, and also with simultaneous reading/writing from multiple threads.
RAID5 is TERRIBLE for random writes (4-5 times slower than RAID10).
My suggestion for a high-performance system:
(each group is on a separate physical HDD or RAID array)
[System+Software]
[SWAP]
[SCRATCH]
[Input data]
[output data]
or at least:
[System+Software]
[SWAP+SCRATCH]
[all data]
Remember that having input and output data on the same RAID0 array is always slower than having them on separate disks w/o RAID (in a memory-constrained situation).
If working with tons of small files (phd_dir), order them by inode number and then read them sequentially; this will be a lot faster (10-20X) than random I/O on the same HDD.
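A minimal sketch of the inode-ordering trick, in Python. The function names are made up for illustration, and it assumes a filesystem where inode order roughly follows on-disk layout (that depends on the filesystem, so measure before relying on it):

```python
import os

def files_in_inode_order(directory):
    """Return paths of regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as entries:
        # DirEntry.inode() usually comes from the directory listing itself,
        # so sorting does not need an extra stat() per file.
        files = [(entry.inode(), entry.path)
                 for entry in entries
                 if entry.is_file(follow_symlinks=False)]
    files.sort()
    return [path for _, path in files]

def read_all(directory):
    """Read every file in inode order and return the total bytes read."""
    total = 0
    for path in files_in_inode_order(directory):
        with open(path, "rb") as handle:
            total += len(handle.read())
    return total
```

The idea is just to replace the arbitrary order a directory listing gives you with one that keeps the disk head moving in one direction as much as possible.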
Use symlinks to facilitate data processing/organisation.
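For example, a project directory can hold links to data that physically lives on separate disks. A rough sketch; the helper name and the mount points in the comments are hypothetical:

```python
import os

def link_into_project(project_dir, name, target):
    """Create project_dir/name as a symlink to target and return the link path."""
    os.makedirs(project_dir, exist_ok=True)
    link = os.path.join(project_dir, name)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link rather than failing
    os.symlink(os.path.abspath(target), link)
    return link

# Hypothetical layout, one disk per group:
# link_into_project("run42", "input", "/mnt/input_disk/run42")
# link_into_project("run42", "output", "/mnt/output_disk/run42")
# link_into_project("run42", "scratch", "/mnt/scratch/run42")
```

Tools then see a single directory tree, while reads and writes still hit different spindles.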
If you want to use RAID, use RAID10 on a GOOD controller (Adaptec) with at least 0.25-1 GB of onboard cache and its own I/O CPU. The performance gains with cheap onboard controllers (w/o cache) are often negative... so use separate disks if you can't afford proper RAID.
Computer Hardware: CPU vs. Memory
-
Originally posted by KevinLam:
AGREED.
I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
Just remember that even the least efficient software is designed to work on someone's machine, so there's an upper limit on how much RAM you'll ever need to be able to run a program. That limit might be on the order of tens of gigabytes (I've heard 40-50 for certain well-known pipelines). But I don't think there's a reason to complement that with SSDs, because they're definitely not going to buy you any additional speed as virtual memory.
-
Originally posted by KevinLam:
Well SSDs vary in speeds as well and while you have a point about SATA HDD RAID.
You can easily have SATA SSD RAID.
4 x SSD would have ~ 1.2 GB/s by your numbers
The other issue is that 4xSSD RAID0 = 1.2 GB/s = 9.6 Gbit/sec. Even SATA3 is only 6.0 Gbit/sec, so you have to start investing in more expensive interconnects like 10GigE, multiple FC, etc. And have a PCIe bus and CPU<->BUS connection that can cope too!
My point is that it is still a long way from RAM throughput and latency (SSD = microseconds to milliseconds, RAM = nanoseconds).
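The arithmetic, spelled out as a quick Python sketch. All figures are the ballpark numbers quoted in this thread, not measurements, and the function name is made up:

```python
SSD_MB_PER_S = 300        # per-drive figure quoted in this thread
RAM_MB_PER_S = 10000      # RAM figure quoted in this thread
SATA3_GBIT_PER_S = 6.0    # SATA3 line rate
N_DRIVES = 4

def raid0_mb_per_s(n_drives, mb_per_s):
    """Ideal RAID0 throughput: drives add up with no overhead (optimistic)."""
    return n_drives * mb_per_s

array_mb = raid0_mb_per_s(N_DRIVES, SSD_MB_PER_S)   # 1200 MB/s
array_gbit = array_mb * 8 / 1000                    # 9.6 Gbit/s
exceeds_sata3 = array_gbit > SATA3_GBIT_PER_S       # True
ram_ratio = RAM_MB_PER_S / array_mb                 # about 8.33

print(array_mb, array_gbit, exceeds_sata3, round(ram_ratio, 2))
```

So even the idealised array needs more than one SATA3 link's worth of bandwidth, and it is still several times short of RAM on throughput alone, before latency is considered.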
-
Originally posted by Torst:
An SSD is only marginally faster than a HDD when compared to RAM. A good RAID array of HDDs still beats a single SSD too (for throughput, not latency though).
HDD ~ 75 MB/s
SSD ~ 300 MB/s
RAM ~ 10000 MB/s (!)
Just use your SSD as your virtual memory / swap disk?
Some software is now being intelligently written to exploit RAM/HDD tradeoff, for example this read mapper: Syzygy
You can easily have SATA SSD RAID.
4 x SSD would have ~ 1200 MB/s by your numbers
only 8.33x slower than RAM!
btw your URL is not formatted properly; it went to some weird site:
http://www.nicta.com.au/research/res...Q0MDAzJmFsbD0x
Last edited by KevinLam; 08-27-2010, 12:27 AM.
-
Originally posted by KevinLam:
AGREED.
I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
HDD ~ 75 MB/s
SSD ~ 300 MB/s
RAM ~ 10000 MB/s (!)
but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
Some software is now being intelligently written to exploit RAM/HDD tradeoff, for example this read mapper: Syzygy
-
Originally posted by jwfoley:
Definitely more memory. A lot of people are writing terrible code that wastes tons of memory, and it's better to run programs slowly than not be able to run them at all.
I am curious though if I were to use a SSD as a swap would I be in a sweet zone for $$ vs speed?
but I guess it's a moot question since for some reason I can't find programs that allow you to choose to write to disk or use RAM.
-
Look at the motherboard, because what you want is expandability. Boards for the AMD 6100 typically come with 1, 2 or 4 processor sockets and have 8, 16 or 32 memory slots respectively. Processor sockets need to be populated with identical CPUs (not all need to be filled), and memory slots should be populated in groups of 4 identical sticks.
Your 24GB option is likely a 1P board with 4x4GB + 4x2GB, and thus would fill all of your CPU and memory slots (no expandability without throwing away components).
The 16GB configuration would likely be a 2P board with 8x2GB or 4x4GB and thus would leave 8 or 12 open memory slots (room to grow).
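The slot counting above can be checked with a few lines of Python. This is only a sketch: it assumes 8 memory slots per socket (from the 8/16/32 figures above), and the function name is invented:

```python
SLOTS_PER_SOCKET = 8  # assumption taken from the 1/2/4 socket -> 8/16/32 slot figures

def memory_layout(sockets, sticks):
    """sticks is a list of (count, size_gb) pairs.

    Returns (total_gb, free_slots) for a board with the given socket count.
    """
    total_slots = sockets * SLOTS_PER_SOCKET
    used = sum(count for count, _ in sticks)
    if used > total_slots:
        raise ValueError("more sticks than memory slots")
    total_gb = sum(count * size_gb for count, size_gb in sticks)
    return total_gb, total_slots - used

print(memory_layout(1, [(4, 4), (4, 2)]))  # 1P board, 24GB option
print(memory_layout(2, [(8, 2)]))          # 2P board, 16GB as 8x2GB
print(memory_layout(2, [(4, 4)]))          # 2P board, 16GB as 4x4GB
```

The free-slot count is the number to compare when judging how much room each option leaves for later upgrades.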
-
Originally posted by DZhang:
Hi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq. I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.
Frankly, the difference between 16GB and 24GB RAM is not that much, and won't help much with de novo assembly. More important is the RAM PER CORE: your choices are 1 GB/core (x16) or 3 GB/core (x8).
I assume you are working on large genomes for which you have references, like human or mouse? In that case I think you will be doing much more read mapping than de novo, so one would think more cores is better, but 1 GB/core is a bit low for mapping to large genomes, so you may have idle CPUs anyway! So the 24GB RAM would probably be my choice in the end.
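To make the "idle CPUs" point concrete, here is a small Python sketch. The 2.5 GB per process figure is purely hypothetical (it is not a measurement of any particular mapper), and it assumes each process holds its own copy of the reference index rather than sharing it between threads:

```python
def ram_per_core(ram_gb, cores):
    """GB of RAM available per core if memory is split evenly."""
    return ram_gb / cores

def usable_workers(cores, ram_gb, gb_per_worker):
    """How many single-core workers fit in RAM at once (capped by core count)."""
    return min(cores, int(ram_gb // gb_per_worker))

GB_PER_WORKER = 2.5  # hypothetical per-process memory need, for illustration only

for cores, ram_gb in [(16, 16), (8, 24)]:
    print(cores, "cores,", ram_gb, "GB:",
          ram_per_core(ram_gb, cores), "GB/core,",
          usable_workers(cores, ram_gb, GB_PER_WORKER), "workers fit")
```

Under that assumption the 16-core/16GB box runs fewer workers at once than the 8-core/24GB box, which is the trade-off described above. With a different per-process footprint the answer can flip, so plug in numbers for the tools you actually plan to run.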
The issue of a fast disk subsystem is a crucial one, which usually gets ignored. A good RAID controller or smart use of Linux md software RAID with multiple 7200rpm spindles should be enough on your tight budget. But remember, if your disks are slow, you can't get data into RAM fast, and processes wait on I/O a lot, especially when there are so many cores competing for disk I/O! More RAM helps here too, for disk cache etc.
As an aside, does your institute or partner institute have access to an HPC facility where you can get some CPU allocation?
-
Originally posted by DZhang View PostHi, I am planning to build a computer for next-gen analysis with a tight budget. The main application is de novo assembly, re-sequencing, and RNA-seq.
I can choose either two AMD 8-core CPUs (16 cores total) with 16G memory or one AMD 8-core CPU with 24G memory. My question is whether I should invest in # of cores or memory capacity in this case.
Thank you,
Douglas
-
I'd also suggest adding an SSD to use as scratch. Then you need cheap 1 TB drives to store your data (SATA-II 7k2 should be fine).
Let us know what machine(s) you end up getting.
-
Thank you all for the great suggestions and comments. Ideally I should build two machines, one for de novo and the other for mapping. Due to the limited budget, I will choose somewhere in between. It is a great suggestion to choose a mainboard with upgrade potential!
-
You say "main application" but then list three different applications that have very different requirements. To be fair, resequencing and RNA-Seq share a lot of requirements, a primary one being mapping reads to a reference. Mappers do not require a ton of memory but can be sped up (in a nearly linear fashion) by adding CPUs. As john mentions, sequencers are going to be spitting out more reads, but if your pipeline involves mapping those reads to a reference, more memory won't do you much good at all, while doubling the # of CPUs sure will.
On the other hand, de novo assembly is a memory pig, and most algorithms are not highly threaded, meaning additional CPUs will not provide much benefit for this application.
You really need to define your requirements better. What specific programs do you think you'll be using? What are their resource requirements to perform projects sized similarly to yours?
-
Also look at the max memory the motherboard can hold, because you'll probably want to add more memory later (e.g. using consumables budget or next year's money).
-
Yes, more memory. Sequencers are only going to spit out more reads.
Also, more memory usually means programs can potentially run faster.