Thank you all for the responses.
@samanta - I won't be doing de novo assembly.
I want to align reads to a reference genome and perform various downstream analyses: searching for SNPs, indels, etc.
From what I've read so far, I suppose a minimum of 14 GB of RAM is needed. But I think a better computer will let me use NGS-related software that will be released in the next 3-4 years.
-
We use a 512 GB server with 32 cores.
If you are doing de novo assembly, you need a large amount of RAM. How much depends on what kind of genome you plan to assemble, because the number of k-mers scales with genome size and error rate.
Please email me at [email protected] if you want more explanation, and I can point you to some threads.
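As a rough illustration of why the k-mer count (and with it the RAM needed) grows with genome size and error rate, here is a back-of-the-envelope sketch. The model and the function name `estimate_distinct_kmers` are assumptions made for illustration and do not come from any particular assembler: repeats are ignored, and every k-mer that contains a sequencing error is counted as a new one.

```python
def estimate_distinct_kmers(genome_size, coverage, read_length, error_rate, k):
    """Rough estimate of distinct k-mers in a read set (illustrative model only)."""
    # Genomic k-mers: about one per genome position (repeats ignored).
    genomic = genome_size
    # Number of reads needed for the requested coverage.
    n_reads = genome_size * coverage / read_length
    # Each read of length L contains L - k + 1 k-mers.
    kmers_per_read = read_length - k + 1
    # Probability that a k-mer contains at least one erroneous base.
    p_error_kmer = 1 - (1 - error_rate) ** k
    # Assumption: every erroneous k-mer is novel, so it adds to the total.
    erroneous = n_reads * kmers_per_read * p_error_kmer
    return genomic + erroneous


# Hypothetical bacterial example: 5 Mb genome, 50x coverage, 100 bp reads, k = 31.
for error_rate in (0.0, 0.001, 0.01):
    print(error_rate, int(estimate_distinct_kmers(5_000_000, 50, 100, error_rate, 31)))
```

Multiplying the estimate by the number of bytes a given assembler spends per k-mer (which varies from tool to tool) gives a ballpark memory figure.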
-
We run our analyses on a 334-node Intel cluster (2x Xeon E5 @ 2.0 GHz with 32 GB RAM per node). I usually request 10 nodes at a time, though I work with bacterial genomes; it takes around 15 minutes to assemble a genome using Ray. We also use a single node with 1 TB of RAM and 4 Xeon E5s for bigger jobs. A colleague used this node to assemble 4 HiSeq lanes' worth of RNA-seq data de novo in a few hours.
-
Originally posted by floem7
So what computer configuration would be ok?
The requirements will change depending on how much computing you expect to do locally (more local computing == beefier spec needed).
-
Some time has passed, so I'd like to ask the same question: I have to prepare a specification for a computer for the analysis of NGS data.
The data: a 3 Gb maize genome (one or two inbred lines) at 10x coverage.
The sequencing itself will be done by an external company (probably on an Illumina HiSeq 2000).
We will receive:
FASTQ files
Mapping of the reads to the reference genome (B73 maize)
BAM files for graphical display of the mapped reads
Coverage information
List of SNPs
List of short indels
Comparisons between samples
But this could be different if another company's service is chosen.
I'm completely new to sequencing; I only understand that most probably I won't have to process the raw data (images) or assemble the genome myself.
So what computer configuration would be ok?
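To put the numbers above in perspective, here is a minimal sketch of the data volume for one inbred line. The 3 Gb genome and 10x coverage come from the post; the 100 bp read length and the 50-byte FASTQ header are assumptions for illustration, and `sequencing_volume` is a hypothetical helper name.

```python
def sequencing_volume(genome_size_bp, coverage, read_length_bp, header_bytes=50):
    """Return (total bases, number of reads, uncompressed FASTQ bytes)."""
    total_bases = genome_size_bp * coverage
    n_reads = total_bases // read_length_bp
    # One FASTQ record = header line + sequence line + "+" line + quality line.
    # Bytes: header + read + 1 ("+") + read + 4 newlines.
    record_bytes = header_bytes + 2 * read_length_bp + 5
    return total_bases, n_reads, n_reads * record_bytes


# 3 Gb genome at 10x; read length and header size are assumed values.
bases, reads, fastq_bytes = sequencing_volume(3_000_000_000, 10, 100)
print(f"{bases:,} bases, {reads:,} reads, about {fastq_bytes / 1e9:.1f} GB of FASTQ")
```

Under these assumptions that is 30 billion bases in 300 million reads, or about 76.5 GB of uncompressed FASTQ per line, before counting the BAM files and other deliverables.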
-
Hello,
I work for a Hungarian bioinformatics company, and one of our main areas is NGS data analysis, so I have some experience with this topic. We are also putting together a high-performance computer (called GenoMiner) to save researchers the hassle. Check out our website (www.astridresearch.com); we will be selling it from the 15th of July.
I totally agree with peromhc. The CPU doesn't matter that much, but multiple cores help: they make some tasks much faster, especially tasks that are easy to parallelize. Reference assembly is a great example of such a task.
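How much extra cores help can be sketched with Amdahl's law, which relates the speedup to the fraction of a job that can run in parallel. This is a general rule of thumb rather than a measurement of any NGS tool, and the 0.95 parallel fraction below is a made-up value for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a job run on `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 95% of the run time parallelizes (e.g. mapping reads
# independently), the remaining 5% (loading the index, writing output) does not.
for cores in (1, 4, 8, 32):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

The serial part caps the gain, which is why adding cores pays off most for tasks where almost all of the work can be split up.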
RAM is what you need most. Our GenoMiner has 96 GB. But the RAM demand also depends on many factors.
For example, some assembly algorithms index the reads, while others (the newer ones) index the genome. The former use significantly more memory when the number of reads is higher; the latter use significantly more memory when the genome is larger. (But generally, more reads and larger genomes need much more RAM.)
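The difference between the two indexing strategies can be shown with a toy k-mer index. This is only a sketch: real tools use far more compact data structures, and the names `build_kmer_index` and `index_entries` as well as the tiny sequences are invented for illustration.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each k-mer to the (sequence number, offset) places where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def index_entries(index):
    """Total number of stored positions, used here as a stand-in for memory use."""
    return sum(len(hits) for hits in index.values())


genome = "ACGTACGTTAGC"  # toy 12 bp "genome"
reads = [genome[0:8], genome[2:10], genome[4:12]]  # three toy 8 bp reads
k = 4

genome_index = build_kmer_index([genome], k)       # grows with genome size only
read_index = build_kmer_index(reads, k)            # grows with the number of reads
more_reads_index = build_kmer_index(reads * 2, k)  # twice as many reads

print(index_entries(genome_index), index_entries(read_index), index_entries(more_reads_index))
```

Doubling the reads doubles the read index while the genome index stays the same size, which matches the memory behaviour described above.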
Also, reference assembly is faster and less demanding than de novo assembly. I once ran a reference assembly with a low number of reads on a small (bacterial) genome, and it took half an hour, while the de novo assembly of the same dataset took 16 hours.
The quality of the reads and the parameters of the algorithm you use are also highly significant. For example, if you only want to map perfect alignments and don't care about reads with sequencing errors, SNPs, or indels, then you can get a result very fast even on a lower-performance computer.
Pre-processing algorithms, like error correctors, are usually not as demanding, but again, they can run for a very long time depending on the dataset; for error correction, that is the case when the number of reads and/or the error rate is high.
Post-processing algorithms, like ChIP-seq peak finders, are the least demanding. They usually run decently on an ordinary desktop computer. But if there is too much to load, for example if you want to display hundreds, thousands, or more reads at the same time in a viewer, you can have lunch (or go to sleep) while it's refreshing...
Storage capacity is also something to consider: raw sequencing data can use up your hard disk quickly. It is said that at least 3 terabytes are a must, but even with that, you'd better think about some sort of archiving, like external hard drives, cloud storage, or an optical drive with tons of discs...
Best regards!
-
A lot depends on what types of data you are working with: de novo assemblies, alignment, bacterial genomes, vertebrate genomes, etc.
For my work with vertebrate genomes, I am most often limited by RAM, so if I were building a machine, I would buy as much RAM as possible. >100 GB would be sweet!
Some applications run in parallel, so speed will increase with the number of cores, but nothing needs multiple cores to work.
-
Optimal high performance computer spec for NGS data analysis
Hi!
I am new to NGS data analysis and am just starting to set up the facilities. I was wondering if you could suggest a high-performance computer spec that is considered optimal for NGS data analysis.
Thanks.