  • floem7
    replied
    Thank you all for the responses.
    @samanta - I won't do de novo assembly.
    I want to align reads to a reference genome and perform various downstream analyses - searching for SNPs, indels, etc.

    From what I've read so far, I suppose a minimum of 14 GB of RAM is needed. But I think a better computer will let me make use of (NGS-related) software released over the next 3-4 years.
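    For reference, the kind of pipeline meant here could look like the sketch below: a minimal Python wrapper around common open-source tools (bwa, samtools, bcftools). The file names and thread count are placeholders, and exact command syntax varies by tool version, so treat this as an illustration rather than a recipe.

    ```python
    import subprocess

    REF = "reference.fa"                          # e.g. the maize B73 reference
    R1, R2 = "reads_1.fastq.gz", "reads_2.fastq.gz"
    THREADS = 8

    def run(cmd):
        """Run a shell pipeline, aborting on the first error."""
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    # 1. Index the reference (one-time cost).
    run(f"bwa index {REF}")

    # 2. Align reads, then sort and index the BAM.
    run(f"bwa mem -t {THREADS} {REF} {R1} {R2} | "
        f"samtools sort -@ {THREADS} -o aligned.bam -")
    run("samtools index aligned.bam")

    # 3. Call SNPs and short indels (bcftools syntax of recent versions).
    run(f"bcftools mpileup -f {REF} aligned.bam | "
        "bcftools call -mv -Oz -o variants.vcf.gz")
    ```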



  • samanta
    replied
    We use a 512 GB server with 32 cores.

    If you are doing de novo assembly, you need to go for a large amount of RAM. How much depends on what kind of genome you plan to assemble, because the number of distinct k-mers scales with genome size and error rate.
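    As a rough illustration of that scaling, here is a back-of-envelope in Python. The model (each sequencing error spawning up to k spurious k-mers, each distinct k-mer costing a fixed number of bytes in a hash table) and all the constants are simplifying assumptions, not measurements.

    ```python
    def kmer_ram_gb(genome_size, coverage, error_rate, k=31, bytes_per_kmer=16):
        """Crude RAM estimate for a de Bruijn graph assembler."""
        true_kmers = genome_size                          # ~1 true k-mer per base
        # Each erroneous base can create up to k novel, mostly unique k-mers.
        error_kmers = genome_size * coverage * error_rate * k
        return (true_kmers + error_kmers) * bytes_per_kmer / 1e9

    # Maize-scale example: 3 Gb genome, 10x coverage, 1% error rate, k = 31.
    print(f"~{kmer_ram_gb(3e9, 10, 0.01, 31):.0f} GB")    # ~200 GB before error correction
    ```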



    Please email me at [email protected] if you want more explanation, and I can guide you to some threads.



  • bstamps
    replied
    We run our analyses on a 334-node Intel cluster (2x Xeon E5 @ 2.0 GHz with 32 GB RAM per node). I usually request 10 nodes at a time, though I work with bacterial genomes; it takes around 15 minutes to assemble a genome using Ray. We also use a single node with 1 TB of RAM and 4 Xeon E5s for bigger jobs - a colleague used that node to assemble four lanes' worth of HiSeq RNA-seq data de novo in a few hours.



  • GenoMax
    replied
    Originally posted by floem7:

    So what computer configuration would be ok?

    There are other threads scattered around SeqAnswers that try to answer this question. I will refer you to one: http://seqanswers.com/forums/showthread.php?t=25865.

    Depending on how much computing you expect to do locally, the requirements will change (more local computing == beefier spec needed).



  • floem7
    replied
    Some time has passed, so I'd like to ask the same question - I have to prepare a specification for a computer for NGS data analysis.

    The project is the ~3 Gb maize genome (one or two inbred lines) at 10x coverage (a rough estimate of the resulting data volumes is sketched at the end of this post).

    The sequencing itself will be done by an external company (probably on an Illumina HiSeq 2000).

    We will receive:

    FASTQ files
    Mapping of the reads to the reference genome (maize B73)
    BAM files for graphical display of the mapped reads
    Coverage information
    A list of SNPs
    A list of short indels
    Comparisons between samples

    But this could differ if another company's service is chosen.

    I'm completely new to sequencing; I only understand that I most probably won't have to process the raw data (images) or assemble the genome myself.

    So what computer configuration would be ok?
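    For scale, a rough back-of-envelope of the data volumes a project like this generates; every constant (bytes per base, compression ratio) is an assumption, not a vendor figure.

    ```python
    GENOME_SIZE = 3e9         # maize, bases
    COVERAGE = 10             # 10x
    FASTQ_BYTES_PER_BASE = 2  # sequence + quality chars plus headers, uncompressed
    GZIP_RATIO = 3            # typical gzip compression for FASTQ, very approximate
    BAM_BYTES_PER_BASE = 0.5  # sorted, compressed BAM, very approximate

    bases = GENOME_SIZE * COVERAGE                             # 3e10 bases sequenced
    fastq_gz_gb = bases * FASTQ_BYTES_PER_BASE / GZIP_RATIO / 1e9
    bam_gb = bases * BAM_BYTES_PER_BASE / 1e9

    print(f"gzipped FASTQ: ~{fastq_gz_gb:.0f} GB per sample")  # ~20 GB
    print(f"BAM:           ~{bam_gb:.0f} GB per sample")       # ~15 GB
    ```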



  • smice
    replied
    Hello,

    I work for a Hungarian bioinformatics company, and one of our main profiles is NGS data analysis, so I have some experience with this topic. We are also putting together a high-performance computer (called GenoMiner) to save researchers the hassle. Check out our website (www.astridresearch.com); we will be selling it from the 15th of July.

    I totally agree with peromhc. The CPU doesn't count for much, but multiple cores are good: they make some tasks much faster, especially tasks that parallelize easily. Reference assembly is a great example.

    RAM is what is needed most. Our GenoMiner has 96 GB, but RAM demands depend on many factors. For example, some assembly algorithms index the reads, while others (the newer ones) index the genome. The former use significantly more memory as the number of reads grows; the latter use significantly more memory as the genome grows. (Generally, though, more reads and larger genomes both mean much more RAM.)
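    To put rough numbers on that trade-off, a toy model in Python; the per-base index costs are illustrative assumptions only (real genome indexes, such as FM-indexes, are engineered to be far more compact than naive structures).

    ```python
    def index_ram_gb(bases_indexed, bytes_per_base):
        """Toy model: index memory grows linearly with what you index."""
        return bases_indexed * bytes_per_base / 1e9

    GENOME = 3e9              # bases in the reference
    READ_BASES = GENOME * 10  # 10x coverage worth of reads

    # Assumed costs, for illustration only:
    #   read index   (hash of read k-mers)       ~4 bytes per read base
    #   genome index (compressed FM-index style)  ~2 bytes per genome base
    print(f"read index:   ~{index_ram_gb(READ_BASES, 4):.0f} GB")  # ~120 GB
    print(f"genome index: ~{index_ram_gb(GENOME, 2):.0f} GB")      # ~6 GB
    ```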
    Also, reference assembly is faster and less demanding than de novo. Once I ran a reference assembly with a low read count on a small (bacterial) genome and it took half an hour, while a de novo assembly of the same dataset took 16 hours.

    The quality of the reads and the parameters of the algorithm you use are also highly significant. For example, if you only want to map perfect alignments and don't care about reads with sequencing errors, SNPs, or indels, then you can get a very fast result even on a lower-performance computer.

    Pre-processing algorithms, like error correctors, are usually not as demanding, but they too can run far too long depending on the dataset - for error correction, when the number of reads and/or the error rate is high.

    Post-processing algorithms, like ChIP-seq peak finders, are the least demanding; they usually run fine on an ordinary desktop. But if there is too much to load - say, displaying hundreds or thousands of reads at once in a viewer - you can have lunch (or go to sleep) while it refreshes...

    Storage capacity also needs consideration: raw sequencing data can fill up your hard drive quickly. It is said that at least 3 terabytes are compulsory, but even with that, you'd better think about some sort of archiving: external hard drives, cloud storage, optical discs...
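    As a quick illustration of how fast 3 TB fills up, assuming a hypothetical per-sample footprint in line with the HiSeq project discussed above:

    ```python
    DISK_TB = 3
    PER_SAMPLE_GB = 20 + 15 + 5   # gzipped FASTQ + BAM + intermediates, assumed

    samples = DISK_TB * 1000 / PER_SAMPLE_GB
    print(f"~{samples:.0f} samples before the disk is full")   # ~75 samples
    ```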
    Best regards!



  • peromhc
    replied
    A lot depends on what types of data you are working with: de novo assemblies, alignment, bacterial genomes, vertebrate genomes, etc.

    For my work with vertebrate genomes, I am most often limited by RAM, so if I were building a machine, I would buy as much RAM as possible - >100 GB would be sweet!

    Some applications run in parallel, so speed will increase with the number of cores, but nothing strictly needs multiple cores to work.



  • Optimal high performance computer spec for NGS data analysis

    Hi!

    I am new to NGS data analysis and just starting to set up the facilities. I was wondering if you could suggest a high-performance computer spec that is considered optimal for NGS data analysis.

    Thanks.
