  • Optimal high performance computer spec for NGS data analysis


I am new to NGS data analysis and just starting to set up the facilities. I was wondering if you could suggest a high-performance computer spec that is considered optimal for NGS data analysis.


  • #2
A lot depends on what types of data you are working with: de novo assemblies, alignment, bacterial genomes, vertebrate genomes, etc.

For my work with vertebrate genomes, I am most often limited by RAM, so if I were building a machine, I would buy as much RAM as possible. >100 GB would be sweet!

Some applications run in parallel, so speed will increase with the number of cores, but nothing strictly requires multiple cores to work.
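To put a rough number on how much extra cores help, Amdahl's law gives an upper bound on the speedup; the 90% parallel fraction below is purely an illustrative assumption, not a measurement of any particular tool:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

# If 90% of a tool's runtime parallelizes:
print(round(amdahl_speedup(0.9, 8), 2))   # -> 4.71 on 8 cores
print(round(amdahl_speedup(0.9, 32), 2))  # -> 7.8 on 32 cores: diminishing returns
```

So beyond a point, more RAM buys you more than more cores does, which matches the advice above.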


    • #3
I work for a Hungarian bioinformatics company, and one of our main profiles is NGS data analysis, so I have some experience with this topic. We are also putting together a high-performance computer (called GenoMiner) just to save scientific researchers the hassle. Check out our website; we are selling it from the 15th of July.
I totally agree with peromhc. The CPU doesn't count for so much, but multiple cores are nice: they make some tasks much faster, especially tasks that can easily be parallelized. Reference assembly is a great example.
RAM is what is needed most. Our GenoMiner has 96 GB, but RAM demand also depends on many factors.
For example, some assembly algorithms index the reads, while others (the newer ones) index the genome. The former use significantly more memory as the number of reads grows; the latter use significantly more memory as the genome gets larger. (But generally, more reads and larger genomes both need much more RAM.)
Also, reference assembly is faster and less demanding than de novo. I once ran a low-read-count reference assembly on a small (bacterial) genome, and it took half an hour, while the de novo assembly of the same dataset took 16 hours.
The quality of the reads and the parameters of the algorithm you use also matter a lot. For example, if you only want to map the perfect alignments and don't care about reads with sequencing errors, SNPs, or indels, then you can get a very fast result even on a lower-performance computer.
Pre-processing algorithms, like error correctors, are usually not as demanding, but they too can run far too long depending on the dataset; for error correction, that happens when the number of reads and/or the error rate is high.
Post-processing algorithms, like ChIP-Seq peak finders, are the least demanding. They usually run decently on an ordinary desktop computer. But if there is too much to load, for example if you want to display hundreds or thousands of reads in a viewer at the same time, you can have lunch (or go to sleep) while it refreshes...
Storage capacity is also something to consider: raw sequencing data can fill up your hard disk quickly. It is said that at least 3 terabytes are a must, but even with that, you'd better think about some sort of archiving: external hard drives, cloud storage, an optical drive with tons of discs...
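As a back-of-envelope sketch of why raw data fills disks so fast: an uncompressed FASTQ record stores one sequence character plus one quality character per base, plus headers and newlines, so call it roughly 2.5 bytes per sequenced base (that overhead figure is my assumption, and compression will shrink it considerably):

```python
def fastq_bytes(genome_size_bp: float, coverage: float,
                bytes_per_base: float = 2.5) -> float:
    """Rough uncompressed FASTQ size: sequenced bases x per-base cost."""
    return genome_size_bp * coverage * bytes_per_base

# A 3 Gb genome at 40x coverage:
print(fastq_bytes(3e9, 40) / 1e12, "TB")  # -> 0.3 TB per sample, uncompressed
```

A handful of samples plus intermediate BAM files gets you to the 3 TB figure quickly.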
      Best regards!


      • #4
Some time has passed, so I'd like to ask the same question: I have to prepare the specification of a computer for NGS data analysis.

A ~3 Gb maize genome (one or two inbred lines) at 10x coverage.

Sequencing itself will be done by an external company (probably on an Illumina HiSeq 2000).

        We will receive:

FASTQ files
        Mapping the reads on reference genome (B73 maize)
        BAM files for graphical display of the mapped reads
        Coverage information
        List of SNPs
        List of short InDels
        Comparisons between samples
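For a rough sense of scale behind the mapping step above: the read count at a given coverage is just total sequenced bases divided by read length (the 100 bp read length below is my assumption for HiSeq 2000-era data, not something stated here):

```python
def reads_needed(genome_size_bp: float, coverage: float, read_len_bp: int) -> int:
    """Simple coverage estimate: reads = genome size x coverage / read length."""
    return int(genome_size_bp * coverage / read_len_bp)

# 3 Gb maize genome, 10x coverage, 100 bp reads:
print(reads_needed(3e9, 10, 100))  # -> 300000000 reads (150M pairs)
```

Sorting and indexing a few hundred million mapped reads is what drives the RAM and disk requirements for this kind of project.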

But it could be different if another company's service is chosen.

I'm completely new to sequencing; I only understand that I most probably won't have to process the raw data (images) or assemble the genome myself.

        So what computer configuration would be ok?


        • #5
          Originally posted by floem7 View Post

          So what computer configuration would be ok?
          There are other threads scattered around on SeqAnswers that try to answer this question. I will refer you to one:

          Depending on how much computing you expect to do locally the requirements will change (more local computing == beefier spec needed).


          • #6
We run our analyses on a 334-node Intel cluster (2× Xeon E5 @ 2.0 GHz with 32 GB RAM per node). I usually request 10 nodes at a time; since I work with bacteria, it takes around 15 minutes to assemble a genome using Ray. We also use a single node with 1 TB of RAM and 4 Xeon E5s for bigger jobs; a colleague used this node to assemble 4 lanes' worth of HiSeq RNA-seq data de novo in a few hours.


            • #8
              We use a 512 GB server with 32 cores.

If you are doing de novo assembly, you need to go for a large amount of RAM. How much depends on what kind of genome you plan to assemble, because the number of k-mers scales with genome size and error rate.
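That scaling can be sketched with a toy estimate: genomic k-mers roughly track genome size, and each sequencing error can introduce up to k novel erroneous k-mers. The "k new k-mers per error" rule of thumb is an assumption that ignores repeats and reverse complements, so treat this as intuition, not a sizing tool:

```python
def distinct_kmers_estimate(genome_size_bp: float, bases_sequenced: float,
                            error_rate: float, k: int) -> float:
    """Toy de Bruijn graph size: genomic k-mers + error-induced k-mers."""
    genomic = genome_size_bp                      # ~one k-mer per genome position
    erroneous = bases_sequenced * error_rate * k  # each error spawns up to k k-mers
    return genomic + erroneous

# 5 Mb bacterial genome, 100x coverage (5e8 bases), 1% error rate, k=31:
print(f"{distinct_kmers_estimate(5e6, 5e8, 0.01, 31):.2e}")
```

Note how the error-induced k-mers dominate the genomic ones here, which is why error rate matters as much as genome size for assembler RAM.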

              Please email me at [email protected], if you want more explanation and I can guide you to some threads.


              • #9
Thank you all for the responses.
@samanta - I won't do de novo assembly.
I'd like to align reads to a reference genome and perform various downstream analyses: searching for SNPs, indels, etc.

From what I've read so far, I suppose a minimum of 14 GB of RAM is needed. But I think a better computer will let me make use of (NGS-related) software released in the next 3-4 years.

