  • ezy85
    Junior Member
    • Jun 2010
    • 1

    Optimal high performance computer spec for NGS data analysis

    Hi!

    I am new to NGS data analysis and am just starting to set up our facilities. Could you suggest a high-performance computer spec that would be considered optimal for NGS data analysis?

    Thanks.
  • peromhc
    Senior Member
    • Sep 2009
    • 108

    #2
    A lot depends on what types of data you are working with: de novo assemblies, alignment, bacterial genomes, vertebrate genomes, etc.

    For my work with vertebrate genomes, I am most often limited by RAM, so if I were building, I would buy as much RAM as possible. More than 100 GB would be sweet!

    Some applications run in parallel, so speed will increase with the number of cores, but nothing needs multiple cores to work.


    • smice
      Member
      • Jun 2009
      • 21

      #3
      Hello,
      I work for a Hungarian bioinformatics company, and one of our main areas is NGS data analysis, so I have some experience with this topic. We are also putting together a high-performance computer (called GenoMiner) to save scientific researchers the hassle. Check out our website (www.astridresearch.com); we will be selling it from the 15th of July.
      I totally agree with peromhc. The CPU doesn't matter so much, but multiple cores are useful: they make some tasks much faster, especially tasks that are easy to parallelize. Reference assembly is a great example of such a task.
      RAM is what is needed most. Our GenoMiner has 96 GB. But RAM demands also depend on many factors.
      For example, some assembly algorithms index the reads, while others (the newer ones) index the genome. The former use significantly more memory as the number of reads grows; the latter use significantly more memory as the genome size grows. (But generally, more reads and larger genomes need much more RAM.)
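As a toy illustration of that difference (not how any particular aligner or assembler actually works), the sketch below builds a naive k-mer hash index either from the reads or from the reference. The function names `index_kmers` and `index_entries`, the example sequences, and k = 4 are all made up for illustration. The read index grows with the number of reads, while the genome index depends only on the genome length.

```python
from collections import defaultdict


def index_kmers(sequences, k):
    """Map each k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def index_entries(index):
    """Total number of stored hits, a crude proxy for memory use."""
    return sum(len(hits) for hits in index.values())


# Hypothetical toy data: a 26 bp "genome" and 9 overlapping 8 bp reads.
genome = "ACGTACGTTAGCCGATAGCTAGGCTA"
reads = [genome[i:i + 8] for i in range(0, 18, 2)]

genome_index_size = index_entries(index_kmers([genome], 4))
read_index_size = index_entries(index_kmers(reads, 4))
# Doubling the number of reads doubles the read index,
# while the genome index stays the same size.
doubled_read_index_size = index_entries(index_kmers(reads * 2, 4))

print(genome_index_size, read_index_size, doubled_read_index_size)
```

Real tools use far more compact structures than a Python dictionary, so only the scaling behaviour carries over, not the absolute numbers.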
      Also, reference assembly is faster and less demanding than de novo assembly. I once ran a reference assembly with a low read count on a small (bacterial) genome, and it took half an hour, while the de novo assembly of the same dataset took 16 hours.
      The quality of the reads and the parameters of the algorithm you use are also highly significant. For example, if you only want to map the perfect alignments and don't care about reads with read errors, SNPs, or indels, then you can get a result very quickly even on a lower-performance computer.
      Pre-processing algorithms, such as error correctors, are usually not as demanding, but again, they can run for a very long time depending on the dataset; for error correction, that happens when the number of reads and/or the error rate is high.
      Post-processing algorithms, like ChIP-seq peak finders, are the least demanding. They usually run decently on an ordinary desktop computer. But if there is too much to load, for example if you want to display hundreds, thousands, or more reads in a viewer at the same time, you can have lunch (or go to sleep) while it's refreshing...
      Storage capacity is also something to consider: raw sequencing data can fill up your hard disk drive quickly. It is said that at least 3 terabytes are required, but even with that, you'd better think about some sort of archiving, such as external hard drives, cloud storage, or an optical drive with tons of discs...
      Best regards!


      • floem7
        Member
        • Jan 2013
        • 19

        #4
        Some time has passed, so I'd like to ask the same question: I have to prepare a specification for a computer for NGS data analysis.

        (3 Gb maize genome, one or two inbred lines, at 10x coverage.)

        Sequencing itself will be done by an external company (probably on an Illumina HiSeq 2000).

        We will receive:

        FASTQ files
        Mapping of the reads to the reference genome (B73 maize)
        BAM files for graphical display of the mapped reads
        Coverage information
        List of SNPs
        List of short InDels
        Comparisons between samples



        But it could be different if another company's service is chosen.

        I'm completely new to sequencing; I only understand that most probably I won't have to process the raw data (images) or assemble the genome myself.

        So what computer configuration would be ok?
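A rough back-of-the-envelope estimate of the raw data volume for a project like this can be sketched as follows. The 100 bp read length and the 50-byte FASTQ header are assumptions, not numbers from any sequencing provider, and the function name `estimate_fastq` is made up for illustration. Gzip-compressed files are typically several times smaller than this uncompressed figure.

```python
def estimate_fastq(genome_size_bp, coverage, read_length, header_bytes=50):
    """Return (number of reads, uncompressed FASTQ size in bytes).

    header_bytes is an assumed average length of the '@...' header line.
    """
    total_bases = genome_size_bp * coverage
    n_reads = total_bases // read_length
    # A FASTQ record has four lines: header, sequence, '+', qualities.
    # Each line ends with a newline character (1 byte).
    bytes_per_record = (
        (header_bytes + 1)    # header line
        + (read_length + 1)   # sequence line
        + 2                   # '+' separator line
        + (read_length + 1)   # quality line
    )
    return n_reads, n_reads * bytes_per_record


# 3 Gb genome at 10x coverage, assuming 100 bp reads.
n_reads, fastq_bytes = estimate_fastq(3_000_000_000, 10, 100)
print(f"{n_reads:,} reads, about {fastq_bytes / 1e9:.1f} GB of uncompressed FASTQ per sample")
```

BAM files and intermediate results come on top of this, so the disk budget per sample is usually a multiple of the raw FASTQ size.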


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by floem7:

          So what computer configuration would be ok?
          There are other threads scattered around on SeqAnswers that try to answer this question. I will refer you to one: http://seqanswers.com/forums/showthread.php?t=25865.

          Depending on how much computing you expect to do locally, the requirements will change (more local computing == beefier spec needed).


          • bstamps
            Member
            • Oct 2012
            • 40

            #6
            We run our analyses on a 334-node Intel cluster (2x Xeon E5 @ 2.0 GHz with 32 GB RAM per node). I usually request 10 nodes at a time, though I work with bacterial genomes; it takes around 15 minutes to assemble a genome using Ray. We also use a single node with 1 TB of RAM and 4 Xeon E5s for bigger jobs; a colleague used this node to assemble 4 HiSeq lanes' worth of RNA-seq data de novo in a few hours.


            • samanta
              Senior Member
              • Feb 2010
              • 108

              #8
              We use a 512 GB server with 32 cores.

              If you are doing de novo assembly, you need to go for a large amount of RAM. The RAM required depends on what kind of genome you plan to assemble, because the number of k-mers scales with genome size and error rate.
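To make the k-mer point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: a random 2,000 bp "genome", 100 bp reads tiled every 10 bp, k = 21, a 1% substitution error rate, and the helper names `distinct_kmers` and `add_errors`. It counts distinct k-mers in error-free reads and in the same reads after random substitutions.

```python
import random


def distinct_kmers(reads, k):
    """Number of distinct k-mers found across all reads."""
    return len({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


def add_errors(reads, error_rate, rng):
    """Return copies of the reads with random substitution errors."""
    noisy = []
    for read in reads:
        chars = list(read)
        for i, base in enumerate(chars):
            if rng.random() < error_rate:
                # Substitute with a different base so the error is real.
                chars[i] = rng.choice([b for b in "ACGT" if b != base])
        noisy.append("".join(chars))
    return noisy


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(2000))
# 100 bp reads starting every 10 bp, roughly 10x coverage.
reads = [genome[s:s + 100] for s in range(0, 1901, 10)]

clean = distinct_kmers(reads, 21)
noisy = distinct_kmers(add_errors(reads, 0.01, rng), 21)
print(f"distinct 21-mers: {clean} error-free, {noisy} with 1% errors")
```

Each substitution can create up to k new k-mers that are not in the genome, which is why a k-mer table built from noisy reads can be much larger than one built from the genome itself.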



              Please email me at [email protected], if you want more explanation and I can guide you to some threads.
              http://homolog.us


              • floem7
                Member
                • Jan 2013
                • 19

                #9
                Thank you all for the responses.
                @samanta - I won't be doing de novo assembly.
                I want to align reads to a reference genome and perform various downstream analyses: searching for SNPs, indels, etc.

                From what I've read so far, I suppose that a minimum of 14 GB of RAM is needed. But I think a better computer will allow me to make use of (NGS-related) software that will be released in the next 3-4 years.
