  • floem7
    replied
    Thank you all for the responses.
    @samanta - I won't be doing de novo assembly.
    I want to align reads to a reference genome and perform various downstream analyses - searching for SNPs, indels, etc.

    From what I've read so far, I gather that a minimum of 14 GB of RAM is needed. But I think a better computer will let me make use of (NGS-related) software released over the next 3-4 years.
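
    For reference, a minimal sketch of the kind of pipeline I have in mind, assuming bwa, samtools, and bcftools are installed and on PATH; all file names are placeholders:

    ```python
    # Minimal resequencing sketch: align reads to a reference, sort and index
    # the alignments, then call SNPs and short indels.
    import subprocess

    def run(cmd):
        # shell=True so the pipes inside each stage work as written
        subprocess.run(cmd, shell=True, check=True)

    REF = "reference.fa"                        # placeholder reference genome
    R1, R2 = "reads_1.fq.gz", "reads_2.fq.gz"   # placeholder paired-end FASTQ

    run(f"bwa index {REF}")                     # one-time index of the reference
    run(f"bwa mem -t 8 {REF} {R1} {R2} | samtools sort -o sample.bam -")
    run("samtools index sample.bam")            # index needed by genome browsers
    run(f"bcftools mpileup -f {REF} sample.bam | bcftools call -mv -Oz -o variants.vcf.gz")
    ```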

  • samanta
    replied
    We use a 512 GB server with 32 cores.

    If you are doing de novo assembly, you need to go for a large amount of RAM. How much depends on what kind of genome you plan to assemble, because the number of k-mers scales with genome size and error rate.
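
    As a rough illustration of that scaling (the error model and per-k-mer byte cost below are assumptions, not measurements):

    ```python
    # Back-of-the-envelope RAM estimate for de Bruijn graph assembly: distinct
    # k-mers grow with genome size, and sequencing errors add spurious k-mers.
    def distinct_kmers(genome_size, n_reads, read_len, k=31, error_rate=0.01):
        genuine = genome_size                           # ~one genuine k-mer per position
        spurious = n_reads * read_len * error_rate * k  # each error spawns up to k novel k-mers
        return genuine + spurious

    BYTES_PER_KMER = 16  # assumed hash-table cost: key + counter + overhead

    # Example: 3 Gb genome at 10x with 100 bp reads (300M reads)
    kmers = distinct_kmers(3e9, n_reads=3e8, read_len=100)
    print(f"~{kmers:.1e} distinct k-mers -> ~{kmers * BYTES_PER_KMER / 1e9:.0f} GB of RAM")
    ```

    An estimate like this lands in the low hundreds of GB for a large genome, which is why servers in the 512 GB class are common for assembly work.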



    Please email me at [email protected] if you want more explanation, and I can point you to some relevant threads.

  • bstamps
    replied
    We run our analyses on a 334-node Intel cluster (2x Xeon E5 @ 2.0 GHz with 32 GB RAM per node). I usually call 10 nodes at a time, though I work with bacterial genomes... it takes around 15 minutes to assemble a genome using Ray. We also use a single node with 1 TB of RAM and 4 Xeon E5s for bigger jobs - a colleague used this node to assemble four lanes' worth of HiSeq RNA-seq data de novo in a few hours.

  • GenoMax
    replied
    Originally posted by floem7:

    So what computer configuration would be ok?
    There are other threads scattered around SeqAnswers that try to answer this question. I will refer you to one: http://seqanswers.com/forums/showthread.php?t=25865.

    Depending on how much computing you expect to do locally, the requirements will change (more local computing == beefier specs needed).

  • floem7
    replied
    Some time has passed, so I'd like to ask the same question - I have to prepare a specification for a computer for NGS data analysis.

    The 3 Gb maize genome (one or two inbred lines) at 10x coverage; a rough estimate of the resulting data volume is sketched below.

    Sequencing itself will be done by an external company (probably on an Illumina HiSeq 2000).

    We will receive:

    FASTQ files
    Mapping of the reads onto the reference genome (maize B73)
    BAM files for graphical display of the mapped reads
    Coverage information
    List of SNPs
    List of short indels
    Comparisons between samples



    But it could be different if another company's service is chosen.

    I'm completely new to sequencing; I only understand that most probably I won't have to process the raw data (images) or assemble the genome myself.
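
    For scale, here is a rough estimate of the raw data volume such a project implies (the read length, FASTQ overhead, and compression ratio are assumptions):

    ```python
    # Rough scale of this project's raw data: 3 Gb maize genome at 10x coverage.
    GENOME_BP = 3e9      # B73 maize
    COVERAGE = 10
    READ_LEN = 100       # assumed HiSeq 2000 paired-end 100 bp reads

    bases = GENOME_BP * COVERAGE                # total sequenced bases
    reads = bases / READ_LEN                    # number of reads
    fastq_bytes = reads * (2 * READ_LEN + 45)   # bases + qualities + ~45 B of headers
    print(f"{bases / 1e9:.0f} Gbp in {reads / 1e6:.0f}M reads: "
          f"~{fastq_bytes / 1e9:.0f} GB FASTQ, ~{fastq_bytes / 4e9:.0f} GB gzipped")
    ```

    So a single 10x maize dataset is tens of gigabytes compressed, and BAM files plus intermediates will multiply that several-fold.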

    So what computer configuration would be ok?

  • smice
    replied
    Hello,
    I work for a Hungarian bioinformatics company, and one of our main profiles is NGS data analysis, so I have some experience with this topic. We are also putting together a high-performance computer (called GenoMiner) just to save scientific researchers the hassle. Check out our website (www.astridresearch.com); we are selling it from the 15th of July.
    I totally agree with peromhc. The CPU doesn't count for so much, but multiple cores are good: they make some tasks considerably faster, especially tasks that parallelize easily. Reference assembly is a great example.
    RAM is what is needed most. Our GenoMiner has 96 GB. But RAM demands also depend on many factors.
    For example, some assembly algorithms index the reads, while others (the newer ones) index the genome. The former use significantly more memory as the number of reads grows; the latter use significantly more memory as the genome grows. (But generally, more reads and larger genomes both demand much more RAM.)
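    A toy model of this trade-off (the bytes-per-base coefficients are made-up, illustrative numbers):

    ```python
    # Toy model of the indexing trade-off above: a read index scales with read
    # volume, a genome index (e.g. FM-index style) with reference size.
    def read_index_gb(n_reads, read_len, bytes_per_base=4):
        return n_reads * read_len * bytes_per_base / 1e9

    def genome_index_gb(genome_bp, bytes_per_base=3):
        return genome_bp * bytes_per_base / 1e9

    # Bacterial resequencing: many reads, tiny genome -> genome index wins big
    print(read_index_gb(1e7, 100), "GB vs", genome_index_gb(5e6), "GB")
    # Large plant genome, deep read set -> both grow, in different directions
    print(read_index_gb(3e8, 100), "GB vs", genome_index_gb(3e9), "GB")
    ```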
    Also, reference assembly is faster and less demanding than de novo. I once ran a reference assembly with a low read count on a small (bacterial) genome and it took half an hour, while de novo assembly of the same dataset took 16 hours.
    The quality of the reads and the parameters of the algorithm you use also matter a great deal. For example, if you only want to map perfect alignments and don't care about reads carrying sequencing errors, SNPs, or indels, you can get very fast results even on a lower-performance computer.
    Pre-processing algorithms, like error correctors, are usually not as demanding, but they too can run far too long on some datasets - for error correction, when the number of reads and/or the error rate is high.
    Post-processing algorithms, like ChIP-seq peak finders, are the least demanding; they usually run decently on an ordinary desktop computer. But if there is too much to load - for example, displaying hundreds or thousands of reads in a viewer at once - you can have lunch (or go to sleep) while it refreshes...
    Storage capacity is also something to consider; raw sequencing data can eat up your hard drive quickly. It is said that at least 3 terabytes are compulsory, but even then, you'd better plan some sort of archiving: external hard drives, cloud storage, optical discs...
    Best regards!

  • peromhc
    replied
    A lot depends on what types of data you are working with: de novo assemblies, alignment, bacterial genomes, vertebrate genomes, etc.

    For my work with vertebrate genomes, I am most often limited by RAM, so if I were building a machine, I would buy as much RAM as possible - >100 GB would be sweet!

    Some applications run in parallel, so speed will increase with the number of cores, but nothing strictly requires multiple cores to work.
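
    For example, a toy sketch of an embarrassingly parallel per-read computation (Python multiprocessing; the GC-content task is just a stand-in):

    ```python
    # Per-read work parallelizes trivially across cores with multiprocessing,
    # but the same code also runs fine on a single core.
    from multiprocessing import Pool

    def gc_content(read):
        return (read.count("G") + read.count("C")) / len(read)

    if __name__ == "__main__":
        reads = ["ACGTACGT", "GGGCCCAA", "ATTTACGT"] * 100_000  # stand-in for real reads
        with Pool() as pool:                        # one worker per core by default
            gcs = pool.map(gc_content, reads, chunksize=10_000)
        print(f"mean GC content: {sum(gcs) / len(gcs):.3f}")
    ```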

  • Optimal high performance computer spec for NGS data analysis

    Hi!

    I am new to NGS data analysis and am just starting to set up the facilities. I was wondering if you could share a high-performance computer spec that is considered optimal for NGS data analysis.

    Thanks.
