The manual suggests one day given enough CPUs. Anyone with experience here? What is the speed of your CPUs? How many do you use? How long does it take in your hands?
-
Unless I have missed something in the new release, each of the corona lite programs runs on a single processor. However, like many bioinformatics programs, the corona lite programs can be run in 'embarrassingly parallel' mode. I.e., break down your reference sequence by chromosome or other convenient segment and/or split the SOLiD file into enough parts to use up your processors.
The matching part of the corona lite pipeline has 6 parts, of which the 1st and 3rd can be split up. The other 4 parts are strictly single-processor but are really just file copies and thus can be fairly fast.
As for overall time it depends, obviously, on the size of your SOLiD data set -- those 14-20 GB files take a while to toss around -- and your reference sequence. The time also goes up in a non-linear fashion depending on how many mismatches you wish to take into consideration.
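To make the splitting idea concrete, here is a minimal Python sketch of breaking a multi-FASTA reference into roughly equal parts, one per processor. It is not part of corona lite; the function names (`parse_fasta`, `split_reference`, `write_parts`) and the output file naming are made up for illustration, and it assumes the reference fits in memory. Sequences are assigned longest-first to whichever part is currently smallest, so the parts come out close to balanced by total length.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Read FASTA lines into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_reference(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Greedily distribute records over n_parts bins, balancing total length."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    bins: List[List[Record]] = [[] for _ in range(n_parts)]
    heap = [(0, i) for i in range(n_parts)]  # (total length so far, bin index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        bins[idx].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), idx))
    # Drop empty bins (fewer sequences than requested parts).
    return [b for b in bins if b]


def write_parts(parts: List[List[Record]], prefix: str) -> List[str]:
    """Write each part to its own FASTA file; file naming is hypothetical."""
    paths = []
    for i, part in enumerate(parts, 1):
        path = f"{prefix}.part{i:03d}.fasta"
        with open(path, "w") as fh:
            for header, seq in part:
                fh.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each resulting file would then be handed to its own matching job. The same approach should carry over to splitting the read file, as long as each read's header and sequence lines stay together.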
A big consideration is having enough disk space, both temporary and permanent, to handle the files.
Since I usually work with partially assembled genomes (i.e., lots of contigs) or CDS or EST projects, it is quite often the case that I split up the reference into 64 parts and use all 64 CPUs that I have at my disposal. The ultimate speed of the CPUs really doesn't matter that much. Obviously the faster the better. But I would concentrate more on disk speed and physical memory and exactly how many mismatches you want. 1 mismatch is trivial. 3 (the recommended setting) less so. 6 or more is almost impossible on any sizable dataset.
And, yes, I would say 1-2 days of processing given enough CPUs. My recent work on the bee assembly 4 took about 36 hours to go through the matching steps. But I didn't break down the chromosomes or the SOLiD data set and so only used about 1/4 of my CPUs. There are other people on the machine and, despite my hoggish nature, I did want to play nicely (for once!). SNP calling added time to that process.
Last edited by westerman; 01-12-2009, 01:48 PM.
-
It should be possible to match using 1 CPU given enough memory (4 GB). Given my experience I would expect running times of about 3 weeks for a non-paired mapping of a SOLiD data set to the reference bee genome. SNP calling would probably take an extra week. But I may be pessimistic.
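As a rough sanity check on that estimate, the figures quoted earlier in the thread can be turned into CPU-hours. This is only back-of-the-envelope arithmetic: it assumes the 36-hour bee run used 16 CPUs (a quarter of the 64 mentioned) and that the work scales perfectly linearly with CPU count, which ignores disk and memory effects.

```python
def cpu_hours(wall_hours: float, n_cpus: int) -> float:
    """Total CPU-hours, assuming all CPUs were busy for the whole run."""
    return wall_hours * n_cpus


def single_cpu_weeks(wall_hours: float, n_cpus: int) -> float:
    """Equivalent wall time on one CPU, in weeks, under linear scaling."""
    return cpu_hours(wall_hours, n_cpus) / (24 * 7)


total_cpus = 64               # from the thread
used_cpus = total_cpus // 4   # "about 1/4 of my CPUs" -> 16 (assumption)
wall = 36                     # hours for the matching steps

print(cpu_hours(wall, used_cpus))                   # 576
print(round(single_cpu_weeks(wall, used_cpus), 1))  # 3.4
```

So 36 hours on about 16 CPUs works out to roughly 3.4 weeks on a single CPU, which is in line with the 3-week figure above.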
In any case it will take time, and you had better hope that your computer stays up and running during the process. Last week I had two instances of the computer or file server crashing on me. They were rare instances that should not occur, but irritating nevertheless.
-
You could also try a program I have authored called BFAST: the Blat-like Fast Accurate Search Tool. You can find download instructions at:
Nils Homer
-
If your time is precious, try ISAS. Native colorspace and as far as we know it's the fastest - if I'm wrong and there is a faster solution please enlighten me!
100 million 25mers on one computer in 30 minutes.
3G human reference, 2 mismatches.
Results identical to corona (just 100 times faster) and same format
See the ISAS thread for more info.