CLC Genomics Workbench slow in de novo assembly
-
CLC may be fast, but the stats that I have gotten back, even N50 (which I rarely rely on), are better! (I don't rely on N50 because it can be negated when proper error correction isn't employed!) It depends on the genome you are assembling and also on the computational power you have (server or desktop computer), but I would always go for something developed by someone who is trying to work out the problem rather than by a company that is trying to make money!
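For readers unfamiliar with the N50 statistic mentioned above, here is a minimal sketch of how it is usually computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The function name and the example contig lengths are made up for illustration and do not come from any assembly in this thread.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [100, 80, 50, 40, 30]
print(n50(example_lengths))
```

Because N50 looks only at contig lengths, a misassembly that wrongly joins two contigs raises it, which is one reason not to rely on it alone.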
-
SPAdes is very good as well. In my assemblies (BAC pools), sometimes CLC performed better and sometimes SPAdes did. In most cases I got the best results by taking reads that had been error-corrected by SPAdes and assembling them in CLC.
CLC is the least demanding assembler (with regard to the input data) that I have encountered so far; it almost always produces a reasonable assembly no matter which types of data are available. In one of our projects ALLPATHS-LG completely refused to assemble certain parts of a (heterozygous) genome, while CLC did, even though the libraries had been tailored for ALLPATHS-LG.
In my limited experience there are always many different factors at play that influence the assembly metrics, among them hitting the right amount of input sequence data. CLC is comparatively tolerant in this regard as well.
Btw, I always use the maximum word size in CLC now.
Originally posted by lucio89:
To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE; just because it is licensed software doesn't mean it's the best.
Last edited by luc; 10-02-2014, 09:43 PM.
-
To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE; just because it is licensed software doesn't mean it's the best.
-
Thank you for the advice!
I noticed that I had made a simple mistake: I imported the libraries with R1 and R2 as separate files, because the sequencing company did not tell us the minimum and maximum distances for the paired ends. So, could it be that the assembly just gets stuck when the unpaired reads from all the libraries are mixed together?
We also contacted CLC, and they advised us to do the mapping back to contigs as a separate step, use all libraries for the assembly, and increase the word size.
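Before re-importing R1 and R2 as a paired library, it can be worth confirming that the two files still list the same reads in the same order, since trimming and filtering sometimes drop one mate of a pair. The following is a minimal sketch, not anything CLC provides: the function names are made up, and it assumes plain four-line FASTQ records (optionally gzipped) with read names that match once a trailing /1 or /2 is removed.

```python
import gzip
import itertools


def open_fastq(path):
    """Open a FASTQ file as text; gzip is assumed if the name ends in .gz."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_names(handle):
    """Yield the read name from each 4-line FASTQ record, without /1 or /2."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 0 and line.strip():
            name = line.strip()[1:].split()[0]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_index, r1_name, r2_name) for the first record where the
    two files disagree, or None if they are in sync. A missing mate shows up
    as None in place of the name."""
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2:
        pairs = itertools.zip_longest(read_names(r1), read_names(r2))
        for index, (name1, name2) in enumerate(pairs):
            if name1 != name2:
                return index, name1, name2
    return None
```

Calling `first_mismatch("lib180_R1.fastq.gz", "lib180_R2.fastq.gz")` (hypothetical file names) returns `None` when the files are in sync. If it returns a mismatch, the pairs need to be re-synchronised before a paired import makes sense.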
-
[QUOTE]However the assembly halts for days into the mapping-phase. Is this normal? The mapping back to contigs should be slow, but how slow should it be? The data is over 10 GB per library. Thank you for all the help![/QUOTE]
It is advisable to contact CLC, as suggested by @Genomax. In my experience, de novo assembly in CLC Genomics Workbench takes ~2 h for 3 GB of data (a total of 5-7 different samples). I suspect something is going wrong. Try letting the software detect the bubble size and word size automatically and see whether that makes a difference.
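As a rough sanity check, the ~2 h for 3 GB figure can be scaled to the original poster's data (4 libraries at over 10 GB each). This is only a back-of-the-envelope sketch: it assumes runtime grows linearly with input size on comparable hardware, which assembly and read mapping generally do not guarantee, and the function name is made up.

```python
def linear_runtime_hours(data_gb, ref_hours=2.0, ref_gb=3.0):
    """Naive estimate: scale a reference runtime linearly with data size.
    Linear scaling is an assumption, not a property of the assembler."""
    return data_gb * ref_hours / ref_gb


libraries = 4
gb_per_library = 10  # the thread says "over 10 GB", so this is a lower bound
total_gb = libraries * gb_per_library

print(f"~{linear_runtime_hours(total_gb):.1f} h for {total_gb} GB")
```

Under that assumption the estimate is a little over a day, so a run that sits in the mapping phase for several days is slower than this naive scaling would suggest.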
-
You probably should contact CLC Tech Support for help with this question (http://www.clcbio.com/support/contact/).
-
CLC Genomics Workbench slow in de novo assembly
Hi!
I have paired-end Illumina genomic data in 4 libraries with insert sizes of 180 bp, 500 bp, 800 bp and 2 kbp. All the libraries are from one sample; they were trimmed and quality-filtered by the sequencing company and are of very high quality.
We installed CLC Genomics Workbench 7 on our computer and are trying to assemble these libraries together into contigs without a reference sequence. Parameters other than defaults:
Word size: 64
Bubble size: 133
Mapping back to contigs
Perform scaffolding
However the assembly halts for days into the mapping-phase. Is this normal? The mapping back to contigs should be slow, but how slow should it be? The data is over 10 GB per library.
Thank you for all the help!