CLC Genomics Workbench slow in de novo assembly

  • CLC Genomics Workbench slow in de novo assembly

    Hi!

    I have paired-end Illumina genomic data in 4 libraries with insert sizes of 180 bp, 500 bp, 800 bp and 2 kbp. All the libraries are from one sample. They were trimmed and quality-filtered by the sequencing company and are of very high quality.

    We installed CLC Genomics Workbench 7 on our computer and are trying to assemble these libraries together into contigs without a reference sequence. Parameters other than the defaults:

    Word size: 64
    Bubble size: 133
    Mapping back to contigs: enabled
    Perform scaffolding: enabled


    However, the assembly has been stuck in the mapping phase for days. Is this normal? Mapping reads back to contigs is expected to be slow, but how slow should it be? Each library is over 10 GB of data.

    Thank you for all the help!

  • #2
    You probably should contact CLC Tech Support for help with this question (http://www.clcbio.com/support/contact/).



    • #3
      [QUOTE]However the assembly halts for days into the mapping-phase. Is this normal? The mapping back to contigs should be slow, but how slow should it be? The data is over 10 GB per library. Thank you for all the help![/QUOTE]

      It is advisable to contact CLC, as suggested by @Genomax. In my experience, de novo assembly in CLC Genomics Workbench takes ~2 h for 3 GB of data (for a total of 5-7 different samples). I suspect something is going wrong. Try letting the software detect the bubble size and word size automatically and see whether that makes a difference.



      • #4
        Thank you for the advice!

        I noticed that I had made a simple mistake: I imported the R1 and R2 files of each library separately, because the sequencing company had not told us the minimum and maximum distances for the paired ends. So, could the assembly be getting stuck simply because the reads from all the libraries are being treated as unpaired and mixed together?

        We also contacted CLC, and they advised us to run the mapping back to contigs as a separate step, use all libraries for the assembly, and increase the word size.
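
Before re-importing R1 and R2 as paired data, it can help to confirm that the two files really list the same reads in the same order. The following is a minimal Python sketch of such a check, not part of CLC Genomics Workbench. The function names (`read_name`, `fastq_headers`, `first_mismatch`) are hypothetical, and the code assumes plain four-line FASTQ records with mates marked either by a `/1` or `/2` suffix or by a comment field after a space.

```python
from itertools import zip_longest


def read_name(header):
    """Return the read name from a FASTQ header line, without the mate marker."""
    fields = header.strip().lstrip("@").split()
    name = fields[0] if fields else ""
    # Older Illumina headers mark mates with a trailing /1 or /2.
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def fastq_headers(lines):
    """Yield the header line of each record, assuming four lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line


def first_mismatch(r1_lines, r2_lines):
    """Return the index of the first record whose names differ, or None.

    A file that ends early also counts as a mismatch at that index.
    """
    pairs = zip_longest(fastq_headers(r1_lines), fastq_headers(r2_lines))
    for index, (h1, h2) in enumerate(pairs):
        if h1 is None or h2 is None or read_name(h1) != read_name(h2):
            return index
    return None
```

Both arguments can be open file objects, for example `first_mismatch(open(r1_path), open(r2_path))` with your own file paths. A result of `None` only means the read names line up; it says nothing about the paired-end distances, which still have to be supplied at import.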



        • #5
          To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE. Just because software is licensed doesn't mean it's the best.



          • #6
            SPAdes is very good as well. In my assemblies (BAC pools), sometimes CLC performed better and sometimes SPAdes. In most cases I got the best results by assembling, in CLC, reads that had been error-corrected by SPAdes.

            CLC is the least demanding (with regards to the input data) assembler I have encountered so far; it almost always produces a reasonable assembly no matter which types of data are available. In one of our projects Allpaths completely refused to assemble certain parts of a (heterozygous) genome - CLC did (with the libraries being tailored for Allpaths LG).
            In my limited experience there are always many different factors at play which influence the assembly metrics - among them hitting the right amount of input sequence data. CLC is comparatively tolerant in this regard as well.

            Btw, I always use the maximum word-size now in CLC.

            Originally posted by lucio89
            To be honest, I have used CLC as well and found that SPAdes gives a much better assembly in comparison! You should try the assemblers that performed well in GAGE. Just because software is licensed doesn't mean it's the best.
            Last edited by luc; 10-02-2014, 09:43 PM.



            • #7
              CLC may be fast, but the stats I have gotten back from other assemblers, even N50 (which I rarely rely on), are better! (I don't rely on N50 because it can be negated when proper error correction isn't employed!) It depends on the genome you are assembling and on the computational power you have (server or desktop computer), but I would always go for something developed by someone who is trying to work out the problem rather than by a company that is trying to make money!
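
For readers comparing assemblers by N50, here is a minimal Python sketch of how the statistic is commonly computed from a list of contig lengths. It uses the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length). The function name `n50` is illustrative and does not come from any of the assemblers discussed in this thread.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches at least half of
    the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two longest contigs already cover 16 of the 32 total bases. The value depends only on contig lengths, not on whether the contigs are correct, which is one reason it should not be the only metric used to compare assemblies.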
