CLC Genomics Workbench slow in de novo assembly


  • CLC Genomics Workbench slow in de novo assembly

    Hi!

I have paired-end Illumina genomic data in four libraries with insert sizes of 180 bp, 500 bp, 800 bp, and 2 kbp. All the libraries come from one sample; they have been trimmed and quality-filtered by the sequencing company and are of very high quality.

We have installed CLC Genomics Workbench 7 on our computer and are trying to assemble these libraries together into contigs with no reference sequence. Parameters changed from the defaults:

Word size: 64
    Bubble size: 133

    Mapping back to contigs

    Perform scaffolding


However, the assembly has been stuck in the mapping phase for days. Is this normal? The mapping back to contigs should be slow, but how slow? The data are over 10 GB per library.
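(File size alone doesn't say how much sequence is actually going into the assembler, since compression and quality strings inflate it. As a rough way to quantify the input, here is a minimal, illustrative Python sketch for counting reads and bases in a FASTQ file, assuming standard 4-line records; the function names are my own, not anything from CLC.)

```python
import gzip

def fastq_stats(lines):
    """Count reads and total bases from an iterable of FASTQ lines
    (4-line records: header, sequence, '+', qualities)."""
    reads = bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:          # the sequence line of each record
            reads += 1
            bases += len(line.strip())
    return reads, bases

def fastq_file_stats(path):
    """Open a plain or gzipped FASTQ file and return (reads, bases)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return fastq_stats(fh)
```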

    Thank you for all the help!

  • #2
    You probably should contact CLC Tech Support for help with this question (http://www.clcbio.com/support/contact/).



    • #3
Quote: "However the assembly halts for days into the mapping-phase. Is this normal? The mapping back to contigs should be slow, but how slow should it be? The data is over 10 GB per library."

It is advisable to contact CLC, as suggested by @Genomax. In my experience, a de novo assembly with CLC Genomics Workbench takes ~2 h for 3 GB of data (a total of 5-7 different samples). I suspect something is going wrong; try letting the software detect the bubble size and word size automatically and see whether that makes a difference.



      • #4
        Thank you for the advice!

I noticed I had made a simple mistake: I imported the R1 and R2 files separately (i.e., as unpaired reads), because the sequencing company did not tell us the minimum and maximum distances for the paired ends. Could the assembly simply be stuck because the unpaired reads from all the libraries are being mixed together?

We also contacted CLC, and they advised us to run the mapping back to contigs as a separate step, use all libraries for the assembly, and increase the word size.
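(For anyone hitting the same R1/R2 import mistake: before re-importing as paired data, it is worth verifying that the two files really are in sync, record by record. A minimal sketch, assuming Illumina-style headers such as `@name/1` or `@name 1:N:0:...`; these helper names are illustrative, not part of any tool.)

```python
from itertools import zip_longest

def read_ids(lines):
    """Yield the read ID from each FASTQ header line:
    first whitespace token, with any /1 or /2 suffix stripped."""
    for i, line in enumerate(lines):
        if i % 4 == 0:                      # header line
            yield line[1:].split()[0].split("/")[0]

def files_are_paired(r1_lines, r2_lines):
    """True if both FASTQ streams carry the same read IDs in the same order."""
    return all(a == b and a is not None
               for a, b in zip_longest(read_ids(r1_lines), read_ids(r2_lines)))
```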



        • #5
To be honest, I have used CLC as well and found that SPAdes gives a much better assembly by comparison. You should try the assemblers that performed well in GAGE; just because software is licensed doesn't mean it's the best.



          • #6
SPAdes is very good as well; in my assemblies (BAC pools), sometimes CLC performed better and sometimes SPAdes. In most cases I got the best results by assembling reads in CLC that had first been error-corrected by SPAdes.

CLC is the least demanding assembler (with regard to the input data) I have encountered so far; it almost always produces a reasonable assembly no matter which types of data are available. In one of our projects, ALLPATHS-LG completely refused to assemble certain parts of a (heterozygous) genome, while CLC managed it (with the libraries being tailored for ALLPATHS-LG).
In my limited experience there are always many different factors at play that influence the assembly metrics, among them hitting the right amount of input sequence data. CLC is comparatively tolerant in this regard as well.

By the way, I always use the maximum word size in CLC now.

Originally posted by lucio89:
"To be honest, I have used CLC as well and found that SPAdes gives a much better assembly by comparison. You should try the assemblers that performed well in GAGE; just because software is licensed doesn't mean it's the best."
Last edited by luc; 10-02-2014, 09:43 PM.



            • #7
CLC may be fast, but the stats I have gotten back from other assemblers, even N50 (which I rarely rely on, because it can be misleading when proper error correction isn't employed), are better. It depends on the genome you are assembling and on the computational power you have (server or desktop), but I would always go for something developed by someone trying to solve the problem rather than by a company trying to make money.
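(For reference, the N50 mentioned above can be computed directly from a list of contig lengths; a minimal, assembler-independent sketch:)

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L together
    cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0   # empty input
```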
