  • RudyS
    replied
    objective criterion for comparing de novo assemblies

    The main issue is that there is no true objective criterion for comparing de novo assemblies when no close references are available.

    Torsten

    For your bacterial genomes the majority of the DNA (presumably) codes for proteins ... long open reading frames for proteins that "make sense" are a decent biological criterion ... assembly errors will produce stop codons at a relatively high rate ... indels lead to out-of-frame shifts more often than expected ... I have seen reports of people working on programs to incorporate this kind of CDS "spell-check" ... I do it with undergrads ...
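The CDS "spell-check" idea above can be sketched roughly as follows. This is only an illustration, not any particular program mentioned in the thread: it assumes the standard stop codons (TAA, TAG, TGA), ignores start codons, and the function names are made up for the example. For each contig it reports the longest stretch without a stop codon across all six reading frames.

```python
# Rough sketch of a CDS "spell-check": how long can a contig be read
# without hitting a stop codon? Assumes standard stop codons only.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run_in_frame(seq, frame):
    """Longest stop-free run (in bases) reading codons from offset `frame`."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3].upper() in STOPS:
            run = 0          # a stop codon ends the current open stretch
        else:
            run += 3
            best = max(best, run)
    return best


def longest_open_stretch(seq):
    """Longest stop-free run over all six frames (both strands)."""
    strands = (seq, reverse_complement(seq))
    return max(longest_run_in_frame(s, f) for s in strands for f in range(3))
```

Running `longest_open_stretch` over every contig from two assemblies of the same reads gives two length distributions to compare; by this criterion, the assembly with more long stop-free stretches looks more plausible for a mostly coding bacterial genome. Ambiguous bases such as N are not treated specially here.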

    Rudy



  • Stegger
    replied
    Originally posted by RudyS View Post
    Stuff
    Rudy
    Thanks Rudy, I will try and look a bit further into my assembly output!



  • Stegger
    replied
    Originally posted by Torst View Post
    The main issue is that there is no true objective criterion for comparing de novo assemblies when no close references are available.
    That's true. I have the Genomics program and am really happy with it, but I am mostly a molecular biologist with an interest in bioinformatics, not an expert. So I like the options it provides me in a somewhat familiar interface.



  • Torst
    replied
    Originally posted by Stegger View Post
    did you see any significant improvements in terms of speed in the 3.2 release from March? They stated that it was significantly faster (like 25%).
    We haven't played with it yet. We just got the single-PC licences and are waiting for some new beefed-up desktops to arrive (16 GB RAM + Quad CPU). I'm busy with Velvet/Shrimp pipelines, but the biologists here will put it through its paces.

    The main issue is that there is no true objective criterion for comparing de novo assemblies when no close references are available.



  • RudyS
    replied
    Stegger

    One thing different in the new release is that there is no longer a minimum contig length of 200 bases ... strangely enough I am now getting "contigs" in de novo assembly of 36 bases ... from 36-base Solexa reads! ... this seems to be a glitch ... my problem with CLC is related to connecting contigs that "by eye" have plenty of coverage at overlapping regions but CLC won't connect them ... the penalty adjustments don't seem to do anything significant ... mismatch penalty of 2 gives basically the same result as mismatch penalty of 1 for de novo assembly ... with Velvet there is a large difference in the contig size when you reduce the coverage_cutoff ... other problems, like accuracy, are introduced with reduced coverage_cutoff but at least it acts as one would expect ... with CLC, staring at some of the contig ends after BLASTing them on what for sure is where they come together, and then looking at the read coverage in the unjoined region, it is hard to understand what kept the assembler from joining them into a larger contig ... on the other hand, CLC does give you the graphic that neatly lines up all the reads so you have the opportunity of looking at them to try to understand how it made its decisions ...
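One way to see what a change in minimum contig length or coverage cutoff does to an assembly is to summarise the contig lengths before and after. The sketch below is an assumption-laden illustration: the function name is hypothetical, the default of 200 simply mirrors the old minimum mentioned above, and N50 is a commonly used summary that nobody in this thread actually proposed.

```python
def contig_stats(lengths, min_length=200):
    """Summarise contig lengths after dropping contigs shorter than min_length."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    total = sum(kept)

    # N50: the length of the contig at which the running total, taken from
    # the longest contig downwards, first reaches half of all kept bases.
    running = 0
    n50 = 0
    for n in kept:
        running += n
        if 2 * running >= total:
            n50 = n
            break

    return {
        "contigs": len(kept),
        "total_bases": total,
        "longest": kept[0] if kept else 0,
        "n50": n50,
    }
```

Calling it once with `min_length=200` and once with `min_length=0` on the same list of lengths shows how much of the output consists of read-length "contigs" like the 36-base ones described above. None of these numbers says anything about accuracy, which is the caveat already raised for a reduced coverage cutoff.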

    as Torst points out, the many helpful graphic utilities with CLC (presumably the reason it is slow?) make the experience more pleasant ...

    Rudy



  • Stegger
    replied
    Hi Torst,
    did you see any significant improvements in terms of speed in the 3.2 release from March? They stated that it was significantly faster (like 25%).
    Last edited by Stegger; 04-01-2009, 12:18 AM.



  • Torst
    replied
    RudyS,

    The department I work within spent a fair amount of time evaluating it, and recently purchased a few full licences. CLC was generous with temporary licences throughout the process.

    Our main application area is prokaryotic sequencing and transcript analysis using Illumina GA2, so de novo assembly and SNP reporting were important. We also tried it on a mixture of Win32, Win64, Mac OS X and Linux64 machines, ranging from single-core 2 GB to 8-way 64 GB RAM machines.

    Traditionally we have used Velvet for assembly, Shrimp/MAQ for SNP analysis, and Artemis and in-house applications and scripts for the rest.

    We found the CLC "de novo" assembler to be very slow compared to Velvet. The results were similar to what Velvet gave (based on some resequencing results). The main issue is that the CLC de novo assembler did (or does still?) not support PAIRED END assembly (unlike Velvet). The way the GUI presents it suggests that it does, but tech support confirmed it doesn't use the pairs to link contigs. It does show you the paired ends mapped to the result though. We didn't use the reference assembler much. The SNP reporting works well once you tell it to do 'gapped alignment'. It did miss some things we found with Shrimp, but that could be down to parameter settings.

    The RAM usage of CLC was huge when loading 1 or 2 lanes of Illumina data. It seemed to need more RAM on the Linux versions than on Windows. The Linux versions were problematic with earlier versions we tried, but they did fix some issues. As stated earlier, CLC needed much more CPU time. It was capable of multithreading for some assemblies, but Velvet was still way faster.

    The main benefit of buying CLC is to "empower" the biologists to explore these data sets themselves. The open source available software just isn't ready for use by non-bioinformatics/I.T people.

    --Torst



  • RudyS
    replied
    opinions on CLC Genomics Workbench

    ECO et al.

    The CLC Genomics Workbench has been available for over a year and I notice that many people signed up to test it ... have people continued to use it (paying customers)? any opinions on their de novo assembler? any suggestions on selecting penalties etc?

    RudyS



  • ECO
    replied
    What! SEQanswers is not in your blogroll?!



  • Roald
    replied
    New CLC Genomics Workbench video: Assembling mixed data sets

    Here at CLC bio, we have just produced a small video which shows how you can assemble mixed data sets in our Genomics Workbench 2.0.
    The data are from two different NGS platforms, Illumina Genome Analyzer and 454, and contain both paired-end and single reads.
    Comments are much appreciated.

    You can view the video here.

    Best regards

    Roald Forsberg, CLC bio.



  • Roald
    replied
    CLC Genomics Workbench and more

    Dear all,

    Several people have requested that we write an introduction to the CLC Genomics Workbench, so here goes.

    Next generation sequencing technologies are causing some dramatic changes in the high-throughput sequencing landscape and in turn generating a lot of challenges to the field of bioinformatics. The Genomics Workbench was created to address these challenges.
    The objective of the CLC Genomics Workbench is to create an integrated bioinformatics environment which combines the power to handle the magnitude of NGS data with a carefully designed graphical user interface.

    For the first version we have focused on handling the secondary level of NGS bioinformatics, namely de novo assembly and reference assembly. However, we have also included some tertiary analyses like SNP detection and graphical identification of large scale genomic events.
    For a full feature list, have a look here.

    Version 2.0 of the software is out in a few days, and for this release we have focused on bringing our Workbench to a state where it can comfortably handle human genome size data sets. This includes the following improvements:
    • A completely new short read assembler delivering the world's fastest reference assembly – click here for more info and white paper
    • Improved memory handling
    • Options to mask reference genomes
    • Smoother handling of hybrid data sets (cross-platform, cross-experiment-design)


    Alongside Genomics WB 2.0, we are also releasing a command line program package for de novo and reference assembly which will give users access to these tools in a scripting environment. This package is a separate product which includes the fast assembly algorithms and a number of utilities for handling assembly results.

    Having established a firm basis for secondary analysis we have an ambitious roadmap for including more tertiary analysis tools later this year. These include:
    • Tag and array based transcriptomics
    • Advanced feature queries – feature tracks
    • Chip-seq framework
    • Improved de-novo assembly
    • Improved detection of genome scale events
    • Full support for color space analysis


    Further down the line we are looking at including features like:
    • RNA-seq
    • CNV detection
    • Metagenomics analyses
    • And lots more


    However, although we intend to provide a very comprehensive tool set, we know that we cannot cover every application there is. For this reason, we are focusing on providing an open, industry-strength platform that users can modify and extend, and we provide a Software Developer Kit which gives access to an extensive and well-supported API and a developer community.

    I hope this was of help and please feel free to post any questions or comments to this that you may have.

    Cheers

    Roald



  • mchaisso
    replied
    Other hybrid assemblers.

    Originally posted by Torst View Post
    The recent version of MIRA claims to be able to perform a true hybrid assembly of sequences from Sanger, 454-FLX, and Illumina. We are still assessing it. See http://chevreux.org/projects_mira.html
    If you have some test data for this, let me know. I'm polishing off some code for hybrid assembly of Sanger, 454, Illumina (or anything, really), and wouldn't mind benchmarking against MIRA. Also, Velvet is working exceptionally well, and it can use long reads, so you might try using it. The breakdown is Velvet is WAY faster than EULER. Sometimes I can get better results with EULER, but that may be because I know how to tune it.



  • ECO
    replied
    Just had their training/intro this morning. Looks pretty powerful, I'm downloading the trial now.



  • ScottC
    replied
    If you call them or email them, they'll allow you to trial it with your own data. You have to discuss the project with them, first, though.



  • wesb
    replied
    CLC Bio Workbench Trial

    I got a trial copy of CLC Bio Workbench. They only let you use their trial data sets. You can't upload your own. It's pretty efficient. It mapped 5 million paired-end Solexa reads on my MacBook Pro in an hour. It doesn't have all the capability I'd like, but I wasn't able to try it on my data.

