CLC Genomics Workbench

The_Roads replied

09-09-2009, 08:26 PM
As an update for future readers: we received some excellent help from CLC, and it appears the delays we experienced were due to the way we had assembled our contigs. Using conventional assembly parameters, graphics now render in seconds to minutes (CLC 3.6.5).
polsum replied

07-31-2009, 08:33 AM
Hi, I have been testing the trial version of CLC Workbench and encountered two issues.

1. I BLASTed a set of Illumina-generated short reads (17-33 nt after trimming the 3' adapters) against the mouse RefSeq database using the standalone BLAST program with stringent parameters, and found, say, 2500 reads matching mouse mRNAs. When I aligned all 2500 of those reads to the same RefSeq database using CLC reference assembly, only half of them aligned to the reference. I tried every available option (gap penalties, global alignment, scores, etc.), but never got all the reads to align. I think there should be more options here.

2. I used the BLAST feature in CLC Workbench, and when I view the parsed BLAST results I don't see all the columns in the overview table. For example, there is no strand orientation column in the overview. I can see strand in the individual BLAST mappings, but that is useless to me because I need to count how many of the total hits are on the plus strand and how many on the minus strand. This is a serious limitation for me.
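Until the overview table exposes strand, one workaround is to count strands from the standalone BLAST output instead. A minimal Python sketch, assuming BLAST tabular output in the default 12-column order (legacy `blastall -m 8` or BLAST+ `-outfmt 6`); for blastn, a subject start greater than the subject end marks a minus-strand hit. The read and reference names in the example are illustrative only.

```python
from collections import Counter
from typing import Iterable


def count_strands(tabular_lines: Iterable[str]) -> Counter:
    """Count plus- and minus-strand HSPs in 12-column BLAST tabular output.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    counts = Counter(plus=0, minus=0)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])  # columns 9 and 10
        # For blastn, subject coordinates run backwards on a minus-strand hit.
        counts["minus" if sstart > send else "plus"] += 1
    return counts


example = [
    "read1\tref1\t100.00\t20\t0\t0\t1\t20\t101\t120\t1e-05\t40.1",
    "read2\tref2\t100.00\t20\t0\t0\t1\t20\t220\t201\t1e-05\t40.1",
]
print(count_strands(example))
```

This counts HSP lines, not reads: a read with several HSPs is counted once per HSP, so filter to the best hit per query first if per-read totals are needed.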
The_Roads replied

07-28-2009, 03:29 PM
Hi Shawn,

Thanks, I'll be in touch.

We have the 64-bit version and we've already tweaked the vmoptions. Everything relating to assembly, SNP detection, etc. that requires 64-bit computing is working fine. It is only anything that alters the GUI or exports graphics/text that locks up the workstations on large assemblies.

The_Roads
smprince18 replied

07-28-2009, 02:20 PM
Disclaimer: I work at CLC bio

The_Roads

Have you tried contacting [email protected] yet? Or you can reach us at (617)-444-8765. It could be that your vmoptions were not adjusted correctly by the installer. If you go to the CLCGenomicsWB3 installation directory and show hidden files, you will see the .vmoptions file. Open it in Notepad; there will be a line that looks like -Xmx####m, where #### is the number of MB allocated to the application (for example, -Xmx1024m means 1 GB of RAM is allocated to the application).
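Checking the value by hand is easy, but a small helper may be useful when comparing several workstations. A minimal Python sketch, assuming the option sits on its own line as `-Xmx<number>` with an optional k/m/g suffix (standard JVM syntax, where no suffix means bytes); the file name and location vary between versions, so reading the file is left to the caller, and `max_heap_mb` is a hypothetical name.

```python
import re
from typing import Optional

# -Xmx<size>[k|m|g]; without a suffix the JVM reads the value as bytes.
_XMX = re.compile(r"^-Xmx(\d+)([kKmMgG]?)$")
_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(vmoptions_text: str) -> Optional[float]:
    """Return the -Xmx heap limit in MB, or None if the text has no -Xmx line."""
    heap_mb = None
    for line in vmoptions_text.splitlines():
        match = _XMX.match(line.strip())
        if match:
            value, unit = int(match.group(1)), match.group(2).lower()
            # Keep the last match: a later -Xmx normally overrides an earlier one.
            heap_mb = value * _TO_MB[unit]
    return heap_mb


print(max_heap_mb("-Xms128m\n-Xmx1024m\n"))
```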

The second possibility is that you downloaded the 32-bit version of the application rather than the 64-bit version. This would result in very slow response times, since a 32-bit application can only request 2 GB of RAM.

Again, if you would like some help with this, contact our support team or me directly.

Shawn M Prince
The_Roads replied

07-09-2009, 07:01 AM
No, sorry, I meant turning on alignment info like coverage maps, non-specific reads, etc. It takes a very long time for CLC to generate the graph in the top frame. Likewise, once the graph is there it takes ages to export a CSV file of the graph. I'd like to know if anyone else has this problem or whether it might be something funky with my workstation (Win64, 1x quad-core Xeon, 32 GB).

Thanks
arne.muller replied

07-08-2009, 11:57 PM
Hello,

You mean exporting the current view into a graphics file (e.g. PNG)? I've had some relatively long response times during export, but not as long as you report. I imagine the time needed for export is proportional to the number of elements in the current view. Maybe just take a screenshot instead (not nice, but often that's enough)?

Arne
The_Roads replied

07-08-2009, 01:36 PM
Hi,

Anyone else having problems viewing graphical output from CLCGWB?

We're working with high-coverage assemblies (5-20K average coverage, 10M reads), and it takes 10-30 min to create any type of graphical output and even longer (~20-30 min) to export CSV files of any graphs. We're working with version 3.6 but have had the same problem with all previous versions. I assume this is partly due to our depth of coverage, but I'd like to rule out any problem with our workstation/install.
Roald replied

06-23-2009, 12:34 AM
To Lesley

Thanks for the info Lesley.
Do you happen to have a sample of some tagged Illumina data that I could get?
I basically just need a description of the format, so just a few lines from a file would suffice.

Cheers

Roald

Disclaimer: I work at CLC bio
Lesley replied

06-18-2009, 03:31 PM
Thanks again Roald,
We are going to try this workflow for our sequences and see how it goes.
The reason we are not using the pipeline for separation is that we had issues with version 1.3, and we have only just received 1.4. We now have a script under 1.4 which will be used from now on, but we will have to retrofit it for previously run data.
I tried separating on name and my system (8 GB RAM, 64-bit quad core) froze with one lane of data (3 indexes).
However, I am going to try again (after freeing up as much memory as possible) to see if it will work.
Cheers,
Lesley
Roald replied

06-16-2009, 03:40 AM
To Lesley

Disclaimer: I work at CLC bio
Hi Lesley,

It is correct that there are no options to sort tagged/barcoded Illumina PE data in our current "Multiplexing by tag" functionality. We designed this module to be used with 454 data and to be flexible enough to accommodate "home brew" multiplexing as is performed by a number of our users.
The reason we did not focus on indexed Illumina data is that Illumina's Pipeline software should be able to sort the tagged reads and append the barcode to the sequence name, so that downstream analysis software like ours only needs to handle the naming conventions rather than the actual tag in the sequence. For this reason, we designed a "Multiplexing by name" module that allows the user to sort reads based on naming conventions - see http://www.clcbio.com/index.php?id=1...nces_name.html
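To illustrate what sorting by name means once the barcode has been moved into the read name, here is a minimal Python sketch. It is not the Workbench's implementation, and the naming pattern (index after `#`, optional `/1` or `/2` mate suffix) is an assumption; adjust the regular expression to whatever your pipeline version actually writes. Reads whose names carry no tag end up in an `unassigned` bin.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, qualities)

# Hypothetical convention: the index follows '#' at the end of the read name,
# optionally followed by a /1 or /2 mate suffix, e.g. "@L6:73:941:1973#ATCACG/1".
_INDEX_IN_NAME = re.compile(r"#([^/\s]+)(?:/[12])?$")


def split_by_name_tag(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group reads by the index tag embedded in their names."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for header, seq, qual in records:
        read_name = header.split()[0]  # ignore any comment after the first space
        match = _INDEX_IN_NAME.search(read_name)
        groups[match.group(1) if match else "unassigned"].append((header, seq, qual))
    return dict(groups)


reads = [
    ("@L6:73:941:1973#ATCACG/1", "ACGT", "IIII"),
    ("@L6:73:941:1974#CGATGT/1", "ACGT", "IIII"),
    ("@L6:73:941:1975#ATCACG/2", "ACGT", "IIII"),
    ("@L6:73:941:1976", "ACGT", "IIII"),
]
print({tag: len(group) for tag, group in split_by_name_tag(reads).items()})
```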

However, if the Pipeline sorting does not work or is not optimal, we are of course grateful to know, so that we can extend our current functionality to sort Illumina PE data in our software as well; your input on this is very welcome. Could you let me know why you are not using the Pipeline software to filter the reads?

Regarding the mapping issue: we do not have any customized features for small RNAs yet, but this is on our roadmap for this year. However, I think our tools should still be applicable to a lot of small RNA related problems, and I hope we can use your input to improve them.
Currently, the workflow in our software is such that when you perform mapping/reference assembly against a number of reference sequences, e.g. the chromosomes of a reference genome, the program outputs a number of contigs which represent the global alignments of the reads against the references. Your first problem, then, is that you would like the result as a tab-delimited file of the local alignments of reads against the references. Our command-line assembly program suite (NGS Cell) already offers this option - http://www.clcbio.com/index.php?id=1...e_Program.html - and we plan to make it available in the Workbench as well. This is straightforward, as all the information about the local alignments is also contained in the contig objects. Your reason for wanting the tab-delimited format is viewing in GBrowse. Until we have the tab-delimited export sorted, I would suggest viewing the results as contig objects inside the Genomics Workbench, which we in all modesty believe is a pretty powerful contig viewer.

For a "full" analysis workflow, I would suggest that you try something like this:
1. Reference assemble your small RNA reads against the reference to produce full reference contigs.

2. Run the ChIP-seq analysis on the contig table/contigs, but disable the read shifting and read orientation filters. This basically uses the module as a peak detector for regions enriched in small RNAs.

3. Use the ChIP-seq peak table to navigate the putative small RNA sites.

4. Potentially, you can use the Extract Annotations function to extract all putative small RNA encoding regions to a sequence list that can then be exported to miRNA detection software or whatever is relevant to your problem.
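The idea of treating peak detection as a search for regions enriched in small RNA reads can be illustrated with a simple threshold scan over per-base coverage. This is a deliberately naive stand-in, not the Workbench's ChIP-seq algorithm (which this thread does not describe); `min_coverage` and `min_length` are hypothetical parameters, and the returned coordinates are 0-based and half-open.

```python
from typing import List, Optional, Tuple


def enriched_regions(coverage: List[int], min_coverage: int,
                     min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) runs where coverage stays at or above min_coverage."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, depth in enumerate(coverage):
        if depth >= min_coverage and start is None:
            start = pos  # entering an enriched run
        elif depth < min_coverage and start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))  # leaving a long-enough run
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


print(enriched_regions([0, 0, 5, 6, 7, 0, 0, 3, 9, 9, 9, 9, 1], min_coverage=5))
```

Real peak callers also model background coverage and significance; a fixed threshold is only a quick way to shortlist candidate loci for inspection.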

I would be happy to hear how you get along, and also happy to give this a go myself if I can get the data. Your input is much appreciated, and I hope we can keep the dialog open (you are also welcome to contact me in person) and see if we can't get you leaning the other way.

Cheers

Roald

Lesley replied

06-15-2009, 03:07 PM
Thanks Roald,
Thanks for your quick reply. I am still waiting for an official reply through the trial manager.

The multiplexing instructions specify restriction sites and tags for each end. In Solexa sequencing, the tag is read at the end of the first read. What would be extremely useful would be some instructions or tutorials explaining how to sort tags from Solexa PE indexed reads, separate from the 454 instructions currently listed. Another major issue is how errors are taken into account when determining which index is which. The index sequences are designed so that you can still determine the index even with 2 errors, but from the instructions it looks as if the CLC algorithm looks for perfect matches only. This is also 454-based and not appropriate for high-throughput sequencing. We need Illumina indexing instructions, not the current ones for 454.
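Error-tolerant index assignment of the kind described here is commonly done by picking the closest barcode by Hamming distance and rejecting ties. A minimal Python sketch, assuming equal-length barcodes and an observed index already cut from the read; the sample names and barcodes are illustrative only. Note that a barcode set can only reliably correct up to d errors if every pair of barcodes differs at 2d + 1 or more positions, so tolerating 2 errors requires a pairwise distance of at least 5.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_index(observed: str, barcodes: Dict[str, str],
                 max_mismatches: int = 2) -> Optional[str]:
    """Return the sample whose barcode is uniquely closest to `observed`,
    or None if the best match exceeds `max_mismatches` or is tied."""
    scored = sorted((hamming(observed, bc), sample) for sample, bc in barcodes.items())
    best_distance, best_sample = scored[0]
    if best_distance > max_mismatches:
        return None  # too many errors to trust any assignment
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # two barcodes are equally close: ambiguous
    return best_sample


samples = {"sample1": "ATCACG", "sample2": "CGATGT", "sample3": "TTAGGC"}
print(assign_index("ATCAGG", samples))  # one mismatch from sample1's barcode
```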

Now, the definition of mapping: this is where you are NOT trying to assemble contigs. The aim is to take a sequence and map its position(s) on a reference genome. For instance, you have trimmed a small RNA sequence to 22-25 nt (the size of a potential miRNA), and then you find its possible positions on the genome. Since the target sequence is no longer than a single read, assembly is not required. For longer RNAs assembly is fine, but mapping will show these up just as well. MAQ and SOAP do this well. The key output in this case is a table of coordinates mapping each sequence to the reference genome. We then convert the output to GFF and view it in GBrowse. Please note that small RNA work is not mRNA-seq; they are totally different things. I am very interested in being able to link the mapping of small RNAs to folding, and to evaluating those foldings, using CLC bio. However, the reference assembly algorithm tries to assemble the reads into contigs and completely screws up the data. At present, I hate to say, CLC Genomics Workbench is not suitable for small RNA Illumina work. (Now there is a challenge for you guys :-)
I suggest your development team take a complete newbie (with no 454, Illumina or CLC experience), give them Illumina data and let them tell you what is wrong with your documentation.
I am willing to work directly with you on this and trial any improvements that are made. We are trialling this until the end of August, when we are running a workshop on NGS. We have a reputation for being honest and brutal when it comes to software performance. At the moment we are tending towards the brutal, but it would be nice to lean the other way.
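The coordinate-table-to-GFF step is simple enough to sketch. A minimal Python example, assuming a hypothetical `Hit` record with 1-based, inclusive coordinates (convert first if your mapper reports 0-based starts) and read names free of the characters GFF3 reserves in attributes (`;`, `=`, `&`, `,`). It uses the `Name` attribute rather than `ID`, because a read mapping to several loci would otherwise reuse one ID for unrelated features; the source `mapper` and type `match` are placeholder values.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    read_id: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def hit_to_gff3(hit: Hit, source: str = "mapper", feature_type: str = "match") -> str:
    """Format one mapped read as a tab-separated GFF3 feature line."""
    if hit.strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {hit.strand!r}")
    if not 1 <= hit.start <= hit.end:
        raise ValueError("expected 1-based coordinates with start <= end")
    columns = [hit.chrom, source, feature_type, str(hit.start), str(hit.end),
               ".", hit.strand, ".", f"Name={hit.read_id}"]  # score and phase left empty
    return "\t".join(columns)


def to_gff3(hits: Iterable[Hit]) -> List[str]:
    """Return a GFF3 document as a list of lines, starting with the version pragma."""
    return ["##gff-version 3"] + [hit_to_gff3(h) for h in hits]


print("\n".join(to_gff3([Hit("read1", "chr2", 100, 121, "-")])))
```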
Cheers and thanks again,
Lesley
Roald replied

06-15-2009, 02:38 AM
CLC Genomics 3.5

Disclaimer: I work at CLC bio
Hi Arne,

I have added some comments to your post here that I hope may be of use:
You are right that the Java side of our software uses a lot of memory. To utilize the full potential of the hardware and get things done as fast as possible, we allow the program to use as much memory as is safe, which is determined by checking the hardware specifications during startup.
If you are using the .sh installer, the vmoptions should automatically be set to around 75% of the available memory. However, if you think this is too much, you can change the memory settings in the vmoptions file in the installation directory (e.g. clcgenomicswb3.vmoptions).
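The ~75% rule can be written as a small helper for deciding what to put in the vmoptions file by hand. This is a sketch, not what the installer actually runs: the 75% default comes from this post, the 2 GB ceiling for 32-bit builds comes from another CLC reply in this thread, and `suggested_xmx` is a hypothetical name.

```python
def suggested_xmx(total_ram_mb: int, fraction: float = 0.75, is_64bit: bool = True) -> str:
    """Build an -Xmx line that reserves `fraction` of physical RAM for the Java heap."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    heap_mb = int(total_ram_mb * fraction)
    if not is_64bit:
        # A 32-bit JVM cannot address a heap much beyond ~2 GB.
        heap_mb = min(heap_mb, 2048)
    return f"-Xmx{heap_mb}m"


print(suggested_xmx(32 * 1024))
```

Leaving a quarter of RAM unassigned keeps room for the operating system and for native helper programs that run outside the Java heap.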

We have an ongoing effort to optimize our algorithms and data structures so that the software runs smoothly even on moderately equipped hardware, and fits the use case of doing big jobs on a large machine and then delegating the inspection to e.g. laptops.
On my MacBook Pro laptop I can quite comfortably view very large contigs of all the human chromosomes. However, when the reference sequence of the contig is heavily decorated with annotations, the machine can get a bit slow and unresponsive. This is something that we will address over the next couple of months as part of a major restructuring of our annotation handling framework. Stay tuned for that.

Regarding the missing search functionality for RNA-seq results, we actually offer some quite advanced, but also rather well hidden, options for filtering and searching the result table (as well as most other tables). Please have a look at http://www.clcbio.com/index.php?id=1...th_tables.html

I hope this helps, otherwise please get back here or try our support folks.

With best regards

Roald Forsberg
Director of Scientific Software Solutions, CLC bio

Roald replied

06-12-2009, 03:11 AM
Workbench issues

** Disclaimer: I work at CLC bio **

Hi Lesley,
I am sorry to hear that you have had some problems getting started with our workbench. I have added a few comments below that I hope are useful to you.

We strive to cater for data from all major platforms, e.g. by having a dedicated short read assembler for Illumina/Helicos data and a dedicated color-space assembler for SOLiD data. But we can of course always get better at this, so I would be really grateful to learn which parts of the software you find too 454-orientated.

Regarding indexed sequencing, I would like to point you to our Multiplexing module (you can read more at http://www.clcbio.com/index.php?id=1...tiplexing.html). Please let me know what you think, since this is a feature that we review quite often to keep up with new sequencing protocols.

Regarding the mapping/assembly issue you raise and the comparison between CLC and other assemblers, I need a bit more info to give you a good answer. Could you tell me your definition of mapping, how it differs from reference assembly, and what your specific concern is with our algorithm comparisons?
Perhaps you would also be interested in reading some of our white papers on this issue at http://www.clcbio.com/index.php?id=1368. Please note that these algorithms are exactly the same as those implemented in the Workbench, even though the white papers pertain to the stand-alone command-line software.

Better support for quantification and discovery of small RNAs is definitely something that we are working on. As you may have noticed, we have a full expression analysis package that allows downstream analysis of expression data. As of now this takes input from analog expression arrays and digital RNA-seq data. As of the next release it will also accept data from digital tag-based expression analysis, and it is our plan to extend this with expression data from small RNA quantification experiments as well.

Regarding data import, we have increased the speed quite dramatically recently, so I hope you will give the latest version a spin - see more at http://www.clcbio.com/index.php?id=1297

We have a bunch of tutorials at http://www.clcbio.com/index.php?id=649, but unfortunately we do not have any for multiplexing yet; I will pass that on to our documentation team.

Do not hesitate to get back if there is more we can do to help you.

Best regards

Roald Forsberg
Director of Scientific Software Solutions, CLC bio.

Lesley replied

06-11-2009, 04:50 PM
Hi all,
I am just beginning to evaluate CLC Genomics Workbench for use with Illumina output, and I am finding it so 454-orientated that it is driving me crazy with irrelevant instructions. Does anyone have clear instructions on sorting Illumina indexed sequencing data?

The other question is: can we do real mapping with CLC, or are we stuck with contig assembly (with or without a reference)? I do a lot of work with small ncRNAs and cannot find any tools in the trial that are remotely useful. I also find their comparison of their assembler with MAQ and SOAP a laugh; this is comparing an assembler with mappers.
I am working in an Ubuntu 64-bit environment, and loading the data for one lane of paired-end reads was extremely slow.

So far I feel the reality is not living up to the hype, or maybe I am being penalised for not working with human/mouse/rat resequencing data. Does anyone know of any tutorials on the NGS part of CLC bio that are relevant to indexed Illumina data or to miRNA data?
By the way, thanks for the Velvet comparison. So far that has been the best de novo assembler for our group.

Cheers,
Lesley
arne.muller replied

06-10-2009, 05:06 AM
CLC Workbench 3.5

Hello,

I'm new to NGS and this list. So this is my first posting ... ;-)

I am testing CLC Genomics Workbench 3.5 for our molecular biologists (our main users). I like the user interface, and assembly against a reference genome/transcriptome is fast (comparable with Bowtie, not arguing about minutes ...) and "only" consumes about 2 GB of memory.

Still, the application is memory-greedy: the assembler/mapper seems to be a standalone binary (C/C++?) called by the workbench, whereas the rest is Java, which consumes lots of memory (~30 GB when loading 7 million Solexa reads in FASTQ format plus the human RefSeq mRNAs as the reference).

I run the workbench on a 64 GB Linux machine, but our end users only have small WinXP workstations. Even if I did the assemblies and mappings for them, the resulting contig file is too large to load on any WinXP machine (limited to <4 GB of memory) for browsing. Anyway, there's probably a trick to split things up ... (maybe RTFM helps ;-).

We're doing RNA-Seq (qualitative), and the main reason our biologists are interested in the workbench is to query for their favorite gene in the assembly and see how many reads align where: to confirm the presence of transcripts and, ultimately, hopefully work out tissue-specific isoforms. However, for the moment the search capabilities in the workbench are not yet as good as I'd like; e.g. the assembled contigs table does not allow searching for gene names, even though the reference is RefSeq mRNA from GenBank with lots of annotation. I guess they're still improving this kind of functionality.

Has anybody had experience using their Genomics Server in combination with the workbench? It's supposed to let users run the workbench as a client while the assembly and mapping are calculated on the server, but again, loading the results into the client for browsing could still be a bottleneck.

Finally, what alternatives are there for browsing assembly/mapping results (when mapping to a reference genome) interactively and with some graphics, I mean for end users? I just read about MapView but haven't tested it yet.

regards,

Arne