**JackieBadger** · 09-21-2013, 08:35 AM

It isn't worth paying for. You can do everything and more with free software.

**GenoMax** · 09-22-2013, 07:11 PM

Commercial software has its place and applications (otherwise such companies would not exist). I know people who are extremely satisfied because CLC easily does what they need for their projects, but I also know of others whose experiences are similar to yours.

No one can give you a good reason to pay for any commercial software without a complete understanding of your exact requirements/expectations (which would be hard to do via a forum). You need to make the purchase (or not) decision based on the best data you have on hand.

**HenrivdGeest** · 09-22-2013, 11:15 PM

As the posters before me said, you don't HAVE to buy it.

For me and for some colleagues, CLC really is accelerating research. It's mostly the visualisation part that helps. As a bioinformatician, I usually use it for some quick and dirty testing, as well as for presenting data to other people within our organisation. With CLC we have (more than once) spotted problems in our data just because everything is visualized. If you don't know what to look for, this is quite handy.

But I also encounter a lot of annoying things and bugs, and I think that because it's paid software, people tend to get angry about them easily. To people (all researchers, non-bioinformaticians) who don't know CLC yet, I always say: give it a try, and you will see whether you like it or not.

And yes, everything could also be done by freeware/opensource software, and personally I always choose between CLC and some other software. Sometimes CLC is handy, sometimes commandline.

**Zigster** · 09-23-2013, 09:47 AM

One thing that I've learned from all commercial bioinformatics software (not just CLC - they are probably the best of the bunch):

It is really hard to build software that makes difficult things easier.

At some point a scientist is just going to have to learn how to code - there are just too many ways for an analysis to go off the beaten path and break a pre-packaged, canned, off-the-shelf suite.

**jkbonfield** · 09-24-2013, 05:10 AM

Originally posted by yaximik

For example, why do I have to waste storage space importing gigabytes of my existing databases to create yet another database in a program-specific format that cannot be read by other software? Convenience of associated metadata? OK, but why does this convenient format not allow a reference assembly against selected genome regions, while open-source software allows that using less convenient data formats? No, you have to use the entire genome - it will be slow though. Nice.

Not that I want to defend CLC, and I've no idea precisely why they do this either, but I am also guilty of using my own internal format for Gap5 requiring a full import. Why? Because BAM isn't a good format for editing.

Imagine a scenario where you are working on a de novo assembly and you need to join two contigs together containing 20 million sequences each. With BAM, the resulting contig requires updating the position of 20 million sequences, and possibly reverse complementing 20 million too if that's what the match indicated. Or... you could use a format that stores all data in a recursive R-Tree and needs only a handful of updates to achieve the same task. [1]

This is for Gap5, which does de novo editing work (and is pretty rubbish for reference-based annotation tasks). I don't know how that relates to CLC, but sometimes there can be good programmer justification for doing something that seems daft.

James

[1] With hindsight I should have invented an overlay system that allows BAM to be used as the backend and only imports an indexing system to permit rapid movement and restructuring of data. But that's another level of complexity.
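To make the contig-join argument above concrete, here is a minimal sketch of relative coordinates in a tree. It is a toy, not Gap5's actual on-disk structure: the `Node` class, `join`, and `absolute_starts` are hypothetical names invented for illustration, and orientation (reverse complementing) is left out, although it could be handled the same way with a per-node flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A bin in a toy coordinate tree; `offset` is relative to the parent node."""
    offset: int = 0
    length: int = 0
    reads: List[int] = field(default_factory=list)      # read starts, relative to this node
    children: List["Node"] = field(default_factory=list)


def absolute_starts(node, base=0):
    """Resolve read starts to absolute coordinates by summing offsets down the tree."""
    origin = base + node.offset
    starts = [origin + r for r in node.reads]
    for child in node.children:
        starts.extend(absolute_starts(child, origin))
    return starts


def join(left, right, overlap):
    """Join two contigs under a new root. Only one offset changes; no read is rewritten."""
    right.offset = left.length - overlap
    return Node(length=left.length + right.length - overlap, children=[left, right])


left = Node(length=100, reads=[0, 50])
right = Node(length=80, reads=[0, 30])
joined = join(left, right, overlap=20)
print(absolute_starts(joined))
```

The stored read positions in `right` stay exactly as they were; the shift is applied when coordinates are resolved. In a flat, position-sorted format, each of those records would need its position rewritten instead.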

**mcnelson.phd** · 09-24-2013, 06:09 AM

We've been using CLC heavily for a few years now, and while it's nowhere near perfect, it does have its merits.

Firstly, having a GUI allows people who are new to bioinformatics to quickly get started with analyzing data without having to wade into command line processing. I know that using command line programs isn't very difficult, but for most people it's very intimidating when they first start, and it can take quite a while before they fully understand some of the esoteric error messages or faults that can occur.

Also, there are a number of people who don't want to get that deeply involved in bioinformatics, they just want to analyze their data quickly so they can move their experiments along. For these people, CLC offers a convenient package that lets them do nearly all standard processing methods without having to get bogged down in a lot of details. It's a valid argument that I've given before that if you really want to do good work then you should have a good idea of how the program you use works, but the reality is that most people just want to know the end result and not how the sausage is made.

Second, configuring software for your particular system is often not trivial, and CLC provides a multitude of tools all in one complete package that will pretty much work without fail across all three major operating systems. For labs or research groups that have a mix of different computing systems, having one piece of software that looks and acts the same straight out of the box makes it easier to move data around and let people interact. Yes, there are non-commercial tools that can achieve the same thing, like Galaxy for instance, but generally they're not as powerful or are more complex to set up and use. Also, for labs that only use Windows, many command line programs are unavailable to them or require a lot more configuration than on Mac or Linux systems. Since Windows is still the dominant OS, particularly because of Office, CLC offers a solution for data analysis that may not be available otherwise.

Third, because everything is provided in a single package, you have the ability to track how your data was manipulated and can trace back from an analysis file to the original read data. This is something that command line programs don't offer unless you take very good notes or create your own processing logs recording what files were input into a program and what the outputs were. This is particularly useful in situations where you process something multiple ways to see what effect different options have on the result. This tracking is also very useful for cases where someone who's left the lab has their data in CLC, and someone new to the lab has to take parts of their data and do something else, which happens quite frequently in a lot of labs.

Now, saying all of that, I do have my fair share of complaints about CLC, and if it were just me I wouldn't consider it worth it. The only commercial software that I purchased using my own funds was Geneious, because it's much better than CLC at doing a lot of the simple sequence and genome editing that I prefer to do with a GUI based program (it's also a heck of a lot cheaper). Outside of that, I mostly use command line programs as I prefer that greater level of control, but then again I also have more experience doing such things than everyone else in my lab, so while that works for me it doesn't work for them.

Bottom line is, CLC has its merits, but based on your rant it seems like you'd rather stick with command line tools. If that's the case, then that's fine, but no one is forcing you to buy CLC or any other commercial software package.
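For anyone on the command line side who wants the "create your own processing logs" approach mentioned above, here is a minimal sketch. It assumes a hypothetical wrapper, `run_logged`, and a log file name, `processing_log.jsonl`, both invented for illustration; it records the command plus checksums of the input and output files, which is enough to trace a result back to the reads it came from.

```python
import hashlib
import json
import subprocess
import time


def sha256(path):
    """Checksum a file in chunks so large read files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_logged(cmd, inputs, outputs, log_path="processing_log.jsonl"):
    """Run `cmd` (a list of arguments) and append one JSON record describing the step."""
    started = time.strftime("%Y-%m-%dT%H:%M:%S")
    input_sums = {str(p): sha256(p) for p in inputs}    # hash inputs before the tool runs
    result = subprocess.run(cmd, check=True)            # raises if the tool fails
    record = {
        "started": started,
        "command": cmd,
        "inputs": input_sums,
        "outputs": {str(p): sha256(p) for p in outputs},
        "returncode": result.returncode,
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

Each analysis step becomes one line in the log, so running the same data through several option sets leaves a record of which output came from which command.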

**sklages** · 09-24-2013, 10:38 AM

I totally agree with jkbonfield and mcnelson.phd.

.. as a sidenote, I never found any weird characters like @, F or Q in my Illumina fastq sequence portions ...

**newbietonextgen** · 09-24-2013, 01:03 PM

Hi

I have been using CLC for some time and was wondering if anyone has compared metrics between CLC and other aligners.

I found something interesting and wanted to know if anybody else has observed it. We took some RNA-seq data, 100 bp paired-end reads, and aligned it using the latest CLC and TopHat.

We then took a 10,000 bp region from both BAM files and looked at the number of reads aligned and the accuracy of the alignment.

So far, CLC aligns more reads to the same region than TopHat (11,200 vs 3,500). Now, coming to the big question of accuracy, we found twice the number of pairs with CLC as with TopHat (3,346 vs 1,640 pairs). So the question is: how is CLC doing it? Mind you, it's only one region.

Can people suggest a test that would be very comprehensive?

cheers
newbie
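A region comparison like the one described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it works on SAM text (for example the output of `samtools view -h`), `region_stats` is a hypothetical helper name, and it counts reads by leftmost mapped position rather than by full overlap with the region. Counting only primary alignments matters here, because two mappers can differ in how they report multi-mapped reads.

```python
def region_stats(sam_lines, chrom, start, end):
    """Count primary mapped reads whose leftmost position lies in [start, end] (1-based),
    and how many of those carry the 'properly paired' flag."""
    mapped = proper = 0
    for line in sam_lines:
        if line.startswith("@"):                 # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue                             # unmapped, secondary, or supplementary
        if rname == chrom and start <= pos <= end:
            mapped += 1
            if flag & 0x2:                       # each read is properly paired per the mapper
                proper += 1
    return mapped, proper
```

Running this on both mappers' output for the same region gives comparable numbers, but "properly paired" is each mapper's own judgement, so a higher count is not by itself evidence of higher accuracy.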

**mcnelson.phd** · 09-24-2013, 02:43 PM

Originally posted by newbietonextgen

Hi
So far, CLC aligns more reads to the same region than TopHat (11,200 vs 3,500). Now, coming to the big question of accuracy, we found twice the number of pairs with CLC as with TopHat (3,346 vs 1,640 pairs). So the question is: how is CLC doing it? Mind you, it's only one region.

CLC put out a white paper not too long ago (past year around when version 6 was released if I remember correctly) that detailed how their read mapper was more accurate and able to map more reads than bowtie and bwa. I never delved into the details, but I can also attest to the fact that CLC does map more reads to a reference sequence than bowtie/bowtie2. In many cases, I find that this is because the reference is circular and bowtie doesn't seem to handle that case very well. They may also have a more greedy algorithm, although that doesn't appear to be the case entirely. Either way, your findings are correct in that CLC maps more reads... the question still may be whether or not they're all mapped accurately?
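On the circular-reference point above, one common workaround for mappers that treat the reference as linear is to pad the end of the sequence with its own start, so reads spanning the origin have somewhere to align. The sketch below is a generic illustration with hypothetical helper names (`linearise_circular`, `wrap_position`); it is not a feature of bowtie or CLC, and whether it accounts for the difference in mapped reads is an assumption.

```python
def linearise_circular(seq, read_len):
    """Append the first read_len - 1 bases so that reads spanning the origin can align."""
    pad = min(read_len - 1, len(seq))
    return seq + seq[:pad]


def wrap_position(pos, genome_len):
    """Convert a 1-based position on the padded reference back to the circular genome."""
    return (pos - 1) % genome_len + 1


genome = "ACGTTGCA"
padded = linearise_circular(genome, read_len=4)
print(padded)
```

A read such as `CAAC`, which crosses the origin of this toy genome, is absent from the linear sequence but present in the padded one. Positions reported in the padded tail should be wrapped back with `wrap_position` before comparing coverage.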

**chadn737** · 09-24-2013, 03:49 PM

Originally posted by mcnelson.phd

Also, there are a number of people who don't want to get that deeply involved in bioinformatics, they just want to analyze their data quickly so they can move their experiments along. For these people, CLC offers a convenient package that lets them do nearly all standard processing methods without having to get bogged down in a lot of details. It's a valid argument that I've given before that if you really want to do good work then you should have a good idea of how the program you use works, but the reality is that most people just want to know the end result and not how the sausage is made.

There is a real danger, particularly when you combine an attitude of "just give me the end result" and easy to use software, of doing it wrong. A lot of people think that simply because they can do it in a program and get a result, that the result must therefore be right. At least when one is forced to learn something about the program, they may be forced to think more critically about it or at least seek out advice from those who do.

**newbietonextgen** · 09-24-2013, 06:18 PM

Originally posted by mcnelson.phd

CLC put out a white paper not too long ago (past year around when version 6 was released if I remember correctly) that detailed how their read mapper was more accurate and able to map more reads than bowtie and bwa.

I will look into the white paper. Is there a way to assess the accuracy of an alignment, in terms of metrics etc.? Is there any suite or workflow that can be used?

Thanks
newbie

**mcnelson.phd** · 09-25-2013, 04:21 AM

Originally posted by newbietonextgen

I will look into the white paper. Is there a way to assess the accuracy of an alignment, in terms of metrics etc.? Is there any suite or workflow that can be used?

To find the white paper, just google "CLC read mapping white paper", it should come up as the first thing.

I don't know off the top of my head of any good single metric to assess accuracy because that requires knowing where the reads should map to. In most cases, looking at the number of multiply mapped reads and the number of differences between the reads and the consensus may give a good indicator of quality, but only if you know there are no repetitive elements and no sequence variants between the reads and the reference. Sequencing noise would complicate things, because in some cases you might rather have noisy reads not mapped than mapped if you're trying to find something like low frequency variants. It's a bit like trying to assess how good an assembly is, you can use the N50 value, but that really doesn't tell you that much and may be misleading...
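To illustrate the N50 caveat above, here is a minimal sketch of the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The contig lengths are made up for the example.

```python
def n50(lengths):
    """Return the contig length at which the cumulative sum, from longest to shortest,
    first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


correct = [50, 50, 50, 50]      # four contigs, assembled conservatively
misjoined = [100, 100]          # same bases, but contigs joined in pairs (possibly wrongly)
print(n50(correct), n50(misjoined))
```

The second assembly scores twice as high on N50 even if its joins are wrong, which is the sense in which the number can mislead: it measures contiguity, not correctness.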

**rhinoceros** · 09-25-2013, 05:28 AM

To be honest, when it comes to bioinformatics, I think all GUI-driven programs suck in comparison to command line alternatives (think e.g. parallelization, piping output from one program into another, and handling of million row tables). I understand the value of e.g. Geneious for people who can't be bothered to learn how to function at the command line, but then, those people aren't very serious bioinformaticians to begin with.

**mcnelson.phd** · 09-25-2013, 05:43 AM

Originally posted by rhinoceros

To be honest, when it comes to bioinformatics, I think all GUI-driven programs suck in comparison to command line alternatives (think e.g. parallelization, piping output from one program into another, and handling of million row tables). I understand the value of e.g. Geneious for people who can't be bothered to learn how to function at the command line, but then, those people aren't very serious bioinformaticians to begin with.

That's a very ignorant position to take. Simply having a GUI front end to make working with and analyzing data easier doesn't make it less complex or powerful. Do you use a GUI based operating system? If so then your comments can't be taken seriously because it's the same difference. Command line programs are great, but they're not perfect simply because they don't have a GUI and are harder to use.

Further, would you say that something like IGV sucks because it provides a GUI interface for looking at mapping files? Where do you draw your limits, if it's a commercial piece of software then it must be bad? As I said earlier, programs like CLC and others can make it too easy for people to do bad analyses, but that's not the fault of the program as there are a lot of good studies that are done using CLC. In fact, it's probably more likely for someone to do bad science with command line programs that aren't very user friendly and have incomplete or incomprehensible documentation. The fact is that high throughput sequencing has become a standard tool like Sanger sequencing before it, and that means a lot more labs and people will be working with such data in the future. It's incumbent upon those of us who are good bioinformaticians to help design and provide tools that allow these newcomers to analyze their data accurately and reliably, and that's what CLC tries to do. You don't blame a car manufacturer for people being bad drivers, so don't do the same with bioinformatics tools.

CLC workbench sucks ...