  • CLC Genomics Workbench - Windows vs. Linux

    Hello everyone. I'm a bioinformatics student from Holland and my internship supervisor just told me he's thinking about ordering a license for CLC Genomics Workbench. He asked me if analyses would run much faster if he ran it on Linux. I know Linux can be much faster in some situations (e.g. web servers), but I have no idea when it comes to data analysis with tools like this.

    Do any of you have experience with this? Does Linux have advantages / disadvantages over Windows when it comes to doing data analysis with CLC Genomics Workbench (or similar tools)? And if Linux were significantly faster, would that mean we could purchase a computer with less RAM to save costs?

  • #2
    I have CLC for Linux and Windows, but have never benchmarked them on the same machine. My feeling is that Linux would not be that much faster, if at all.
    Linux might be more efficient with memory and stress the machine a little less.

    Linux has the huge advantage for bioinformatics in that most tools are written for it.

    • #3
      Hi figure002,

      We run CLC (and more) on an Ubuntu 10.8 x86_64 server with 24 cores and some 47G ram. I am not a CLC user so I can't really give you more details on its performance. People here say that it has trouble with the HiSeq data they feed it because it's just too much, despite the server. They then try to align their reads on one chromosome instead of the entire reference, which I think introduces false positives.

      I'd think that the amount of memory is more important than the OS.

      Cheers

      • #4
        Yes, the amount of RAM is the important thing. You need a minimum of 16GB and more is better.

        • #5
          Thanks for the replies guys. I mailed the guys at CLC bio as well, and got a similar reply:

          "The OS does not matter with regards to speed. It is based on the number of CPU's and RAM of the machine."

          So the OS doesn't really matter. I'm sure there are differences in performance, but those are probably minimal. So he's probably going to stick with Windows, since that's what he's used to.

          (This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..)

          • #6
            Originally posted by figure002:
            This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..

            Welcome to the wonderful world of NGS!

            If you don't have one of those and buying is no option, there are compute clusters out there. Check for example SARA or ask the NBIC for information. You're not the only one dealing with large quantities of data and expensive computations in our region :P

            PS: I'm also interested in your status as 'bioinformatics student': HBO or master's internship? Which uni? I've only just lost the bioinformatics student status myself...
            Last edited by Bruins; 01-26-2011, 08:47 AM.

            • #7
              Ahh, computer clusters, that's probably one of the things I'll learn about in my specialisation "high throughput" which starts in about 2 weeks. I just finished my internship with an awesome grade.

              PS. I'm a junior at the Leiden University of Applied Sciences (Hogeschool Leiden) and I'm working towards my bachelor's degree. Can't wait to finally get started and earn some money. Where did you study?

              • #8
                Hi figure002,

                I wanted to mention that DNASTAR has a new version of SeqMan NGen that does very fast assemblies of any size genome on a desktop computer. (Bacterial genomes < 1 minute; the whole human genome in <24 hrs).

                If you are interested in learning more, you can check out our website, or message me and I can arrange for a free trial of the software.

                Thanks,
                Anne

                • #9
                  @figure002: I sent you a private message to avoid slow chat in this thread

                  • #10
                    Originally posted by DNASTAR:
                    Hi figure002,

                    I wanted to mention that DNASTAR has a new version of SeqMan NGen that does very fast assemblies of any size genome on a desktop computer. (Bacterial genomes < 1 minute; the whole human genome in <24 hrs).

                    This is perhaps a bit misleading as the website claims "Reference-Guided Human Genome Assembly". Is that basically mapping a la bwa, bowtie, etc, or an actual assembly?

                    I can see hybrids existing too, which some groups already do. Map the bits that map and then try to extend the bits which don't to identify insertion sequence, and possibly then have a basic denovo assembly algorithm for the rest (but acknowledge it'll most likely be very short contigs).

                    • #11
                      Originally posted by figure002:
                      Thanks for the replies guys. I mailed the guys at CLC bio as well, and got a similar reply:

                      "The OS does not matter with regards to speed. It is based on the number of CPU's and RAM of the machine."

                      So the OS doesn't really matter. I'm sure there are differences in performance, but those are probably minimal. So he's probably going to stick with Windows, since that's what he's used to.

                      (This is actually the first time I hear about machines with 24 cores and 47G ram. I didn't know such things existed..)

                      Perhaps the OS doesn't matter too much with CLCBio, but I'd stick to Linux since many or even most of the programs in NGS are designed for and tested primarily on Linux.
                      Also the Linux command line allows easy access to sequence files, which Windows fails miserably at.

                      • #12
                        Originally posted by jkbonfield:
                        This is perhaps a bit misleading as the website claims "Reference-Guided Human Genome Assembly". Is that basically mapping a la bwa, bowtie, etc, or an actual assembly?

                        I can see hybrids existing too, which some groups already do. Map the bits that map and then try to extend the bits which don't to identify insertion sequence, and possibly then have a basic denovo assembly algorithm for the rest (but acknowledge it'll most likely be very short contigs).

                        SeqMan NGen generates a fully gapped assembly. This benchmark time for human genome assembly also includes full SNP statistical analysis against the entire dbSNP database. The output from SeqMan NGen is a BAM file plus accessory files that provide SNP, coverage and feature information that are important for downstream analysis.

                        • #13
                          I suppose you take "assembly" to mean "mapping to a reference", otherwise a BAM file as output wouldn't make any sense.

                          I prefer the term mapping or alignment, as "assembly" should be reserved for the reconstruction of a genome without a reference. (or perhaps "reference-guided assembly", but then you would expect FASTA files as output, not BAM)

                          • #14
                            Originally posted by kopi-o:
                            I suppose you take "assembly" to mean "mapping to a reference", otherwise a BAM file as output wouldn't make any sense.

                            I prefer the term mapping or alignment, as "assembly" should be reserved for the reconstruction of a genome without a reference. (or perhaps "reference-guided assembly", but then you would expect FASTA files as output, not BAM)

                            Yes, this is an alignment but is not simple mapping to a human genome reference sequence. Algorithms like Bowtie map reads to the reference genome and produce an ungapped BAM file, where the reference sequence cannot be gapped to accept variations. SeqMan NGen creates a gapped BAM file perfectly suitable for SNP variation analysis. Also the SeqMan BAM viewer can display the gapped alignment and easily navigate the genome and variation report. Other BAM viewers (like Tablet) do not display reference gaps, so insertions are missing from the alignment views, and are not suitable for variation analysis.
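
For anyone following the gapped vs. ungapped point: in the SAM/BAM format, gaps are recorded per read in the CIGAR string, where `I` marks bases inserted relative to the reference and `D` marks reference bases missing from the read. Below is a minimal Python sketch of how those operations can be read out. It is only an illustration and not how SeqMan NGen, Bowtie, or any viewer works internally; the function names (`parse_cigar`, `find_indels`, `reference_span`) and the example CIGAR string are made up for this post.

```python
import re

# CIGAR operation letters defined by the SAM specification.
CIGAR_OPS = "MIDNSHP=X"
# Operations that advance the position on the reference sequence.
CONSUMES_REFERENCE = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5M2I3M' into (length, op) pairs."""
    if not re.fullmatch(r"(\d+[%s])+" % CIGAR_OPS, cigar):
        raise ValueError("not a valid CIGAR string: %r" % cigar)
    return [(int(n), op) for n, op in re.findall(r"(\d+)([%s])" % CIGAR_OPS, cigar)]


def find_indels(pos, cigar):
    """List the gaps in one alignment.

    pos is the 1-based leftmost reference position of the read (the POS
    field in SAM). Insertions are reported at the reference base just
    before the inserted bases; deletions at the first deleted base.
    """
    indels = []
    ref = pos
    for length, op in parse_cigar(cigar):
        if op == "I":
            indels.append(("insertion", ref - 1, length))
        elif op == "D":
            indels.append(("deletion", ref, length))
        if op in CONSUMES_REFERENCE:
            ref += length
    return indels


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


if __name__ == "__main__":
    # Hypothetical read: 5 matches, 2 inserted bases, 3 matches,
    # 1 deleted reference base, 4 matches.
    print(find_indels(100, "5M2I3M1D4M"))
    print(reference_span("5M2I3M1D4M"))
```

A read from an ungapped aligner would carry a CIGAR with no `I` or `D` operations (for example `10M`), so `find_indels` returns an empty list for it. A viewer that skips `I` operations when drawing the alignment would hide exactly the insertions listed here.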

                            • #15
                              Personally I'm happy for BAM to be used as an alignment output format too - it certainly makes sense and isn't only to be reserved for mapping. The logical approach to this is to use the contig consensus sequences in place of the references.

                              You're right that many mapped alignment viewers do a dismal job of displaying indels (even tview in some cases). For now this appears to be more in the domain of assembly editors. I'm biased of course, but gap5 can handle such things and no doubt CLC's and DNASTAR's own tools too.
