  • Cluster Flow: A pipelining tool to automate and standardise bioinformatics analyses

    Hi all,

    We've just released a new piece of software from the Babraham Bioinformatics group called Cluster Flow.

    Cluster Flow is a command-line program which uses GRIDEngine or LSF cluster environments to run analysis pipelines.
    • Routine analyses are very quick to run, for example: cf --genome GRCh37 fastq_bowtie *fq.gz
    • Pipelines use identical parameters, standardising analysis and making results more reproducible
    • Integrated parallelisation tools help prevent your cluster becoming overloaded
    • All commands and output are logged in files for future reference
    • Intuitive commands and a comprehensive manual make Cluster Flow easy to use
    • Works out of the box (almost - see the YouTube tutorial)


    How Cluster Flow differs from other pipeline tools:
    • Very lightweight and flexible
    • Pipelines and configurations can easily be generated on a project-specific basis if required
    • New modules and pipelines are very easy to write (see video tutorial)


    We have been using Cluster Flow on our GRIDEngine cluster for some months and it's working well. In fact, I think it's fair to say that most of our bioinformatics group use it on an almost daily basis now. There has been limited testing on LSF systems with the help of a friend at the EBI, where it seems to work OK.

    At the time of writing, Cluster Flow comes bundled with pipelines and modules to run the following programs:

    It comes with typical pipelines to process data using these modules, some with additional parameters (e.g. for miRNA alignment or RRBS methylation data).

    We've written these pipelines as we've needed them - Cluster Flow comes with an example module which you can use to help you write your own. If you do use Cluster Flow and write any new modules or pipelines, please let us know as we're keen to expand the number of available analyses that it can run.

    Cluster Flow is released with a GPL v3 licence and can be downloaded from the Babraham Bioinformatics website: http://www.bioinformatics.babraham.a...s/clusterflow/

  • #2
    Hi all,

    I've just released version 0.2 of Cluster Flow. The main update is that it now supports SLURM clusters, plus it's much easier to customise the job submission commands to be tailored to your environment.

    Cluster Flow now has its own website for documentation: http://ewels.github.io/clusterflow/

    It's now hosted on GitHub - you can download v0.2 from the tagged releases page.

    Cheers,

    Phil



    • #3
      Version 0.3 of Cluster Flow has just been pushed live.

      This one has been brewing for a few months now and is a big update. The main highlights:
      • Report log files are now handled in a clever way to keep their order consistent, even when jobs are running in parallel.
      • E-mails are fancier and flag any errors or warnings, plus they can be given custom text strings to search for in the logs and highlight or flag as warnings.
      • Environment module loading has been tidied up and now needs less configuration and works more robustly. Environment modules can now be given aliases for better compatibility and version specification.
      • Cluster compatibility has been developed heavily and now allows almost complete configuration of the job submission commands via the configuration file.
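
To give a rough idea of what searching logs for custom text strings can look like, here is a minimal Python sketch. It is not Cluster Flow's own code: the function name `flag_log_lines`, the case-insensitive matching, and the example log lines are all assumptions made for illustration.

```python
import re


def flag_log_lines(log_lines, warning_strings):
    """Return (line_number, line, matched_string) for each flagged log line.

    Hypothetical helper: a line is flagged if it contains any of the
    user-supplied strings, compared case-insensitively (an assumption).
    """
    # Escape each string so it is matched literally, not as a regex.
    patterns = [(s, re.compile(re.escape(s), re.IGNORECASE)) for s in warning_strings]
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for text, pattern in patterns:
            if pattern.search(line):
                hits.append((number, line.rstrip("\n"), text))
                break  # report each line once, for the first string that matches
    return hits


if __name__ == "__main__":
    # Made-up log content, purely for demonstration.
    example_log = [
        "Started alignment\n",
        "Warning: 3 reads skipped\n",
        "Finished OK\n",
    ]
    for number, line, text in flag_log_lines(example_log, ["error", "warning"]):
        print(f"line {number} flagged by '{text}': {line}")
```

A summary e-mail could then list the flagged lines at the top. How Cluster Flow actually does this is described in the documentation linked below.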


      You can download v0.3 of Cluster Flow here: https://github.com/ewels/clusterflow/releases/tag/v0.3

      Documentation and new demonstrations can be seen on the docs homepage: http://ewels.github.io/clusterflow/

      Much of this development has been the result of me moving and wanting to run Cluster Flow on a different cluster. I'd like to thank those who have helped out with testing and development, notably the chaps back at Babraham who have had to put up with all of my buggy pre-releases.

      Phil
