
  • A new release of the Snakemake workflow system

    Hi guys,
    I would like to announce version 2.4.8 of Snakemake.
    Snakemake is a Pythonic, text-based workflow system with a clean and easy-to-read language for defining your workflows. Snakemake is inspired by GNU Make. Workflows are defined by rules that generate output files from input files. Rule dependencies and parallelization are determined automatically by Snakemake.
    In contrast to GNU Make, Snakemake allows a rule to have multiple output files. Further, rules can use shell commands, Python, or R code. Snakemake provides many additional useful features such as resource-aware scheduling, parameter and version tracking, and detection of incomplete files.
    Finally, Snakemake has generic cluster support that works with any cluster or batch system that provides a qsub-like command and a shared filesystem.

    To give you an impression, this is what Snakemake rules look like:
    Code:
    rule targets:
        input:  'plots/dataset1.pdf', 
                'plots/dataset2.pdf'
    
    rule plot:
        input:  'raw/{dataset}.csv'
        output: 'plots/{dataset}.pdf'
        shell:  'somecommand {input} {output}'
    If you like Snakemake, please feel free to visit http://bitbucket.org/johanneskoester/snakemake.
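
To make the wildcard idea in the `plot` rule concrete, here is a minimal Python sketch of how a requested target such as 'plots/dataset1.pdf' could be matched against an output pattern and turned into the required input file. This is not Snakemake's own code: the function names `match_output` and `resolve_input` are hypothetical, and the sketch assumes that each wildcard appears once per pattern and may match any non-empty string.

```python
import re


def match_output(pattern, target):
    """Return the wildcard values if target matches pattern, else None."""
    # Build a regex from the pattern: literal text is escaped, and each
    # {name} becomes a named group (assumption: it matches any non-empty string).
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += "(?P<%s>.+)" % m.group(1)
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


def resolve_input(input_pattern, output_pattern, target):
    """Return the input file needed to build target, or None if no match."""
    wildcards = match_output(output_pattern, target)
    if wildcards is None:
        return None
    # Fill the same wildcard values into the input pattern.
    return input_pattern.format(**wildcards)


print(resolve_input("raw/{dataset}.csv", "plots/{dataset}.pdf", "plots/dataset1.pdf"))
```

With the two patterns from the `plot` rule, asking for 'plots/dataset1.pdf' yields 'raw/dataset1.csv', which is the kind of lookup that lets rule dependencies be inferred from file names alone.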

  • #2
    Hi johanneskoester,

    I'm curious to learn more about Snakemake, thanks for posting it!

    I'm quite familiar with Python, R, and bash, but I'm not familiar at all with GNU Make and friends (other than executing it when I get some source code). So I must admit I fail to see where the advantage lies when building bioinformatics pipelines.

    For example, following this example on Snakemake, this is how I would implement the same pipeline:

    Code:
    REF="/global/home/users/ebolotin/scratch/hg19/hg19"
    for fq in *.fastq.gz
    do
        # Sample name without the .fastq.gz suffix
        bname=$(basename "$fq" .fastq.gz)
        ## Prepare pipeline: write the three chained steps to a per-sample script
        echo "cutadapt  -m 10 -a AGATCGGAAGAGCACACGTCTGAACTCC -o ${bname}.cut $fq &&
        bowtie2 -p 20 --very-sensitive -x $REF -U ${bname}.cut -S ${bname}.sam &&
        makeTagDirectory ${bname}.tag ${bname}.sam -keepAll -genome hg19" > "${bname}.sh"
        ## Run the job (sequentially, one sample after another):
        bash "${bname}.sh"
        # OR in the background:
        # nohup bash "${bname}.sh" &
        # OR on a cluster:
        # bsub [opts] < "${bname}.sh"
    done
    Could you point out in what respect Snakemake would be preferable?

    Thanks!
    Dario



    • #3
      Hi,
      Sure. The solution you provide runs a for loop and spawns a bash job for each fastq file.
      So, with the nohup or bsub variant, they will execute in parallel, all fine.
      When using Snakemake, you would have a similar effect at first sight. However, there are various advantages (some of them, but to the best of my knowledge not all, are also provided by other workflow systems):

      With Snakemake, you can define how many processes should be active at the same time, so that your machine is not flooded with jobs. Snakemake will schedule them in a way such that the utilization of the provided cores will be maximized. The scheduling is also aware of the number of threads each job uses.
      If a job fails, or you have to quit the execution, then on the next invocation Snakemake will determine what was already computed last time and only compute what is missing.
      If an input file happens to be changed, Snakemake will propose to rerun the subsequent part of the pipeline automatically (in other words, Snakemake automatically detects if one of your files is outdated).
      You can run the very same workflow definition on a single machine or a cluster, without the need to redefine anything in the Snakefile.
      Following the well known pattern of input-output-code, the Snakemake rules are very easy to read, and help to separate your commands from the parameters.
      For each of the output files created during the workflow, Snakemake will store metadata such as the parameters, commands, and input files used, which is useful for documentation.

      Best,
      Johannes
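
Two of the points above, rerunning only what is missing or outdated and keeping the number of busy cores within a limit, can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not Snakemake's implementation: `needs_rerun` and `pick_jobs` are hypothetical names, outdatedness is judged Make-style by file modification times, and the scheduler is a plain greedy choice rather than whatever strategy Snakemake uses internally.

```python
import os


def needs_rerun(output, inputs):
    """True if output is missing or older than at least one of its inputs."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    # Make-style check: any input modified after the output makes it outdated.
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def pick_jobs(ready, free_cores):
    """Greedily choose ready jobs, given as (name, threads), that fit into free_cores."""
    chosen = []
    # Assumption: try jobs with the most threads first, skip those that do not fit.
    for name, threads in sorted(ready, key=lambda job: -job[1]):
        if threads <= free_cores:
            chosen.append(name)
            free_cores -= threads
    return chosen


# Hypothetical example: three ready jobs, six cores available.
print(pick_jobs([("cutadapt", 4), ("plot", 2), ("bowtie2", 8)], 6))
```

In this example the 8-thread job has to wait because it does not fit into six cores, while the 4-thread and 2-thread jobs are started together. A real scheduler would repeat this selection whenever a job finishes and cores become free.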



      • #4
        Interested in learning more about #SNAKEMAKE?

        Register now for the first 2-day #SNAKEMAKE Workshop in Berlin with Johannes Köster https://johanneskoester.bitbucket.io/

        Dates: 3-4 May 2022. This course will be held online in response to the coronavirus outbreak.


        You will learn how to create modern and reproducible #bioinformatic workflows
