**dariober** · 12-05-2013, 05:50 AM

Hi johanneskoester,

I'm curious to learn more about snakemake, thanks for posting it!

I'm quite familiar with Python, R, and bash, but not at all with GNU Make and friends (other than running it when I get some source code). So I must admit I fail to see where the advantage lies when building bioinformatics pipelines.

For example, taking this snakemake example, this is how I would implement the same pipeline:

Code:

REF="/global/home/users/ebolotin/scratch/hg19/hg19"
for fq in *.fastq.gz
do
    bname=$(basename "$fq" .fastq.gz)
    ## Prepare pipeline
    echo "cutadapt  -m 10 -a AGATCGGAAGAGCACACGTCTGAACTCC -o ${bname}.cut $fq &&
    bowtie2 -p 20 --very-sensitive -x $REF -U ${bname}.cut -S ${bname}.sam &&
    makeTagDirectory ${bname}.tag ${bname}.sam -keepAll -genome hg19" > ${bname}.sh
    ## Run the job:
    bash ${bname}.sh
    # OR
    # nohup bash ${bname}.sh &
    # OR
    # bsub [opts] < ${bname}.sh
done

Could you point out in what respect snakemake would make it preferable?

Thanks!
Dario

**johanneskoester** · 12-05-2013, 07:18 AM

Hi,
Sure. Your solution runs a for loop and spawns a bash job for each fastq file.
So they will execute in parallel (at least with the nohup or bsub variants), all fine.
At first sight, Snakemake gives you a similar result. However, it has several advantages (some of which, but to the best of my knowledge not all, are also provided by other workflow systems):

- With Snakemake, you can define how many processes may be active at the same time, so that your machine is not flooded with jobs. Snakemake schedules them so that utilization of the provided cores is maximized. The scheduler is also aware of the number of threads each job uses.
- If a job fails, or you have to quit the execution, then on the next invocation Snakemake will determine what was already computed and only calculate the missing parts.
- If an input file has changed, Snakemake will propose to rerun the downstream part of the pipeline (in other words, Snakemake automatically detects whether one of your files is outdated).
- You can run the very same workflow definition on a single machine or a cluster, without changing anything in the Snakefile.
- Following the well-known input-output-code pattern, Snakemake rules are very easy to read and help to separate your commands from their parameters.
- For each output file created during the workflow, Snakemake stores metadata such as the parameters, commands, and input files used, which is nice for documentation.
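
To make the points about resuming a run and detecting outdated files more concrete, here is a minimal bash sketch of the make-style timestamp check that such tools build on. It is only an illustration under stated assumptions, not Snakemake's actual implementation, and the helper name `needs_update` is hypothetical. Without something like it, the plain for loop above would redo every step for every fastq on each run.

```shell
# Hypothetical helper (not part of Snakemake or GNU Make).
# Usage: needs_update TARGET INPUT...
# Returns 0 (true) if TARGET has to be (re)built, 1 otherwise.
needs_update() {
    local target=$1
    shift
    # A target that does not exist yet always has to be built.
    [ -e "$target" ] || return 0
    local input
    for input in "$@"; do
        # -nt is true if $input was modified more recently than $target.
        if [ "$input" -nt "$target" ]; then
            return 0
        fi
    done
    # Target exists and no input is newer: nothing to do.
    return 1
}
```

In the loop from the first post, each step could then be guarded, for example `if needs_update "${bname}.cut" "$fq"; then cutadapt ...; fi`. This guard would have to be repeated by hand for every step, and it only looks at modification times, which is the bookkeeping a workflow system takes over for you.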

Best,
Johannes

**Physalia-courses** · 11-02-2018, 01:29 AM

Interested in learning more about #SNAKEMAKE?

Register now for the first 2-day #SNAKEMAKE Workshop in Berlin with Johannes Köster https://johanneskoester.bitbucket.io/

Snakemake: Reproducible and Scalable Bioinformatic Workflows

https://www.physalia-courses.org/courses-workshops/course41/

Dates: 3-4 May 2022. This course will be held online in response to the coronavirus outbreak.

You will learn how to create modern and reproducible #bioinformatic workflows

A new release of the Snakemake workflow system

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News