Snakemake Basics

Snakemake is a workflow management tool that lets you define analysis pipelines using a Python-based language. On ASU Supercomputers, Snakemake submits each rule as an sbatch job to the scheduler.

The key idea is to run your main Snakemake process on minimal resources with a long walltime, while the child jobs it spawns request only what they need for short durations.

Main Snakemake Job

Submit your Snakemake controller as a lightweight, long-running job. It does no heavy computation itself; it only monitors progress and submits child jobs.

#!/bin/bash
#SBATCH --job-name=snakemake_main
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=7-00:00:00
#SBATCH --partition=long

module load snakemake

snakemake --jobs 50 --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}M --time={resources.runtime} --partition=short"

Child Job Resources

Define resources per rule in your Snakefile so each child job is short and lean. This helps your jobs pass through the scheduler faster and works in your favor for fairshare scheduling.

rule align:
    input: "reads/{sample}.fastq"
    output: "aligned/{sample}.bam"
    threads: 4
    resources:
        mem_mb=8000,
        runtime="02:00:00"
    shell:
        "bwa mem -t {threads} ref.fa {input} | samtools sort -o {output}"

Cluster Profile

For cleaner configuration, create a Snakemake profile under ~/.config/snakemake/cluster/config.yaml to set default and per-rule resource values instead of putting everything on the command line.
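As a sketch of what such a profile might look like (the flag names mirror the command-line options shown above; the partition names and default values are assumptions to adjust for your site):

```yaml
# ~/.config/snakemake/cluster/config.yaml
jobs: 50
cluster: "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}M --time={resources.runtime} --partition=short"
# Fallback resources for rules that do not declare their own
default-resources:
  - mem_mb=4000
  - runtime="01:00:00"
rerun-incomplete: true
```

With the profile in place, the controller job can invoke simply `snakemake --profile cluster`, keeping all cluster-specific settings out of the Snakefile.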

Tips

  • Keep child jobs short. Jobs with shorter walltimes can be backfilled into scheduling gaps and typically start sooner.
  • Request only the resources you need. Over-requesting memory or CPUs wastes your fairshare allocation.
  • Use --rerun-incomplete if your main job times out so Snakemake picks up where it left off.
  • Snakemake is considered an advanced use case; contact the Research Computing team for support if needed.
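For example, if the controller job hits its walltime, resubmitting the same invocation with --rerun-incomplete lets Snakemake resume unfinished work (a sketch; the cluster flags mirror the command shown earlier):

```
# Resume after the main job timed out: Snakemake's metadata records which
# outputs completed, so only incomplete or missing rules are re-run.
snakemake --jobs 50 --rerun-incomplete \
    --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}M --time={resources.runtime} --partition=short"
```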