Easy Slurm
Easy Slurm allows you to easily manage and submit robust jobs to Slurm using Python and Bash.
Features#
Freezes source code by copying it to a separate $JOB_DIR.
Auto-submits another job if the current job times out.
Exposes hooks for custom Bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
Formats job names using parameters from config files.
Supports interactive jobs for easy debugging.
Installation#
pip install easy-slurm
Usage#
To submit a job, simply fill in the various parameters shown in the example below.
import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src=["./src", "./assets"],
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="cd src && python main.py",
    on_run_resume="cd src && python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)
All job files will be kept in the job_dir directory. Provide directory paths to src; these will be archived and copied to the job_dir directory. Also provide Bash code in the hooks, which will be run in the following order:
First run: | Subsequent runs: |
---|---|
setup | setup_resume |
on_run | on_run_resume |
teardown | teardown |
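The hook ordering can be sketched in a few lines of Python. This is only an illustration of the order described here, not Easy Slurm's implementation: hooks_for_run and simulate_runs are hypothetical names, and treating resubmit_limit as the maximum number of automatic resubmissions is an assumption based on its comment in the example above.

```python
def hooks_for_run(run_index):
    """Return hook names in execution order (run_index 0 is the first run)."""
    if run_index == 0:
        return ["setup", "on_run", "teardown"]
    return ["setup_resume", "on_run_resume", "teardown"]


def simulate_runs(timeouts, resubmit_limit):
    """List the hooks of each run for a job that times out `timeouts` times.

    Assumption: at most `resubmit_limit` resubmissions follow the first run.
    """
    runs = []
    for run_index in range(resubmit_limit + 1):
        runs.append(hooks_for_run(run_index))
        if run_index >= timeouts:
            break  # This run finished without timing out.
    return runs


for i, hooks in enumerate(simulate_runs(timeouts=2, resubmit_limit=64)):
    print(f"run {i}: {' -> '.join(hooks)}")
# run 0: setup -> on_run -> teardown
# run 1: setup_resume -> on_run_resume -> teardown
# run 2: setup_resume -> on_run_resume -> teardown
```

Because setup_resume can itself call setup, as in the example above, a resumed run can rebuild the same environment before continuing.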
Full examples are available, including a simple example to run “training epochs” on a cluster.
YAML#
Jobs can also be fully configured using YAML files. See examples/simple_yaml.
job_dir: "$HOME/jobs/{date}-{job_name}"
src: ["./src", "./assets"]
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "cd src && python main.py"
on_run_resume: "cd src && python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.
Formatting#
One useful feature is formatting paths using custom template strings:
easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S_%3f}-{job_name}",
)
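To see how such a template might expand, here is a minimal sketch using plain Python string formatting. Python's datetime already accepts strftime directives as a format spec, but %3f is not a standard directive; the sketch assumes it stands for milliseconds. MillisDatetime is a hypothetical helper written for this illustration, not part of Easy Slurm.

```python
from datetime import datetime


class MillisDatetime(datetime):
    """Hypothetical datetime that also understands %3f (assumed: milliseconds)."""

    def __format__(self, spec):
        # Substitute the three-digit millisecond value before delegating
        # the remaining directives to the standard strftime-based formatter.
        millis = f"{self.microsecond // 1000:03d}"
        return super().__format__(spec.replace("%3f", millis))


template = "$HOME/jobs/{date:%Y-%m-%d_%H-%M-%S_%3f}-{job_name}"
path = template.format(
    date=MillisDatetime(2024, 1, 2, 3, 4, 5, 678901),
    job_name="example-simple",
)
print(path)  # $HOME/jobs/2024-01-02_03-04-05_678-example-simple
```

Including sub-second precision in the directory name reduces the chance that two jobs submitted in quick succession end up with the same job_dir.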
The job names can be formatted using a config dictionary:
easy_slurm.submit_job(
    sbatch_options={
        "job-name": "bs={hp.batch_size:04},lr={hp.lr:.1e}",
        # Equivalent to:
        # "job-name": "bs=0032,lr=1.0e-02"
    },
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)
This helps in automatically creating descriptive, human-readable job names.
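The dotted lookups such as {hp.batch_size} are ordinary Python format-string attribute access, which a plain dictionary does not support directly. The sketch below shows one way to reproduce the result with the standard library by converting nested dictionaries to namespaces first. The helper names to_namespace and format_job_name are hypothetical, and Easy Slurm may implement this differently.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts so that keys become attributes."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


def format_job_name(template, config):
    """Fill a job-name template from a (possibly nested) config dictionary."""
    fields = {key: to_namespace(val) for key, val in config.items()}
    return template.format(**fields)


config = {"hp": {"batch_size": 32, "lr": 1e-2}}
name = format_job_name("bs={hp.batch_size:04},lr={hp.lr:.1e}", config)
print(name)  # bs=0032,lr=1.0e-02
```

The format specs after the colon are standard Python: 04 pads the integer to four digits with zeros, and .1e renders the learning rate in scientific notation with one decimal place.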
For the CLI / YAML interface, the same can be achieved using the --config argument:
easy-slurm --job="job.yaml" --config="config.yaml"