Building software with Slurm jobs using EasyBuild
This blog post serves as inspiration, and aims to get you started running jobs based on some of the stuff I run on my clusters.
Do you want to conveniently build modules in a hurry? Then you will love --job combined with -r.
What it does
By specifying eb -r --job example.eb easybuild will:
- Resolves all dependencies
- Pre-fetch all the sources
- Submit jobs for each easyconfig that needs to be built
- Jobs have dependency tracking based on their dependencies
And then you'll see something like:
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
9015294 vera Example-1 you PD 0:00 1 (Dependency)
9015292 vera Stuff-1. you R 0:06 1 node-01
9015293 vera Thing-2. you R 0:06 1 node-02
You can specify --job-cores=X when using --job to specify the size of your job.
Passing Slurm flags
What if your job needs a GPU partition, or specify a TRES?
Set them via environment variables:
SBATCH_GPUS_PER_NODE=H100:1 eb -r --job-cores=8 example-CUDA.eb
In fact, why not make an alias for it?
alias buildH100='SBATCH_GPUS_PER_NODE=H100:1 eb -r --job'
Depending on your cluster, you may find the following environment variables useful:
$SBATCH_GPUS_PER_NODE
$SBATCH_CONSTRAINT
$SBATCH_PARTITION
$SBATCH_ACCOUNT
But of course, many more exist (see also the sbatch documentation).
Collect build logs
If you don't want job logs in your current working directory when you submit, collect them to one place using --job-output=/path/to/logs .
If you have multiple cluster or architectures, you may wish to place them in specific locations.
Remember that you can set this via environment variables as well, for example:
export EASYBUILD_JOB_OUTPUT_DIR=$HOME/log_${CLUSTER_NAME}_${ARCH}/
Debug failed builds
If you have a problem build and the job logs isn't that useful, you can specify that the temporary directory
that EasyBuild creates should be on persistent storage:
--tmpdir=/your/path/on/central/storage/eb-tmp
Simple alias to show job queue
A handy alias to nicely keep track of your job queue can be made:
alias q='squeue --me -O jobid:10,tres-per-node:18,name:60,TimeUsed:10,reasonlist'
alias wq='watch -c \"squeue --me -O jobid:10,tres-per-node:18,name:60,TimeUsed:10,reasonlist\"'
Colorful alias to show job queue
Using bat, we can even color the output nicely:
alias wq="watch -c \"squeue -u c3-builder -O jobid:10,tres-per-node:18,name:60,TimeUsed:10,reasonlist | sed 's/^ *//g' | sed 's/ \+/,/g' | bat -f -l csv --style plain --theme=ansi | column -s, -t | bat --style grid\""
Advanced alias to show job queue
Why not look at sacct as well? And color based on state?
Here's an example Python script that does exactly that:
| #!/usr/bin/env python3
import subprocess
from datetime import datetime, timedelta
from rich.console import Console
from rich.table import Table
SACCT_LIMIT = timedelta(hours=12)
def run(cmd):
return subprocess.check_output(cmd, text=True).strip().splitlines()
def parse_table(lines):
header = lines[0].split("|")
rows = []
for line in lines[1:]:
cols = line.split("|")
rows.append(dict(zip(header, cols)))
return rows
def parse_gpu(tres_str):
tres = [x.split('=')[0] for x in tres_str.split(',')]
for t in tres:
if 'gpu' in t:
return t.split(':')[1]
return ''
def dhms_to_hms(dhms: str) -> str:
"""
Convert Slurm elapsed time D-HH:MM:SS or HH:MM:SS into HHH:MM:SS
"""
if '-' in dhms:
days, hms = dhms.split('-', 1)
hours, minutes, seconds = map(int, hms.split(':'))
total_hours = int(days) * 24 + hours
return f"{total_hours:3}:{minutes:02}:{seconds:02}"
else:
return dhms
hours, minutes, seconds = map(int, dhms.split(':'))
total_hours = hours
return f"{total_hours:3}:{minutes:02}:{seconds:02}"
def fix_path(path, jobid) -> str:
path = path.replace('/cephyr/users/c3-builder/Vera', '~')
path = path.replace('/cephyr/users/c3-builder/Alvis', '~')
return path.replace('%j', jobid)
def get_squeue():
cmd = [
"squeue",
"--me",
"-o", "%i|%j|%T|%M|%R|%f|%b"
]
lines = run(cmd)
return parse_table(lines)
def get_sacct():
start = (datetime.now() - SACCT_LIMIT).strftime("%Y-%m-%dT%H:%M:%S")
cmd = [
"sacct",
"-X",
"-S", start,
"--parsable2",
"-o", "jobid,jobname,state,elapsed,nodelist,reason,constraints,reqtres,stdout"
]
lines = run(cmd)
return parse_table(lines)
def classify_arch(job):
c = job.get("constraints", "").lower()
tres = job.get("tres", "").lower()
nodes = job.get("node", "").lower()
cpu_model = 'UNK'
# explicit constraint wins
if "zen4" in c:
cpu_model = "ZEN"
if "icelake" in c:
cpu_model = "ICE"
if "skylake" in c:
cpu_model = "SKY"
gpu = parse_gpu(tres)
# heuristic: GPUs are on specific CPU models in my cluster
lookup = {'h100': 'ZEN', 't4': 'SKY', 'v100': 'SKY', 'a40': 'ICE', 'a100': 'ICE'}
return lookup.get(gpu, cpu_model), gpu.upper()
def normalize_squeue(jobs):
out = []
for j in jobs:
entry = {
"source": "squeue",
"jobid": j["JOBID"],
"name": j["NAME"],
"state": j["STATE"].split()[0],
"time": dhms_to_hms(j["TIME"]),
"node": j["NODELIST(REASON)"],
"tres": j["TRES_PER_NODE"],
"constraints": j["FEATURES"],
"stdout": "",
}
entry["arch"], entry["gpu"] = classify_arch(entry)
out.append(entry)
return out
def normalize_sacct(jobs):
out = []
for j in jobs:
entry = {
"source": "sacct",
"jobid": j["JobID"],
"name": j["JobName"],
"state": j["State"].split()[0],
"time": dhms_to_hms(j["Elapsed"]),
"node": j["NodeList"],
"tres": j["ReqTRES"],
"constraints": j["Constraints"],
"stdout": j["StdOut"],
}
entry["arch"], entry["gpu"] = classify_arch(entry)
out.append(entry)
return out
def main():
jobs = normalize_squeue(get_squeue())
sa = normalize_sacct(get_sacct())
active_jobids = [j['jobid'] for j in jobs] # dont double up on active jobs
jobs += [s for s in sa if s['jobid'] not in active_jobids]
console = Console(force_terminal=True)
table = Table(show_header=True, header_style="bold", box=None)
table.add_column("JOBID", justify="right")
table.add_column("STATE")
table.add_column("ARCH")
table.add_column("GPU")
table.add_column("TIME", justify="right")
table.add_column("NODE")
table.add_column("NAME")
table.add_column("OUT")
for j in jobs:
state = j['state']
stdout = ''
if state.upper() in ("FAILED", "TIMEOUT"):
state = f"[red]{state}[/red]"
stdout = fix_path(j['stdout'], j['jobid'])
elif state.upper() == "COMPLETED":
state = f"[green]{state}[/green]"
elif state.upper() == "CANCELLED":
state = f"[yellow]{state}[/yellow]"
elif state.upper() == "PENDING":
state = f"[#888888]{state}[/#888888]"
arch = j['arch']
if arch == 'ZEN':
arch = f"[red]{arch}[/red]"
else:
arch = f"[blue]{arch}[/blue]"
table.add_row(j['jobid'], state, arch, j['gpu'], j['time'], j['node'], j['name'], stdout)
console.print(table)
if __name__ == "__main__":
main()
|
Example running with watch --color q_advanced: