A Python CLI for launching, tracking, and aggregating ML experiments across a local GPU cluster. YAML-configured sweeps, SQLite-backed metric and artifact tracking, automatic GPU pinning, and optional Docker containerization for reproducible environments.
Cuts a typical experiment setup (spin up, pin GPUs, seed, start TensorBoard, copy configs, track results) from 20+ minutes of shell wrangling to ~30 seconds.
- YAML configs — single-run or full grid sweeps
- Local GPU scheduler with automatic GPU pinning via `CUDA_VISIBLE_DEVICES`
- SQLite tracking DB for runs, metrics, and artifacts (no server required)
- Docker integration for reproducible environments (optional)
- Sweep aggregation — summarize metrics across runs with `xrun summarize`
- Failure isolation — a crashing run doesn't kill the sweep
```bash
# Install
pip install -r requirements.txt
pip install -e .   # makes `xrun` available; or use `python -m src.cli`

# Detect GPUs
xrun gpus
# => Detected GPUs: [0, 1, 2, 3]

# Launch an experiment or sweep
xrun run examples/sample_experiment/experiment.yaml

# List recent runs
xrun list --name mnist_mlp --limit 10

# Inspect a single run
xrun show mnist_mlp_abc12345

# Aggregate across all runs with a name prefix
xrun summarize --name mnist_mlp
```

```yaml
# examples/sample_experiment/experiment.yaml
base:
  name: mnist_mlp
  command: "python train.py --cfg {cfg} --out {artifacts_dir}"
  resources:
    gpus: 1
  artifacts_dir: "runs/{name}/{run_id}"
  params:
    epochs: 10
    batch_size: 64

sweep:
  lr: [1.0e-3, 5.0e-3, 1.0e-2]
  hidden: [64, 128, 256]
```
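The `sweep:` block expands to the cross-product of its value lists, one run per combination. A minimal sketch of that expansion over a plain dict of lists (the real `config.py` does this with pydantic models):

```python
import itertools

sweep = {"lr": [1.0e-3, 5.0e-3, 1.0e-2], "hidden": [64, 128, 256]}

# Cross-product of all sweep lists -> one parameter dict per run.
grid = [dict(zip(sweep, combo)) for combo in itertools.product(*sweep.values())]
assert len(grid) == 9  # 3 lr values x 3 hidden sizes
```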
Launch the 9-run grid (3 × 3):

```bash
xrun run examples/sample_experiment/experiment.yaml --max-concurrent 2
```

The scheduler runs two experiments at a time, pinning each to its own GPU.
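The pinning mechanism is simple enough to sketch: a thread pool draws GPU ids from a shared queue and starts each subprocess under its own `CUDA_VISIBLE_DEVICES`. This is an illustration of the idea, not the actual `scheduler.py`:

```python
import os
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor

gpu_pool = queue.Queue()
for gpu_id in [0, 1, 2, 3]:  # ids as reported by `xrun gpus`
    gpu_pool.put(gpu_id)

def run_job(cmd):
    gpu_id = gpu_pool.get()  # blocks until a GPU frees up
    try:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        # A non-zero exit code is returned, not raised, so one
        # crashing run doesn't take down the rest of the sweep.
        return subprocess.run(cmd, env=env).returncode
    finally:
        gpu_pool.put(gpu_id)  # hand the GPU back to the pool

# Hypothetical expanded commands, one per grid point.
jobs = [["python", "train.py", "--cfg", f"cfg_{i}.yaml"] for i in range(9)]

# --max-concurrent 2: at most two jobs (hence two GPUs) in flight.
with ThreadPoolExecutor(max_workers=2) as pool:
    exit_codes = list(pool.map(run_job, jobs))
```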
Add a `docker:` block to the config to run each experiment inside a container:
```yaml
base:
  name: resnet_ablation
  command: "python train.py --cfg {cfg}"
  docker:
    image: "pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime"
    gpus: 1
    mounts:
      - "/data/imagenet:/data/imagenet:ro"
    env:
      PYTHONUNBUFFERED: "1"
  resources:
    gpus: 1
```

Every run now starts a fresh container, mounts the workdir at `/workspace`, and passes GPU access via `--gpus device=N`.
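Concretely, the container launch boils down to assembling a `docker run` argv. The following is a sketch of the idea behind `docker_env.py`, with a hypothetical function name and parameters rather than its real API:

```python
import os
import shlex

def build_docker_cmd(image, command, gpu_id, mounts=(), env=None):
    """Hypothetical sketch of a `docker run` builder; not docker_env.py's API."""
    cmd = [
        "docker", "run", "--rm",
        "--gpus", f"device={gpu_id}",       # pin a single GPU
        "-v", f"{os.getcwd()}:/workspace",  # mount the workdir at /workspace
        "-w", "/workspace",
    ]
    for mount in mounts:  # e.g. "/data/imagenet:/data/imagenet:ro"
        cmd += ["-v", mount]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    return cmd + [image] + shlex.split(command)
```

The scheduler thread that owns the GPU would then hand the result to `subprocess.run`.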
The SQLite database (default `experiments.db`) stores:

```
runs       -- run_id, name, config_json, status, started_at, ended_at, exit_code, gpu_ids, ...
metrics    -- run_id, key, value, step, wall_time
artifacts  -- run_id, name, path, size_bytes, registered_at
```
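In SQL terms, those three tables look roughly like the sketch below, inferred from the column listing above; `tracking.py`'s actual DDL may differ in types and constraints:

```python
import sqlite3

con = sqlite3.connect("experiments.db")
# Rough reconstruction of the schema implied by the column listing.
con.executescript("""
CREATE TABLE IF NOT EXISTS runs (
    run_id TEXT PRIMARY KEY, name TEXT, config_json TEXT, status TEXT,
    started_at REAL, ended_at REAL, exit_code INTEGER, gpu_ids TEXT
);
CREATE TABLE IF NOT EXISTS metrics (
    run_id TEXT, key TEXT, value REAL, step INTEGER, wall_time REAL
);
CREATE TABLE IF NOT EXISTS artifacts (
    run_id TEXT, name TEXT, path TEXT, size_bytes INTEGER, registered_at REAL
);
""")
```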
The experimenter's `train.py` can log metrics from inside the container:

```python
from src.tracking import ExperimentStore

store = ExperimentStore("/workspace/experiments.db")
store.log_metric(run_id, "val_auc", 0.812, step=epoch)
store.register_artifact(run_id, "checkpoint", "/workspace/runs/ckpt.pt")
```
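`xrun summarize` then reduces to plain SQL over these tables. A sketch of that kind of aggregation query (illustrative; the exact SQL `xrun` issues may differ):

```python
import sqlite3

con = sqlite3.connect("experiments.db")
# Best val_auc per run, for every run whose name matches a prefix.
rows = con.execute(
    """
    SELECT r.run_id, MAX(m.value) AS best
    FROM runs r JOIN metrics m ON m.run_id = r.run_id
    WHERE r.name LIKE ? AND m.key = ?
    GROUP BY r.run_id
    ORDER BY best DESC
    """,
    ("mnist_mlp%", "val_auc"),
).fetchall()
for run_id, best in rows:
    print(f"{run_id}  best val_auc = {best:.3f}")
```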
Project layout:

```
src/
  cli.py           Click-based CLI (`xrun`)
  config.py        YAML config + sweep expansion (pydantic)
  scheduler.py     Local GPU pool + job scheduler (threaded)
  docker_env.py    `docker run` command builder
  tracking.py      SQLite experiment store
tests/
  test_config.py    Sweep expansion tests
  test_tracking.py  Metrics + summarize tests
examples/sample_experiment/
  experiment.yaml   3 × 3 grid demo
  train.py          Minimal training script (no GPU required)
```
- Why not MLflow / W&B? Those are great but require a server, account, or network. `xrun` is a single-file SQLite backend that runs purely locally.
- Why not Ray Tune / Optuna? Different purpose — Tune is for optimizing hyperparameters with adaptive schedulers; `xrun` is for running a manual grid or ad-hoc experiment and tracking the results.
- Why not SLURM? SLURM is great for large shared clusters; `xrun` targets the local-GPU-box or small-team case where SLURM is overkill.
- Single-host only. Scale-out would require an RPC layer over `scheduler.py`.
- Scheduler uses Python threads (fine for launching subprocesses), not async.
- `experiments.db` is SQLite — fine for thousands of runs, not millions.
- GPU detection relies on `nvidia-smi`; CPU-only machines fall back to running `max_concurrent` jobs in parallel without GPU pinning (see the detection sketch after this list).
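For completeness, a minimal sketch of what `nvidia-smi`-based detection can look like (illustrative; not necessarily the exact code in `xrun`):

```python
import subprocess

def detect_gpus():
    """Return detected GPU ids, or [] on CPU-only machines."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []  # no NVIDIA driver/tooling: fall back to CPU-only mode
    return [int(line) for line in out.splitlines() if line.strip()]
```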