Sokoplan is Sokoban research/engineering project focused on A* planning with classical heuristics and learned GNN-based heuristics, including pipelines for dataset preparation, label generation, training, evaluation, and transferability-driven level grouping.
- Core Sokoban engine (
sokoban_core/): state representation, parsing, moves, deadlocks, level I/O. - Search stack (
search/,heuristics/): A* with transposition table, classical heuristics, Festival heuristic, learned GNN heuristic. - GNN stack (
gnn/): graph construction from states, PyG model, dataset and training loop. - Experiment scripts (
scripts/): data fetch, split creation, label generation, training, solving/evaluation, clustering/grouping. - Hydra SLURM scripts (
scripts_hydra/): cluster-ready pipelines for labels, training, eval, and block workflow.
- A level is solved by A* over push-based Sokoban states.
- Supervised labels are generated from solved trajectories:
each training sample is a state with target
y = pushes_to_goal. - A GIN/GINE model predicts
yand is used as heuristic:speed: use GNN output directly.optimal_mix:min(gnn, hungarian)to keep conservative behavior.
- Levels can be grouped and trained/evaluated per group to study transferability.
sokoplan/
├── sokoban_core/ # parser/state/moves/deadlocks/levels
├── search/ # A* + priority queue + transposition table
├── heuristics/ # zero/hungarian/festival/gnn heuristics
├── gnn/ # graph features, model, dataset, train loop
├── scripts/ # local pipelines and utilities
│ ├── fetcher/ # download datasets (letslogic/boxoban)
│ ├── labels/ # generate supervised/off-policy labels
│ ├── train/ # model training entrypoint
│ ├── solve/ # batch eval for classic and GNN heuristics
│ └── blocks/ # split/merge/group pipelines
├── scripts_hydra/ # SLURM jobs for Hydra cluster
│ ├── generate_labels/
│ ├── train_model/
│ ├── eval/
│ └── blocks/
├── configs/data.yaml # level roots, sources, filters, split paths
├── festival/ # Festival C++ solver (binary used by scripts)
└── tests/ # pytest suite
- Python 3.11 (see
pyproject.toml) - Linux/macOS (Hydra scripts target SLURM)
- Optional but recommended: CUDA-capable GPU for training/eval speed
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txtMany scripts expect binary at festival/festival.
make -C festivalAcross scripts, level references use:
path/to/pack.txt#idx
Where idx is the 0-based level index inside a multi-level pack file.
Each line stores one state sample, typically:
- board fields:
width,height,walls,goals,boxes,player,board_mask - target:
y(remaining pushes/cost-to-go) - provenance:
level_id
Run commands from repo root and prefer env variables over fixed paths.
source .venv/bin/activate
# Windows packs from sourcecode.se -> sokoban_core/levels/windows
python -m scripts.fetcher.fetch_windows
# Boxoban packs -> sokoban_core/levels/boxoban
python -m scripts.fetcher.fetch_boxoban
# LetsLogic packs (requires API key) -> sokoban_core/levels/letslogic/raw
python -m scripts.fetcher.fetch_letslogic "$LETSLOGIC_API_KEY"CONFIG_PATH="configs/data.yaml"
SEED=42
TRAIN_N=2000
VAL_N=200
TEST_N=200
python -m scripts.utils.make_splits \
--config "$CONFIG_PATH" \
--seed "$SEED" \
--train "$TRAIN_N" \
--val "$VAL_N" \
--test "$TEST_N"LIST_PATH="sokoban_core/levels/splits/train.txt"
LABELS_OUT="data/generated_labels/labels.jsonl"
python -m scripts.labels.generate_labels \
--list "$LIST_PATH" \
--out "$LABELS_OUT" \
--h hungarian \
--use_dl \
--time_limit 300 \
--node_limit 3000000 \
--jobs 8LIST_PATH="sokoban_core/levels/splits/train.txt"
LABELS_OUT="data/generated_labels/labels_festival.jsonl"
python -m scripts.labels.generate_labels_festival \
--list "$LIST_PATH" \
--out "$LABELS_OUT" \
--time_limit 300 \
--jobs 8 \
--sample_every 2TRAIN_JSONL="data/generated_labels/labels.jsonl"
VAL_JSONL="data/generated_labels/labels_val.jsonl" # optional
MODEL_OUT="artifacts/gnn_best.pt"
CKPT_OUT="artifacts/gnn_checkpoint.pt"
python -m scripts.train.train_gnn \
--train "$TRAIN_JSONL" \
--val "$VAL_JSONL" \
--epochs 30 \
--batch 512 \
--hidden 128 \
--layers 4 \
--lr 1e-3 \
--dropout 0.05 \
--workers 4 \
--prefetch_factor 4 \
--pin_memory \
--persistent_workers \
--amp \
--out "$MODEL_OUT" \
--checkpoint "$CKPT_OUT" \
--resumeLIST_PATH="sokoban_core/levels/splits/test.txt"
OUT_CSV="results/evaluate/hungarian.csv"
python -m scripts.solve.run_batch \
--list "$LIST_PATH" \
--h hungarian \
--use_dl \
--out "$OUT_CSV" \
--time_limit 600 \
--node_limit 2000000 \
--jobs 8LIST_PATH="sokoban_core/levels/splits/test.txt"
CKPT_PATH="artifacts/gnn_best.pt"
OUT_CSV="results/evaluate/gnn.csv"
python -m scripts.solve.run_batch_gnn \
--list "$LIST_PATH" \
--ckpt "$CKPT_PATH" \
--mode speed \
--out "$OUT_CSV" \
--time_limit 600 \
--node_limit 2000000 \
--jobs 8EVAL_LIST="sokoban_core/levels/splits/test.txt"
MODELS_DIR="artifacts"
EVAL_DIR="results/evaluate/models"
HUN_CSV="results/evaluate/hungarian.csv"
python -m scripts.solve.eval_gnn_models run \
--list "$EVAL_LIST" \
--models_dir "$MODELS_DIR" \
--out_dir "$EVAL_DIR" \
--ensure_hungarian \
--hungarian_csv "$HUN_CSV"
python -m scripts.solve.eval_gnn_models summarize \
--eval_dir "$EVAL_DIR" \
--hungarian_csv "$HUN_CSV" \
--out "$EVAL_DIR/models_summary.csv"Detailed Hydra instructions are maintained in:
scripts_hydra/generate_labels/README.mdscripts_hydra/train_model/README.mdscripts_hydra/blocks/README.md
Recommended run pattern:
VAR1=value1 VAR2=value2 sbatch path/to/script.sbatchFor array jobs, keep NUM_SHARDS aligned with --array.
The repository supports multiple level-grouping directions:
- Transferability-driven split/merge pipeline (self-eval + cross-eval driven grouping)
- Static-feature clustering pipeline
(
scripts.blocks.static.*for feature extraction and KMeans grouping)
Use the blocks Hydra README as the primary orchestration guide:
scripts_hydra/blocks/README.md.
This pipeline groups levels using only static/topological map features
(scripts.blocks.static.*), without transferability cross-eval.
IN_LISTS_DIR="sokoban_core/levels/letslogic/filtered/lists"
STATIC_LISTS_DIR="sokoban_core/levels/letslogic/clusters_static_features/lists"
STATIC_REPORT="results/static_groups/report.json"
python -m scripts.blocks.static.plan_groups_by_static_features \
--in_lists_dir "$IN_LISTS_DIR" \
--out_lists_dir "$STATIC_LISTS_DIR" \
--report_json "$STATIC_REPORT" \
--k 0 \
--k_min 12 \
--k_max 64 \
--auto_k_sample_size 8000 \
--min_cluster_size 120 \
--seed 42Notes:
--k 0enables auto-selection ofKvia silhouette score.--min_cluster_sizemerges tiny clusters into nearest larger ones.
python -m scripts.blocks.static.summarize_static_groups \
--report_json "$STATIC_REPORT" \
--stat median \
--top_features 10 \
--out_csv "results/static_groups/summary.csv"STATIC_PACKS_DIR="sokoban_core/levels/letslogic/clusters_static_features"
python -m scripts.blocks.utils.list_to_pack \
--lists_dir "$STATIC_LISTS_DIR" \
--out_dir "$STATIC_PACKS_DIR" \
--name_prefix_from_list_stemPACK_LABELS_DIR="data/packs_labels_festival"
GROUP_LABELS_DIR="data/group_labels_festival_static"
python -m scripts.blocks.utils.repack_labels_to_groups \
--labels_dir "$PACK_LABELS_DIR" \
--groups_lists_dir "$STATIC_LISTS_DIR" \
--out_dir "$GROUP_LABELS_DIR"This creates group_XXX.jsonl labels that can be used to train one model per
static group.
If you already have:
- test lists per static group (e.g.
.../splits/letslogic/static_features_clusters/test/group_XXX.list) - model checkpoints per static group (
group_XXX.pt)
you can run:
TEST_LISTS_DIR="sokoban_core/levels/splits/letslogic/static_features_clusters/test"
MODELS_DIR="artifacts/letslogic/static_features"
OUT_DIR="results/evaluate/static_crosspack/raw"
TEST_LISTS_DIR="$TEST_LISTS_DIR" MODELS_DIR="$MODELS_DIR" OUT_DIR="$OUT_DIR" \
sbatch --array=0-11 scripts_hydra/eval/run_eval_static_crosspack_array.sbatch
python -m scripts.utils.build_static_crosspack_matrix \
--raw_dir "$OUT_DIR" \
--out_dir "results/evaluate/static_crosspack/matrix"Adjust --array=0-N to match your number of static groups.
artifacts/,results/, and largedata/subfolders contain experiment outputs and may vary between runs/machines.- Most scripts expose CLI args for paths and limits; prefer passing parameters via env/flags instead of editing scripts.