Lightweight, end-to-end PyTorch pipeline — data generation → training → search — all in torch, optimized for massive GPU parallelism. This repository implements a novel ML approach for finding short paths on large Cayley graphs (e.g. across permutation puzzles such as the Rubik’s Cube) by training neural networks to estimate diffusion distances and using those predictions to guide an efficient beam search. This approach is zero human knowledge (no handcrafted heuristics or domain rules).
The same PyTorch pipeline works across distinct Cayley graphs without changing model architecture and hyperparameters. With one NVIDIA H100, model parameters (N₁ = 1024, N₂ = 256, Nᵣ = 1), beam width 2²⁰ and only 1.28×10⁸ training examples we achieve these results:
| Puzzle | Graph size | Training (min) | Solving (min) | Avg. solution len. | Solved (%) |
|---|---|---|---|---|---|
| Rubik's Cube 3×3×3 | 4×10¹⁹ | 0.8 | 0.17 | 20.5 ± 0.1 | 100 |
| Rubik's Cube 4×4×4 | 7×10⁴⁵ | 1.5 | 1.24 | 65.0 ± 1.0 | 94 |
| Pancake Graph 55 | 3×10⁷³ | 1.4 | 3.49 | 50.0 ± 0.3 | 100 |
| Klotski 6×6 (PBC) | 2×10⁴¹ | 1.2 | 0.45 | 113 ± 2.0 | 100 |
Using more computations the method reaches 98% optimality on 3×3×3 (QTM) and outperforms the top Santa Kaggle 2023 submissions on 3×3×3, 4×4×4 and 5×5×5 in average solution length. On 3×3×3, the batched GPU beam search delivers 20× faster path finding than the previous state of the art at comparable solution lengths.
Run these commands to solve puzzles using preloaded model weights:
Cube 3x3x3 (UQTM metric)
python test.py --group_id 1 --target_id 0 --tests_num 3 --dataset santa --num_steps 100 --verbose 1 --epoch 8192 --model_id 333 --B 262144 --device_id 0Cube 4x4x4 (UQTM metric)
python test.py --group_id 2 --target_id 0 --tests_num 3 --dataset santa --num_steps 150 --verbose 1 --epoch 8192 --model_id 444 --B 262144 --device_id 0Cube 5x5x5 (UQTM metric)
python test.py --group_id 3 --target_id 0 --tests_num 3 --dataset santa --num_steps 200 --verbose 1 --epoch 8192 --model_id 555 --B 524288 --device_id 0Cube 3x3x3 (QTM metric)
python test.py --group_id 54 --target_id 0 --tests_num 3 --dataset deepcubea --num_steps 100 --verbose 1 --epoch 8192 --model_id 333 --B 262144 --device_id 0tests_num— sets an upper limit on the number of scrambles to test from the beginning of the dataset.dataset— selects the dataset to use:santa— official Kaggle Santa 2023 dataset.rnd— randomly generated dataset (100 scrambles) using 10,000(+1) random steps from the solved state.deepcubea(1000 DeepCubeA scrambles),deepcubeadifficult(69 DeepCubeA subset scrambles),deepcubeahard(16 DeepCubeA subset scrambles) — available only for group054(3x3x3, QTM metric), useful for benchmarking against DeepCubeA and EfficientCube.
device_id— specifies which GPU to use for testing (default is0). Useful when multiple GPUs are available.
Optimal solution lengths for the scrambles in the deepcubeahard: [20, 20, 20, 21, 20, 20, 20, 20, 19, 20, 20, 19, 21, 20, 20, 20].
Test results saved in logs/test_pXXX-tXXX-{dataset}_{model_id}_{epoch}_{B}.json. Each file is a JSON array where moves contain generator indices used to reach the solved state for corresponding scramble.
Run training using your puzzle configuration:
python train.py --group_id <id> --target_id 0 --epochs <epochs> --hd1 <N_1> --hd2 <N_2> --nrd <N_r> --batch_size 10000 --K_max <K_max> --device_id 0Example for 24-puzzle:
python train.py --group_id 28 --target_id 0 --epochs 16 --hd1 1024 --hd2 512 --nrd 1 --batch_size 10000 --K_max 100 --device_id 0After training, use the assigned model_id (generated as int(time.time()) during training) to run beam search evaluation to use trained model:
python test.py --group_id 28 --target_id 0 --tests_num 3 --dataset rnd --num_steps 300 --num_attempts 1 --verbose 1 --epoch 16 --model_id {MODEL_ID} --B 65536 --device_id 0Replace {MODEL_ID} with the actual numeric identifier saved in the logs.
To add your puzzle:
- Define your puzzle moves in
generatorsfolder aspXXX.json. - Place a torch tensor
.pt(1D tensor) intargets. - Add scrambles dataset (2D tensor, each row as a scramble) to
datasetsfolder.
For Cube 3x3x3 (QTM) on the deepcubea dataset, a multi-agent script traintest-multiagent.sh automates:
- training multiple agents (
group_id=054,target_id=0) - testing each agent on the same scrambles
- aggregating results: per-agent and ensemble statistics
After training and testing A agents, the script calls read-test-logs-multiagent.py to compute:
- average solution length per agent
- ensemble stats (shortest solution per scramble)
- solved percentage
- move sequences from the best agent per scramble
=== per agent ===
tests solved_% avg_len
123456789 1000 97.3 21.4
123456790 1000 98.2 20.8
=== ensemble (shortest per scramble) ===
solved % : 99.1
avg solution length: 19.95
=== moves (winning agent) ===
test_num solution_length model_id moves
0 20 123456790 [2, 0, 4, 5, 1, ...]
...
All logs are saved in logs/, with results printed at the end.
| Group ID | Puzzle | Kmax used |
|---|---|---|
| 000 | Cube 2x2x2 | 15 |
| 001 | Cube 3x3x3 | 26 |
| 002 | Cube 4x4x4 | 45 |
| 003 | Cube 5x5x5 | 65 |
| 004 | Cube 6x6x6 | 150 |
| 011 | Wreath 6/6 | 10 |
| 012 | Wreath 7/7 | 10 |
| 013 | Wreath 12/12 | 20 |
| 014 | Wreath 21/21 | 35 |
| 015 | Wreath 33/33 | 75 |
| 017 | Globe 1/8 | 60 |
| 018 | Globe 1/16 | 110 |
| 019 | Globe 2/6 | 25 |
| 020 | Globe 3/4 | 40 |
| 021 | Globe 6/4 | 40 |
| 022 | Globe 6/8 | 165 |
| 023 | Globe 6/10 | 170 |
| 024 | Globe 3/33 | 500 |
| 025 | Globe 8/25 | 700 |
| 026 | Puzzle 8 | 30 |
| 027 | Puzzle 15 | 80 |
| 028 | Puzzle 24 | 150 |
| 029 | Puzzle 35 | 200 |
| 030 | Puzzle 48 | 250 |
| 031 | Puzzle 63 | 300 |
| 034 | LRX 10 | 50 |
| 035 | LRX 15 | 100 |
| 036 | LRX 20 | 200 |
| 037 | LRX 25 | 300 |
| 044 | Pancake 10 | 15 |
| 045 | Pancake 15 | 25 |
| 046 | Pancake 20 | 30 |
| 047 | Pancake 25 | 40 |
| 048 | Pancake 30 | 45 |
| 049 | Pancake 35 | 55 |
| 050 | Pancake 40 | 60 |
| 051 | Pancake 45 | 65 |
| 052 | Pancake 50 | 70 |
| 053 | Pancake 55 | 75 |
| 054 | Cube 3x3x3 (DeepCubeA metric) | 26 |
For reproducing results from Table 4, refer to provided traintest-tab4-santa.sh and traintest-tab4-rnd.sh, paper/solver-scrambles and paper/figure-scrambles contain generators and scrambles used in each specific experiment. The classical 15-puzzle (no wrap-around) is not a Cayley graph: node degrees vary with the blank position, hence the state graph is not vertex-transitive. We include it as a non-Cayley baseline at separate branch puzzle-15.
If you use this work in research, please cite one of the following BibTeX entries.
% Proceedings version (preferred once the NeurIPS paper is out)
@inproceedings{Chervov2025NeurIPSRubiks,
author = {Alexander Chervov and Kirill Khoruzhii and Nikita Bukhal and Jalal Naghiyev and Vladislav Zamkovoy and Ivan Koltsov and Lyudmila Cheldieva and Arsenii Sychev and Arsenii Lenin and Mark Obozov and Egor Urvanov and Alexey M. Romanov},
title = {A machine learning approach that beats Rubik’s cubes},
booktitle = {Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)},
year = {2025},
note = {Spotlight},
url = {https://neurips.cc/}
}% arXiv preprint (use this if you are citing the preprint)
@misc{Chervov2025arXivRubiks,
author = {Alexander Chervov and Kirill Khoruzhii and Nikita Bukhal and Jalal Naghiyev and Vladislav Zamkovoy and Ivan Koltsov and Lyudmila Cheldieva and Arsenii Sychev and Arsenii Lenin and Mark Obozov and Egor Urvanov and Alexey M. Romanov},
title = {A Machine Learning Approach That Beats Large Rubik's Cubes},
year = {2025},
eprint = {2502.13266},
archivePrefix = {arXiv},
url = {https://arxiv.org/abs/2502.13266}
}