Vision is a small, in-progress model that predicts the next move in a chess game using a Transformer. The end goal is to train a bot that plays convincingly like a human.
It's a hobby project, and so far this approach is unproven. For more proven neural network architectures, check out pytorch-nnue.
Before you proceed, make sure you have poetry installed.
You might also want to get some PGN files; Lichess provides monthly snapshots.
These snapshots are quite large (~200GB per month uncompressed), so you may want a tool like pgn-extract, which is very fast and very helpful for filtering these files. Here's a blog showing some usage examples for pgn-extract.
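If you'd rather not set up pgn-extract, a rough python-chess sketch like the one below can do simple filtering too (it's much slower than pgn-extract on a full monthly dump). The file paths and Elo threshold here are placeholders, not part of this repo.

```python
# Hypothetical alternative to pgn-extract: filter a PGN dump with python-chess.
# Keeps only games where both players are rated 2000+ and writes them back out.
import chess.pgn

MIN_ELO = 2000  # assumed threshold, adjust to taste

with open("data/pgn/lichess_db.pgn") as src, open("data/pgn/filtered.pgn", "w") as dst:
    while True:
        game = chess.pgn.read_game(src)
        if game is None:
            break  # end of file
        try:
            white = int(game.headers.get("WhiteElo", 0))
            black = int(game.headers.get("BlackElo", 0))
        except ValueError:
            continue  # unrated games use "?" for Elo; skip them
        if white >= MIN_ELO and black >= MIN_ELO:
            print(game, file=dst, end="\n\n")
```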
Setup is simple:
```
poetry install
```

- You'll need to get some pgn files and place them in a folder (ex. `data/pgn`).
- Then process these into numpy arrays using `pgn_to_npy.py`. This example uses `poetry`, but use whichever package/environment manager you like.

  ```
  $(poetry env activate)
  python vision/pgn_to_npy.py --input ./data/pgn --output-dir ./data
  ```

  (Note: The search for PGN files isn't recursive, it will only look at the top level of the directory.)
- Then run the model. You can also adjust most hyperparameters in `config.yaml`, or use your own custom one and specify it with the `--model CONFIG` flag. Run `python vision/main.py --help` to see all the subcommands.

  ```
  python vision/main.py fit
  ```
- The model will run and automatically save any epoch that performs better than the current best. You should see pytorch-lightning output like:
```
  | Name                 | Type       | Params | Mode
------------------------------------------------------------
0 | token_embedding      | Embedding  | 2.1 M  | train
1 | positional_embedding | Embedding  | 25.6 K | train
2 | transformer_blocks   | ModuleList | 4.2 M  | train
3 | final_norm           | RMSNorm    | 512    | train
4 | out_head             | Linear     | 2.1 M  | train
------------------------------------------------------------
8.4 M     Trainable params
0         Non-trainable params
8.4 M     Total params
33.711    Total estimated model params size (MB)
21        Modules in train mode
0         Modules in eval mode
Sanity Checking: |          | 0/? [00:00<?, ?it/s]
Epoch 0: 100%|████████████| 11391/11391 [1:21:14<00:00, 2.34it/s, v_num=30, train_loss_step=0.707, train_perplexity_step=2.030, val_accuracy_step=0.983]
```
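The "save only the best epoch" behaviour is the standard pytorch-lightning checkpointing mechanism. A minimal sketch of how such a callback is typically wired up is shown below; the monitored metric name, directory, and filename pattern are assumptions, not necessarily what `vision/main.py` or `config.yaml` actually use.

```python
# Illustration of pytorch-lightning's best-only checkpointing (assumed setup).
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    monitor="val_accuracy",  # assumed; matches the metric name in the log above
    mode="max",              # keep the epoch with the highest value
    save_top_k=1,            # only the single best checkpoint is retained
    dirpath="checkpoints/",
    filename="vision-{epoch:02d}-{val_accuracy:.3f}",
)

trainer = pl.Trainer(callbacks=[checkpoint_cb])
# trainer.fit(model, datamodule)  # the real model and data come from the CLI
```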
LLMs are mostly built on autoregressive transformer models, but transformers are not LLMs. A transformer simply takes the current sequence of tokens and produces the most likely next token in the series. It's at least plausible that these models would apply to predicting other ordered sequences, as long as the elements of those sequences are related to each other.
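Concretely, next-move prediction is just autoregressive decoding over a move vocabulary. A toy sketch, where the model and vocabulary are stand-ins rather than the code in this repo:

```python
# Toy autoregressive loop: feed the moves so far, take the most likely next
# token, append it, repeat.
import torch

def predict_next_moves(model, move_tokens: list[int], n_moves: int) -> list[int]:
    tokens = torch.tensor([move_tokens])               # shape: (1, seq_len)
    with torch.no_grad():
        for _ in range(n_moves):
            logits = model(tokens)                     # (1, seq_len, vocab_size)
            next_token = logits[0, -1].argmax()        # most likely next move
            tokens = torch.cat([tokens, next_token.view(1, 1)], dim=1)
    return tokens[0].tolist()
```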
This is mostly a hobby experiment for me. I want to learn more about machine learning, and this seemed like a fun thing to try. To the best of my knowledge, LLM-style models for chess haven't been explored much beyond trying to get text-generating LLMs like ChatGPT or Gemini to play it, so I thought it would be fun to explore more novel applications of the approach.
Further improvements and bugfixes to the model suggest that, with enough training data, a large enough model, and a robust tokenization scheme (still experimenting), it may learn to generalize well.
The current tokenization scheme is based on UCI move instructions from the start of a standard chess game. As such it likely won't be any good at playing Chess variants like Chess960. The strict dependency between current and past moves may also make dropout ineffective (I haven't tested this yet). The tokenization scheme may change in the future as the model evolves.
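For a sense of what UCI-based tokenization looks like, here is a rough sketch using python-chess; this is an illustration only, and the repo's actual (still-changing) scheme may differ. The sample path is a placeholder.

```python
# Rough illustration of UCI-move tokenization: each distinct UCI string
# (e.g. "e2e4", "g1f3") maps to one token id.
import chess.pgn

def game_to_uci_tokens(game: chess.pgn.Game, vocab: dict[str, int]) -> list[int]:
    tokens = []
    for move in game.mainline_moves():
        uci = move.uci()                # e.g. "e2e4"
        if uci not in vocab:
            vocab[uci] = len(vocab)     # grow the vocabulary on first sight
        tokens.append(vocab[uci])
    return tokens

vocab: dict[str, int] = {}
with open("data/pgn/sample.pgn") as f:  # placeholder path
    game = chess.pgn.read_game(f)
    if game is not None:
        print(game_to_uci_tokens(game, vocab))
```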