Commit af2531e

Remove HuggingFace mirror (#8)
* Add acknowledgments to model card template, strip HF frontmatter from README.
  New templates/ directory for generation templates. The model card template now
  includes an acknowledgments table referencing the projects and papers PAWN
  builds on. Removes the HF YAML frontmatter from the GitHub README.
* Remove HF sync action (no longer mirroring the repo to HuggingFace).
* Add Jinja2 model card generator with live HF metrics.
  templates/hf_model_card.md.j2: Jinja2 template replacing the old {PLACEHOLDER}
  template; includes acknowledgments, probe results, diagnostics, and all
  architecture/training details.
  scripts/generate_model_cards.py: fetches metrics.jsonl and eval_results.json
  from each HF model repo, renders the template, and optionally uploads. No
  hardcoded metrics; everything is pulled from the source of truth.
  Usage: python scripts/generate_model_cards.py --push
* Fetch model specs from HF config.json instead of hardcoding.
* Count params from safetensors weights instead of estimating.
* Load accuracy ceilings from data/theoretical_ceiling.json.
* Fail loudly if theoretical_ceiling.json is missing.
* Remove all hardcoded fallbacks; fail loudly on missing data.
* Replace all .get() fallbacks with strict key access.
* Remove the last silent fallback (head_dim division guard).
* Check in model cards, add GH Action to sync to HuggingFace.
  cards/model/pawn-{small,base,large}.md: generated model cards checked into the
  repo as the source of truth; changes go through PRs.
  cards/hf_model_card.md.j2: Jinja2 template (moved from templates/).
  .github/workflows/sync-model-cards.yml: on push to main, uploads
  cards/model/*.md to the corresponding HF model repos as README.md.
  scripts/generate_model_cards.py: output dir updated to cards/model/; warns
  loudly when optional fields (top5, legal_rate) are missing from older training
  runs but does not silently default them to zero.
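One bullet above replaces parameter estimation with an exact count from the safetensors weights. A minimal sketch of one way to do this, relying only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header of tensor shapes); the function name is hypothetical, and the actual scripts/generate_model_cards.py may instead load the tensors and sum their element counts:

```python
import json
import struct

def count_safetensors_params(path: str) -> int:
    """Count parameters by reading only the safetensors JSON header
    (an 8-byte little-endian length prefix followed by JSON metadata
    mapping tensor names to dtype/shape/offsets), so no weights are
    ever loaded into memory."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional metadata entry, not a tensor
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

Counting from the header rather than from instantiated tensors keeps the generator fast even for the ~68M-parameter large variant.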
* Regenerate small model card after fixing metrics.jsonl on HF.
* Handle val records missing extended fields in the model card generator.
  fetch_best_metrics() now merges top5_accuracy, legal_move_rate, and perplexity
  from the best val record that has them when the overall best-loss record
  doesn't. This handles the case where val was logged every 500 steps but
  backfilled extended metrics only exist at 5K-step checkpoint boundaries.
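The fetch_best_metrics() backfill described above can be sketched roughly as follows; the record schema (dicts from metrics.jsonl val lines) and the helper name merge_best_metrics are assumptions, with only the field names taken from the commit message:

```python
# Hedged sketch of the backfill merge: start from the overall best-loss
# val record, then fill each missing extended field from the lowest-loss
# record that actually carries it (extended metrics were only backfilled
# at 5K-step checkpoint boundaries, while val loss was logged every 500).
EXTENDED_FIELDS = ("top5_accuracy", "legal_move_rate", "perplexity")

def merge_best_metrics(val_records):
    best = min(val_records, key=lambda r: r["val_loss"])
    merged = dict(best)
    for field in EXTENDED_FIELDS:
        if merged.get(field) is None:
            have = [r for r in val_records if r.get(field) is not None]
            if have:
                merged[field] = min(have, key=lambda r: r["val_loss"])[field]
    return merged
```

This keeps the card's headline loss tied to the true best checkpoint while still reporting extended metrics, without silently defaulting anything to zero.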
1 parent 410d056 commit af2531e

File tree

8 files changed: +1369, -44 lines
.github/workflows/sync-model-cards.yml

Lines changed: 45 additions & 0 deletions

```yaml
name: Sync Model Cards to HuggingFace

on:
  push:
    branches: [main]
    paths:
      - 'cards/model/*.md'

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Upload model cards to HuggingFace
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pip install --quiet huggingface-hub
          python3 -c "
          from huggingface_hub import HfApi
          from pathlib import Path

          api = HfApi()
          cards_dir = Path('cards/model')

          variant_repos = {
              'pawn-small.md': 'thomas-schweich/pawn-small',
              'pawn-base.md': 'thomas-schweich/pawn-base',
              'pawn-large.md': 'thomas-schweich/pawn-large',
          }

          for filename, repo in variant_repos.items():
              card_path = cards_dir / filename
              if card_path.exists():
                  api.upload_file(
                      path_or_fileobj=str(card_path),
                      path_in_repo='README.md',
                      repo_id=repo,
                      repo_type='model',
                  )
                  print(f'Uploaded {filename} -> {repo}')
              else:
                  print(f'Skipped {filename} (not found)')
          "
```

.github/workflows/sync-to-hf.yml

Lines changed: 0 additions & 18 deletions
This file was deleted.

README.md

Lines changed: 0 additions & 26 deletions

Removed the HF YAML frontmatter (former lines 1-26):

```yaml
---
library_name: pawn
license: apache-2.0
tags:
- chess
- transformer
- world-model
- causal-lm
- next-token-prediction
- representation-learning
- parameter-efficient-finetuning
- pytorch
- rust
language:
- en
pipeline_tag: other
citation: |
  @software{schweich2026pawn,
    author = {Schweich, Thomas},
    title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
    year = 2026,
    url = {https://github.com/thomas-schweich/PAWN},
    license = {Apache-2.0}
  }
---
```

The README now begins directly with:

```markdown
# PAWN: Playstyle-Agnostic World-model Network for Chess

A small causal transformer trained on random chess games that learns legal moves, board state representations, and game dynamics purely from random legal move sequences absent any form of strategic play.
```

cards/hf_model_card.md.j2

Lines changed: 240 additions & 0 deletions

````jinja
---
library_name: pawn
license: apache-2.0
base_model:
- thomas-schweich/pawn-small
- thomas-schweich/pawn-base
- thomas-schweich/pawn-large
tags:
- chess
- transformer
- world-model
- causal-lm
- next-token-prediction
- representation-learning
- pytorch
- rust
model_name: PAWN-{{ variant_name }}
pipeline_tag: other
citation: |
  {% raw %}@software{schweich2026pawn,
    author = {Schweich, Thomas},
    title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
    year = {2026},
    url = {https://github.com/thomas-schweich/PAWN},
    license = {Apache-2.0}
  }{% endraw %}
model_params: {{ params_num }}
d_model: {{ d_model }}
n_layers: {{ n_layers }}
n_heads: {{ n_heads }}
d_ff: {{ d_ff }}
context_length: 256
vocab_size: 4284
datasets:
- random-chess-games
language:
- en
metrics:
- accuracy
model-index:
- name: PAWN-{{ variant_name }}
  results:
  - task:
      type: next-token-prediction
      name: Chess Move Prediction (Random Games)
    metrics:
    {% if legal_rate is not none %}
    - name: Legal Move Rate
      type: accuracy
      value: {{ "%.4f"|format(legal_rate / 100) }}
    {% endif %}
    - name: Top-1 Accuracy
      type: accuracy
      value: {{ "%.4f"|format(top1 / 100) }}
    {% if top5 is not none %}
    - name: Top-5 Accuracy
      type: accuracy
      value: {{ "%.4f"|format(top5 / 100) }}
    {% endif %}
    - name: Val Loss
      type: loss
      value: {{ "%.4f"|format(val_loss) }}
    - name: Games Seen
      type: other
      value: 25600000
---

# PAWN-{{ variant_name }}

**PAWN** (Playstyle-Agnostic World-model Network for Chess) is a causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from uniformly random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.

This is the **{{ variant_label }}** variant ({{ params }} parameters). PAWN is designed as a frozen backbone for parameter-efficient finetuning into player models with arbitrary playstyles.

**[GitHub Repository](https://github.com/thomas-schweich/PAWN)** -- full source code, training scripts, adapter implementations, and documentation.

## All Variants

| Variant | Parameters | Link |
|---------|------------|------|
| PAWN-Small | ~9.5M | [thomas-schweich/pawn-small](https://huggingface.co/thomas-schweich/pawn-small) |
| PAWN (Base) | ~35.8M | [thomas-schweich/pawn-base](https://huggingface.co/thomas-schweich/pawn-base) |
| PAWN-Large | ~68.4M | [thomas-schweich/pawn-large](https://huggingface.co/thomas-schweich/pawn-large) |

## Headline Metrics

| Metric | Value |
|--------|-------|
{% if legal_rate is not none %}| Legal move rate | {{ "%.2f"|format(legal_rate) }}% |
{% endif %}| Top-1 accuracy | {{ "%.2f"|format(top1) }}% |
{% if top5 is not none %}| Top-5 accuracy | {{ "%.2f"|format(top5) }}% |
{% endif %}| Val loss | {{ "%.3f"|format(val_loss) }} |

### Accuracy Ratios

PAWN is trained on uniformly random chess games, so top-1 accuracy has a hard theoretical ceiling. Ratios above 100% on the unconditioned ceiling indicate the model has learned structure beyond simply identifying legal moves. See [Accuracy Ceiling Analysis](https://github.com/thomas-schweich/PAWN/blob/main/docs/ACCURACY_CEILING.md).

| Ceiling | Ratio |
|---------|-------|
| Unconditioned (E\[1/N_legal\] = {{ "%.2f"|format(uncond_ceiling) }}%) | {{ uncond_ratio }}% |
| Naive-conditioned (1-ply filter = {{ "%.2f"|format(naive_ceiling) }}%) | {{ naive_ratio }}% |
| Bayes-optimal conditioned (MCTS, 32 rollouts = {{ "%.2f"|format(mcts_ceiling) }}%) | {{ mcts_ratio }}% |
{% if probes %}

## Probe Results

Linear probes trained on frozen hidden states measure how well the model's internal representations encode board-level features.

| Probe | Accuracy | Description |
|-------|----------|-------------|
{% for probe in probes -%}
| {{ probe.name }} | {{ probe.result }} | {{ probe.description }} |
{% endfor %}
{% endif %}
{% if diagnostics %}

## Diagnostic Results

Edge-case diagnostics measure the model's legal move rate in specific tactical situations.

| Category | Positions | Legal Rate |
|----------|-----------|------------|
{% for diag in diagnostics -%}
| {{ diag.name }} | {{ diag.n }} | {{ diag.value }} |
{% endfor %}
{% endif %}

## Architecture

| Parameter | Value |
|-----------|-------|
| Architecture | Decoder-only transformer |
| d_model | {{ d_model }} |
| Layers | {{ n_layers }} |
| Attention heads | {{ n_heads }} |
| Head dimension | {{ head_dim }} |
| d_ff | {{ d_ff }} |
| Parameters | {{ params }} |
| Vocabulary | 4,284 tokens |
| Context length | 256 tokens |
| Normalization | Pre-norm RMSNorm |
| FFN | SwiGLU (4x expansion) |
| Positional encoding | Rotary (RoPE, base 10000) |
| Embeddings | Factored (src + dst + promo) |
| Dropout | 0.0 |

## Training Details

| Parameter | Value |
|-----------|-------|
| Training data | On-the-fly uniformly random legal games (no external dataset) |
| Objective | Next-token cross-entropy (non-padding positions only) |
| Total steps | 100,000 |
| Batch size | 256 |
| Games seen | 25,600,000 |
| Learning rate | 3e-4 (cosine decay with 1,000-step warmup) |
| Optimizer | AdamW (weight decay 0.01) |
| Precision | Mixed (AMP) |
| Hardware | NVIDIA H200 |

## Usage

### Loading the model

```python
import torch
from safetensors.torch import load_file
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

cfg = CLMConfig.{{ variant_factory }}()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("model.safetensors", device="cuda")
model.load_state_dict(weights)
```

Or load directly from HuggingFace:

```python
from pawn.checkpoint import load_backbone_weights
from pawn.config import CLMConfig
from pawn.model import PAWNCLM

weights, config = load_backbone_weights("thomas-schweich/pawn-{{ variant_key }}")
cfg = CLMConfig.{{ variant_factory }}()
model = PAWNCLM(cfg).eval()
model.load_state_dict(weights)
```

### Finetuning with an adapter

```bash
uv run python scripts/train_bottleneck.py \
    --checkpoint thomas-schweich/pawn-{{ variant_key }} \
    --pgn thomas-schweich/pawn-lichess-full \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```

## Acknowledgments

PAWN builds on ideas and tools from the following projects and publications:

| Component | Reference |
|-----------|-----------|
| Transformer | [Vaswani et al., "Attention Is All You Need", NeurIPS 2017](https://arxiv.org/abs/1706.03762) |
| RMSNorm | [Zhang & Sennrich, "Root Mean Square Layer Normalization", NeurIPS 2019](https://arxiv.org/abs/1910.07467) |
| RoPE | [Su et al., "RoFormer: Enhanced Transformer with Rotary Position Embedding", 2021](https://arxiv.org/abs/2104.09864) |
| SwiGLU | [Shazeer, "GLU Variants Improve Transformer", 2020](https://arxiv.org/abs/2002.05202) |
| AdamW | [Loshchilov & Hutter, "Decoupled Weight Decay Regularization", ICLR 2019](https://arxiv.org/abs/1711.05101) |
| Cosine schedule | [Loshchilov & Hutter, "SGDR: Stochastic Gradient Descent with Warm Restarts", ICLR 2017](https://arxiv.org/abs/1608.03983) |
| Mixed precision | [Micikevicius et al., "Mixed Precision Training", ICLR 2018](https://arxiv.org/abs/1710.03740) |
| Bottleneck adapters | [Houlsby et al., "Parameter-Efficient Transfer Learning for NLP", ICML 2019](https://arxiv.org/abs/1902.00751) |
| LoRA | [Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models", ICLR 2022](https://arxiv.org/abs/2106.09685) |
| FiLM | [Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer", AAAI 2018](https://arxiv.org/abs/1709.07871) |
| RoSA | [Nikdan et al., "RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation", 2024](https://arxiv.org/abs/2401.04679) |
| Linear probes | [Alain & Bengio, "Understanding Intermediate Layers Using Linear Classifier Probes", ICLR Workshop 2017](https://arxiv.org/abs/1610.01644) |
| Intrinsic dimensionality | [Aghajanyan et al., "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning", ACL 2021](https://arxiv.org/abs/2012.13255) |
| MAIA | [McIlroy-Young et al., "Aligning Superhuman AI with Human Behavior: Chess as a Model System", KDD 2020](https://arxiv.org/abs/2006.01855) |
| AlphaZero | [Silver et al., "A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play", Science 2018](https://arxiv.org/abs/1712.01815) |
| Leela Chess Zero | [github.com/LeelaChessZero/lc0](https://github.com/LeelaChessZero/lc0) |
| shakmaty | [github.com/niklasf/shakmaty](https://github.com/niklasf/shakmaty) |
| PyO3 | [github.com/PyO3/pyo3](https://github.com/PyO3/pyo3) |
| Lichess | [lichess.org](https://lichess.org/) / [database.lichess.org](https://database.lichess.org/) |

## Citation

{% raw %}
```bibtex
@software{schweich2026pawn,
  author = {Schweich, Thomas},
  title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year = {2026},
  url = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}
```
{% endraw %}

## License

Apache 2.0. See [LICENSE](https://github.com/thomas-schweich/PAWN/blob/main/LICENSE).
````
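Rendering a template like the one above is a few lines with Jinja2. A hedged sketch using a tiny inline stand-in for cards/hf_model_card.md.j2; whether the real scripts/generate_model_cards.py uses StrictUndefined is an assumption, though it matches the commit's fail-loudly policy:

```python
from jinja2 import Environment, StrictUndefined

# StrictUndefined makes rendering raise on any missing variable, mirroring
# the "no silent fallbacks" policy described in the commit message.
env = Environment(undefined=StrictUndefined)

# Tiny inline stand-in for cards/hf_model_card.md.j2 (illustrative only).
template = env.from_string(
    "# PAWN-{{ variant_name }}\n"
    "| Top-1 accuracy | {{ '%.2f'|format(top1) }}% |\n"
    "{% if top5 is not none %}| Top-5 accuracy | {{ '%.2f'|format(top5) }}% |\n{% endif %}"
)

# top5=None exercises the conditional row for older runs lacking the field.
card = template.render(variant_name="Small", top1=23.41, top5=None)
print(card)
```

With the real template, the same pattern would load it via a FileSystemLoader pointed at cards/ and write the result to cards/model/.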
