Commit 681ab32
Dashboard style overhaul + expanded game-completion eval and stopping criteria
Dashboard:
- New `pawn/dashboard/theme.py` centralizes color palette, layout, and
styling so charts and the Solara shell share one coherent system.
- charts.py refactored to pull from the theme; titles bolded; log-scale
error_rate_chart added with optional log-linear fit overlays (desaturated
dashed lines, half-life shown in legend).
- sol.py reorganized into sections, with the Game Integrity section
pairing the error rate chart against the patience chart.
- val_accuracy_chart trimmed to Top-1/Top-5 (legal/late-legal moved to
the error rate chart so accuracy scale is readable).
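The log-linear fit overlay with a half-life in the legend can be illustrated as follows. This is a minimal sketch, not the code in charts.py: the function name `fit_half_life` and the synthetic data are hypothetical, and it assumes the fit is an ordinary least-squares line through log(error rate) against training step.

```python
import math

import numpy as np


def fit_half_life(steps, error_rates):
    """Fit log(error) = intercept + slope * step by least squares.

    Returns (slope, intercept, half_life). half_life is the number of steps
    over which the fitted error rate halves, or inf if it is not decaying.
    Hypothetical helper for illustration only.
    """
    x = np.asarray(steps, dtype=float)
    y = np.asarray(error_rates, dtype=float)
    mask = y > 0  # log is undefined for a zero error rate
    slope, intercept = np.polyfit(x[mask], np.log(y[mask]), 1)
    half_life = math.log(2) / -slope if slope < 0 else math.inf
    return float(slope), float(intercept), half_life


# Synthetic curve that halves every 1000 steps, starting from 0.2.
steps = np.arange(0, 5000, 500)
errors = 0.2 * 0.5 ** (steps / 1000.0)
slope, intercept, half_life = fit_half_life(steps, errors)
```

On a log-scale axis the fitted curve `exp(intercept + slope * step)` is a straight line, which is why it reads well as a dashed overlay.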
Game completion eval:
- Fully vectorized via `_game_completion_chunk` + `_aggregate_game_completion`:
there is no Python per-game loop, and the full val set is processed in
batch_size chunks, so peak memory is independent of val_games.
- Adds min/max/median forfeit-ply statistics across games that actually
forfeited (0 if none). Surfaced in val log line as `forfeit [min-max med N]`.
- Runs over the full validation set (was limited to 64 games).
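The chunked, vectorized evaluation described above might look roughly like this. It is a sketch under stated assumptions, not the contents of `_game_completion_chunk` or `_aggregate_game_completion`: the names `completion_chunk` and `evaluate_completion` are hypothetical, the input is assumed to be a per-ply legality mask, and the forfeit ply is assumed to be the index of the first illegal move. The sketch keeps one integer per game for the aggregate, so only the per-ply arrays are bounded by the chunk size.

```python
import numpy as np


def completion_chunk(legal, lengths):
    """Per-game results for one chunk, with no per-game Python loop.

    legal:   (B, T) bool, True where the predicted move at that ply is legal.
    lengths: (B,) int, number of plies in each game.
    Returns (forfeit_ply, plies_completed); forfeit_ply is -1 if completed.
    """
    ply = np.arange(legal.shape[1])[None, :]
    illegal = (ply < lengths[:, None]) & ~legal  # ignore padding plies
    forfeited = illegal.any(axis=1)
    first_illegal = illegal.argmax(axis=1)  # only meaningful where forfeited
    forfeit_ply = np.where(forfeited, first_illegal, -1)
    plies_completed = np.where(forfeited, first_illegal, lengths)
    return forfeit_ply, plies_completed


def evaluate_completion(legal, lengths, batch_size):
    """Run completion_chunk over batch_size slices and aggregate."""
    forfeit_parts, plies_parts = [], []
    for start in range(0, len(lengths), batch_size):
        chunk = slice(start, start + batch_size)
        f, p = completion_chunk(legal[chunk], lengths[chunk])
        forfeit_parts.append(f)
        plies_parts.append(p)
    forfeit = np.concatenate(forfeit_parts)
    plies = np.concatenate(plies_parts)
    hit = forfeit[forfeit >= 0]  # only games that actually forfeited
    return {
        "game_completion_rate": float((forfeit < 0).mean()),
        "avg_plies_completed": float(plies.mean()),
        "forfeit_min": int(hit.min()) if hit.size else 0,
        "forfeit_max": int(hit.max()) if hit.size else 0,
        "forfeit_median": float(np.median(hit)) if hit.size else 0.0,
        "forfeit_count": int(hit.size),
    }


legal = np.array([
    [True, True, True, True],     # completes all 4 plies
    [True, False, True, True],    # forfeits at ply 1
    [True, True, False, False],   # length 2, padding ignored
    [True, True, True, False],    # forfeits at ply 3
])
lengths = np.array([4, 3, 2, 4])
stats = evaluate_completion(legal, lengths, batch_size=2)
```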
Stopping criteria:
- Patience now also resets on improvements to game_completion_rate and
avg_plies_completed, not just val_loss and late_legal_move_rate.
- best_game_completion and best_avg_plies_completed persisted in
checkpoint state so they survive resume.
- Trainer logs `patience` and `legality_late_ply` into the training-config
record so downstream consumers (dashboard) can see them.
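A patience counter that resets on an improvement in any of the four tracked metrics could be sketched as below. The class `PatienceTracker`, its method names, and the strict-improvement comparison are assumptions for illustration; the trainer's actual logic and checkpoint keys are not shown in this commit message.

```python
from dataclasses import dataclass, field

# Metric name -> True if a higher value counts as an improvement.
HIGHER_IS_BETTER = {
    "val_loss": False,
    "late_legal_move_rate": True,
    "game_completion_rate": True,
    "avg_plies_completed": True,
}


@dataclass
class PatienceTracker:
    """Hypothetical early-stopping helper; not the trainer's real class."""

    patience: int
    best: dict = field(default_factory=dict)
    stale_evals: int = 0

    def update(self, metrics):
        """Record one evaluation; return True when training should stop."""
        improved = False
        for name, higher in HIGHER_IS_BETTER.items():
            value = metrics[name]
            prev = self.best.get(name)
            if prev is None or (value > prev if higher else value < prev):
                self.best[name] = value
                improved = True
        self.stale_evals = 0 if improved else self.stale_evals + 1
        return self.stale_evals >= self.patience

    def state_dict(self):
        """Values to persist in a checkpoint so bests survive resume."""
        return {"best": dict(self.best), "stale_evals": self.stale_evals}

    def load_state_dict(self, state):
        self.best = dict(state["best"])
        self.stale_evals = state["stale_evals"]
```

Persisting the best values alongside the counter matters on resume: without them, the first evaluation after a restart would always count as an improvement and reset patience.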
Tests updated for the theme refactor (layer_color moved to theme module,
titles wrapped in <b>...</b> for bold rendering).
1 parent e73f6ff
5 files changed, +1418 -444 lines changed
File tree:
- pawn
- dashboard
- tests/lab