AlphaMon
Mastering the game of Pokémon Showdown with a bit of human knowledge
AlphaMon is a transformer-based battle policy model for Pokemon Showdown doubles formats, with:
- A large-scale replay ingestion + cleaning pipeline.
- A GPT-style decoder that predicts both next token (policy) and win probability (value).
- A Monte Carlo search layer on top of model rollouts for actionable decisions in battle.
- A Flask interface and an automated Showdown bot client.
The main production focus is the autoregressive AlphaMon model in src/model/transformer.py and run_pipeline.py. The diffusion branch is currently experimental and not the active development track.
- Main model: active and used for training/inference.
- Data pipeline: active (harvesting, vocab build, tokenization, mmap cache).
- Inference stack: active (search + web UI + auto bot).
- Diffusion model: experimental baseline, maintained for reference only.
- run_pipeline.py: main training pipeline (pretrain, finetune, eval).
- src/model/transformer.py: AlphaMonGPT architecture.
- src/data/harvest.py: replay harvesting from Showdown API.
- src/data/vocab_manager.py: vocabulary build/cache from official Showdown data.
- src/data/parser.py: log tokenizer.
- src/data/dataset.py: mmap-backed dataset builder/loader.
- src/monte_carlo_search.py: rollout search + action ranking.
- server/server.py: Flask inference server/UI.
- server/auto_server.py: automated Showdown bot client.
- data_cleaning.ipynb: replay QC and subset construction notebook.
- src/diffusion_model: experimental diffusion branch.
The model does not operate on raw text. It learns from structured battle tokens generated by src/data/parser.py, backed by vocabulary definitions in src/data/vocab_manager.py.
Token categories include:
- Entities: `MON:*`, `MOVE:*`, `ITEM:*`, `ABIL:*`, `TYPE:*`.
- Battle control tokens: `[TURN]`, `[SWITCH]`, `[CMD_MOVE]`, `[WIN]`, etc.
- Numeric buckets: HP percentages (`HP:0`...`HP:100`) and swap indices (`SWAP:1`...`SWAP:6`).
This gives the model explicit battle semantics instead of requiring it to infer structure from free-form logs.
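To make the token scheme concrete, here is a minimal sketch of mapping a structured battle event to category tokens. This is illustrative only and not the actual parser in src/data/parser.py; the event dict shape and helper names are assumptions.

```python
# Illustrative sketch of the token categories above; NOT the real
# src/data/parser.py logic. Event structure is hypothetical.

def hp_bucket(pct: float) -> str:
    """Bucket an HP percentage into an HP:0..HP:100 token."""
    return f"HP:{max(0, min(100, round(pct)))}"

def tokenize_event(event: dict) -> list:
    """Convert one hypothetical parsed event into battle tokens."""
    kind = event["kind"]
    if kind == "turn":
        return ["[TURN]"]
    if kind == "move":
        return ["[CMD_MOVE]", f"MON:{event['mon']}", f"MOVE:{event['move']}"]
    if kind == "switch":
        return ["[SWITCH]", f"MON:{event['mon']}", hp_bucket(event["hp_pct"])]
    raise ValueError(f"unknown event kind: {kind}")

tokens = tokenize_event({"kind": "move", "mon": "Incineroar", "move": "Fake Out"})
```

The point is that every entity and command arrives as a discrete, typed symbol, so the model never has to learn the log grammar itself.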
Implemented in src/model/transformer.py:
- Causal decoder-only transformer.
- Learnable token and positional embeddings.
- Multi-head self-attention with causal masking.
- KV-cache-aware forward pass for fast generation.
The model produces:
- Policy logits over vocabulary for next-token prediction.
- A scalar value head for win probability.
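The KV-cache-aware forward pass can be sketched with a toy single-head attention in NumPy. This is illustrative only: the real module in src/model/transformer.py is multi-head and batched, and the weights here are random.

```python
import numpy as np

# Toy single-head attention with a KV cache (illustrative; NOT the real
# multi-head implementation). Each step appends one key/value pair and
# attends over all cached positions, so generation is O(t) per token
# instead of recomputing the full prefix.

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCacheAttention:
    def __init__(self, d):
        rng = np.random.default_rng(0)
        self.Wq, self.Wk, self.Wv = (
            rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)
        )
        self.k_cache, self.v_cache = [], []

    def step(self, x):
        """Process one new token vector; attend over all cached positions."""
        q = x @ self.Wq
        self.k_cache.append(x @ self.Wk)
        self.v_cache.append(x @ self.Wv)
        K = np.stack(self.k_cache)  # (t, d): keys for every token so far
        V = np.stack(self.v_cache)  # (t, d): values for every token so far
        att = softmax(q @ K.T / np.sqrt(len(q)))  # causal by construction
        return att @ V
```

Because the cache only ever contains past positions, causal masking is implicit during generation.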
Training loss combines:
- Cross-entropy for policy.
- MSE for value prediction (weighted at 0.5 in the combined objective).
Per-sample weights give higher-ELO games more influence (`max(0.2, avg_elo / 1500)` in src/data/dataset.py).
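The objective above can be sketched as follows. Shapes and function names are assumptions; only the 0.5 value weight and the `max(0.2, avg_elo / 1500)` sample weight come from the source.

```python
import numpy as np

# Sketch of the combined objective (assumed shapes; not the exact training
# code): weighted cross-entropy for the policy head plus 0.5 * weighted MSE
# for the value head, with per-sample weights derived from average game ELO.

def sample_weight(avg_elo: float) -> float:
    # Higher-ELO games get more influence, floored at 0.2 (src/data/dataset.py).
    return max(0.2, avg_elo / 1500)

def combined_loss(policy_logits, targets, value_pred, win_target, weights):
    """policy_logits: (B, V); targets: (B,); value_pred/win_target/weights: (B,)."""
    z = policy_logits - policy_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    ce = -log_probs[np.arange(len(targets)), targets]  # per-sample CE
    mse = (value_pred - win_target) ** 2               # per-sample MSE
    w = weights / weights.sum()                        # normalized sample weights
    return float((w * (ce + 0.5 * mse)).sum())
```

A confident, correct prediction with a correct value estimate drives the loss toward zero.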
Dataset 1 (pretraining) contains 2,478,339 samples.
Dataset 2 (finetuning on gen9vgc2026regf) contains 86,490 samples.
The complete list of replays used can be found on Hugging Face.
src/data/harvest.py uses Showdown search API pagination (search.json) to download replay logs by format.
Key behavior:
- Multi-format harvesting (threaded).
- Dedup via replay ID tracking in `replays.txt`.
- Resume support via `harvest_state.json` checkpoints.
- Cutoff timestamp to avoid crawling very old formats.
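The dedup and pagination pieces can be sketched like this. The endpoint shape follows the public search.json API, but the query parameter and all function names here are assumptions, not the exact logic in src/data/harvest.py.

```python
# Sketch of ID-based dedup and URL building for paginated harvesting
# (hypothetical helpers; NOT the actual src/data/harvest.py code).

def dedup_new_ids(page_ids, seen):
    """Return replay IDs not harvested yet, updating the seen-set in place."""
    fresh = [rid for rid in page_ids if rid not in seen]
    seen.update(fresh)
    return fresh

def search_url(fmt, before=None):
    """Build a search.json URL for one format; `before` (assumed name)
    pages backwards through older replays."""
    url = f"https://replay.pokemonshowdown.com/search.json?format={fmt}"
    return url + (f"&before={before}" if before is not None else "")
```

A harvest loop would repeatedly fetch `search_url(...)`, keep only `dedup_new_ids(...)`, and persist `seen` to disk so interrupted runs can resume.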
Run directly: `python -m src.data.harvest`

data_cleaning.ipynb is used to:
- Extract format, player ratings, and winner labels.
- Remove low-support formats.
- Remove low-quality/abnormal logs.
- Inspect token length distributions vs context window.
- Save the exact replay file list used to build datasets.
src/data/dataset.py builds a memory-mapped cache instead of keeping tokenized tensors in RAM.
Cache includes:
- `chunks.bin` (token chunks)
- `wins.bin` (value targets)
- `weights.bin` (sample weights)
- `metadata.json`
Cache key dimensions:
- Context length
- ELO filters
- Format filter
- `max_files` filter
This allows multiple dataset variants to coexist and be reused across runs.
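One simple way to key such coexisting variants is to hash the filter dimensions. This is a sketch under assumed names, not the actual layout in src/data/dataset.py.

```python
import hashlib
import json

# Sketch: derive a stable cache identifier from the filter dimensions listed
# above, so each (context, ELO, format, max_files) combination gets its own
# on-disk directory. Hypothetical helper; NOT the real src/data/dataset.py.

def cache_key(context_len, min_elo, max_elo, fmt, max_files):
    spec = {
        "context_len": context_len,
        "min_elo": min_elo,
        "max_elo": max_elo,
        "format": fmt,
        "max_files": max_files,
    }
    blob = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()[:12]

key = cache_key(2048, 1300, None, "gen9vgc2026regf", None)
# e.g. np.memmap(f"cache/{key}/chunks.bin", dtype=..., mode="r") would then
# open that variant's token chunks without loading them into RAM.
```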
Use run_pipeline.py.
Default strategy:
- Pre-train on broad replay pool.
- Fine-tune on stronger/more targeted subset (e.g., VGC format and min ELO).
- Evaluate token accuracy + value error.
Phases:
- `--phase 1`: pretrain + finetune + eval
- `--phase 2`: finetune + eval
- `--phase 3`: eval only
Run: `python run_pipeline.py --phase 1`

The training pipeline supports Telegram progress messages during pretraining and finetuning.
Set these variables in your local .env:
- `TELEGRAM_BOT_TOKEN`
- `TELEGRAM_CHAT_ID`
Behavior in run_pipeline.py:
- Sends an epoch-start message.
- Edits progress roughly every 1% with ETA and current loss.
- Sends an end-of-epoch summary (train/val losses).
If these env variables are not set, training runs normally without Telegram notifications.
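The notification calls map onto the public Telegram Bot API (`sendMessage` for the epoch-start message, `editMessageText` for in-place progress updates). Below is a sketch of URL construction only; the actual request logic in run_pipeline.py may differ.

```python
import os
import urllib.parse

# Sketch: build Telegram Bot API request URLs, degrading to None when the
# env vars are unset so training proceeds silently. Hypothetical helper;
# the URL shape follows the public Bot API (api.telegram.org/bot<token>/<method>).

def telegram_request(method, **params):
    """Return the Bot API URL for `method`, or None when notifications are off."""
    token = os.environ.get("TELEGRAM_BOT_TOKEN")
    chat_id = os.environ.get("TELEGRAM_CHAT_ID")
    if not token or not chat_id:
        return None  # no env vars -> no notifications, training unaffected
    params.setdefault("chat_id", chat_id)
    return (f"https://api.telegram.org/bot{token}/{method}?"
            + urllib.parse.urlencode(params))
```

Editing the same message (rather than sending a new one per update) is what keeps the roughly-every-1% progress stream from flooding the chat.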
Current values in run_pipeline.py:
- Model:
  `CONTEXT_LEN=2048`, `D_MODEL=512`, `N_HEAD=8`, `N_LAYERS=6`, `DROPOUT=0.1`
- Training:
  `BATCH_SIZE=64`, `GRAD_ACCUM_STEPS=4` (effective batch size 256), `NUM_WORKERS=6`
- Pretrain:
  `PRETRAIN_EPOCHS=3`, `PRETRAIN_LR=3e-4`, `PRETRAIN_MIN_ELO=None`
- Finetune:
  `FINETUNE_EPOCHS=10`, `FINETUNE_LR=3e-5`, `FINETUNE_MIN_ELO=1300`, `FINETUNE_MAX_ELO=None`, `FINETUNE_FORMAT=gen9vgc2026regf`
Notes:
- These are the active defaults, not historical paper values.
- If you change model width/depth/context, old checkpoints may no longer load.
Saved in checkpoints:
- `alphamon_pretrained_latest.pt`
- `alphamon_pretrained_best.pt`
- `alphamon_finetuned_latest.pt`
- `alphamon_finetuned_best.pt`
The pipeline supports resume and safe loading of compiled/non-compiled state dict keys.
Current defaults are tuned for my GPU (RTX 3090). If you have less VRAM, reduce:
`BATCH_SIZE`, `GRAD_ACCUM_STEPS`, `CONTEXT_LEN`, `NUM_WORKERS`
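When trading VRAM for wall-clock time, note that the effective batch size is `BATCH_SIZE * GRAD_ACCUM_STEPS`, so shrinking `BATCH_SIZE` while raising `GRAD_ACCUM_STEPS` preserves optimization behavior. A tiny sanity-check sketch (the concrete low-VRAM split below is an example, not a tested configuration):

```python
# Effective batch size = microbatch size * accumulation steps. The defaults
# give 64 * 4 = 256; a lower-VRAM card can keep the same effective batch by
# accumulating more, smaller microbatches before each optimizer step.

def effective_batch(batch_size: int, grad_accum_steps: int) -> int:
    return batch_size * grad_accum_steps

default = effective_batch(64, 4)    # current defaults
low_vram = effective_batch(16, 16)  # hypothetical split for a smaller GPU
```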
src/monte_carlo_search.py wraps model generation with rollout-based ranking.
Supported objectives:
- `TURN`: choose the best action pair for the active turn.
- `TEAM_PREVIEW`: choose the best 4 from the roster and the lead/back structure.
- `FAINT`: choose the best forced switch.
Actions are parsed into structured slot-level decisions, aggregated by signature, and ranked with confidence-adjusted scores.
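The aggregate-then-rank step can be sketched as follows. Function names and the exact confidence penalty are illustrative assumptions, not the API of src/monte_carlo_search.py.

```python
import math
from collections import defaultdict

# Schematic rollout aggregation (hypothetical; NOT the real search code):
# rollouts proposing the same action signature are pooled, and each candidate
# is scored by its mean predicted win probability minus an uncertainty
# penalty that shrinks as more rollouts support it.

def rank_actions(rollouts, z=1.0):
    """rollouts: iterable of (action_signature, win_prob) pairs."""
    pools = defaultdict(list)
    for sig, win in rollouts:
        pools[sig].append(win)
    scored = []
    for sig, wins in pools.items():
        mean = sum(wins) / len(wins)
        penalty = z * math.sqrt(1 / len(wins))  # low support -> big penalty
        scored.append((sig, mean - penalty))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```

The confidence adjustment is what stops a single lucky rollout from outranking a consistently good action.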
Run: `python server/server.py`, then open http://localhost:5000.
The app:
- Loads best finetuned checkpoint by default.
- Tokenizes pasted battle logs.
- Applies optional team-aware logit bias.
- Runs Monte Carlo search.
- Returns ranked candidate actions.
Current values in server/server.py:
- Checkpoint path: `checkpoints/alphamon_finetuned_best.pt`
- Model config: `CONTEXT_LEN=2048`, `D_MODEL=512`, `N_HEAD=8`, `N_LAYERS=6`, `DROPOUT=0.1`
- Sampling/search: `temperature=0.7`, `base_rollouts=512`, `rollout_len=128`
- Objective-dependent rollout caps are applied dynamically as context grows.
When optional team_data is provided to the server, AlphaMon builds a vocab-sized bias tensor and adds it to logits before sampling.
In current code, bias is applied with a fixed additive weight (bias_value=2.0) to tokens associated with known team information:
- Pokemon species tokens (`MON:*`)
- Known moves (`MOVE:*`)
- Held item (`ITEM:*`)
- Tera type (`TYPE:*`)
This does not force outputs; it nudges generation toward legal/expected team-consistent actions while still allowing search and model probabilities to decide final candidates.
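The bias construction can be sketched like this; the vocab mapping and function name are illustrative, and only the fixed `bias_value=2.0` comes from the source.

```python
import numpy as np

# Sketch of the team-aware logit bias (hypothetical helper; the real
# construction lives in server/server.py): build a vocab-sized tensor with a
# fixed additive weight on token IDs tied to known team information, then add
# it to the model's logits before sampling.

def build_team_bias(vocab, team_tokens, bias_value=2.0):
    bias = np.zeros(len(vocab), dtype=np.float32)
    for tok in team_tokens:
        if tok in vocab:              # ignore tokens missing from the vocab
            bias[vocab[tok]] += bias_value
    return bias

vocab = {"MON:Incineroar": 0, "MOVE:Fake Out": 1, "MON:Pikachu": 2}
bias = build_team_bias(vocab, ["MON:Incineroar", "MOVE:Fake Out"])
# At sampling time: logits = logits + bias  (a nudge, not a hard constraint)
```

Because the bias is additive rather than a mask, unbiased tokens remain reachable, which is what keeps the search free to override team-consistent defaults.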
Implementation references:
- Bias construction in server/server.py
- Bias use inside generation in src/model/transformer.py
server/auto_server.py connects to the Showdown websocket, listens for battle requests, queries the local Flask server, and sends `/choose` decisions.
Environment variables are defined in .env.example:
- `SHOWDOWN_BOT_NAME`
- `SHOWDOWN_PASSWORD`
- `SHOWDOWN_TEAM_URL`
Basic flow:
- Start Flask server.
- Configure `.env`.
- Run: `python server/auto_server.py`

To set up the environment from a clean checkout:

```shell
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Model files are distributed via Hugging Face Hub instead of GitHub release assets.
Current model repo:
From project root:

```shell
source venv/bin/activate
pip install -U huggingface_hub
hf download davidemodolo/alphamon checkpoints/alphamon_pretrained_best.pt --local-dir .
hf download davidemodolo/alphamon checkpoints/alphamon_finetuned_best.pt --local-dir .
hf download davidemodolo/alphamon release/training_games_list.txt --local-dir .
```

Downloaded paths should end up as:
- `checkpoints/alphamon_pretrained_best.pt`
- `checkpoints/alphamon_finetuned_best.pt`
- `release/training_games_list.txt`
Then run inference with `python server/server.py` and open http://localhost:5000.
The diffusion variant lives in src/diffusion_model and is documented in src/diffusion_model/README.md.
It is currently experimental and not the primary direction for AlphaMon.
Pokemon data and battle logs are sourced from Pokemon Showdown public endpoints.
If you publish work based on this repository, include:
- The exact replay selection criteria.
- The checkpoint identifiers used.
- Any post-processing or search-time constraints/logit-bias settings.