AlphaMon
Mastering the game of Pokémon Showdown with a bit of human knowledge

AlphaMon is a transformer-based battle policy model for Pokémon Showdown doubles formats, with:

  • A large-scale replay ingestion + cleaning pipeline.
  • A GPT-style decoder that predicts both next token (policy) and win probability (value).
  • A Monte Carlo search layer on top of model rollouts for actionable decisions in battle.
  • A Flask interface and an automated Showdown bot client.

The main production focus is the autoregressive AlphaMon model in src/model/transformer.py and run_pipeline.py. The diffusion branch is currently experimental and not the active development track.

Project Status

  • Main model: active and used for training/inference.
  • Data pipeline: active (harvesting, vocab build, tokenization, mmap cache).
  • Inference stack: active (search + web UI + auto bot).
  • Diffusion model: experimental baseline, maintained for reference only.

Main Model Architecture

1. Token space and domain encoding

The model does not operate on raw text. It learns from structured battle tokens generated by src/data/parser.py, backed by vocabulary definitions in src/data/vocab_manager.py.

Token categories include:

  • Entities: MON:*, MOVE:*, ITEM:*, ABIL:*, TYPE:*.
  • Battle control tokens: [TURN], [SWITCH], [CMD_MOVE], [WIN], etc.
  • Numeric buckets: HP percentages (HP:0 ... HP:100) and swap indices (SWAP:1 ... SWAP:6).

This gives the model explicit battle semantics instead of requiring it to infer structure from free-form logs.
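The exact vocabulary strings live in src/data/vocab_manager.py; the short stream below is a hypothetical illustration of how the categories combine into one turn:

```python
# Illustrative (hypothetical) token stream for part of a doubles turn; the
# real vocabulary strings are defined in src/data/vocab_manager.py.
turn = [
    "[TURN]",
    "[CMD_MOVE]", "MON:Incineroar", "MOVE:FakeOut",
    "[SWITCH]", "MON:Rillaboom", "HP:100",
]

def category(token: str) -> str:
    """Map a token to its coarse category from its prefix."""
    for prefix in ("MON:", "MOVE:", "ITEM:", "ABIL:", "TYPE:", "HP:", "SWAP:"):
        if token.startswith(prefix):
            return prefix.rstrip(":")
    return "CONTROL"  # bracketed battle-control tokens like [TURN]

print([category(t) for t in turn])
```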

2. Decoder backbone

Implemented in src/model/transformer.py:

  • Causal decoder-only transformer.
  • Learnable token and positional embeddings.
  • Multi-head self-attention with causal masking.
  • KV-cache-aware forward pass for fast generation.
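The causal-masking idea can be sketched in a few lines (a NumPy sketch of the concept, not the repository's PyTorch implementation):

```python
import numpy as np

def causal_mask(T: int) -> np.ndarray:
    """True where attention must be blocked: position i may not see j > i."""
    return np.triu(np.ones((T, T), dtype=bool), k=1)

# Applied before softmax: masked scores become -inf, so future positions
# receive zero attention weight
scores = np.zeros((4, 4))            # dummy attention scores
scores[causal_mask(4)] = -np.inf
```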

3. Dual-head training objective

The model produces:

  • Policy logits over vocabulary for next-token prediction.
  • A scalar value head for win probability.

Training loss combines:

  • Cross-entropy for policy.
  • MSE for value prediction (weighted at 0.5 in the combined objective).

Sample-level weights are applied, so higher-ELO games receive more influence (max(0.2, avg_elo / 1500) in src/data/dataset.py).
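A minimal sketch of the combined objective, assuming per-sample losses are already computed (the function names here are illustrative, not the pipeline's API):

```python
import numpy as np

def sample_weight(avg_elo: float) -> float:
    # ELO-based weighting described above (mirrors src/data/dataset.py)
    return max(0.2, avg_elo / 1500)

def combined_loss(ce_per_sample, mse_per_sample, avg_elos, value_weight=0.5):
    """Policy cross-entropy plus 0.5-weighted value MSE, ELO-weighted per sample."""
    w = np.array([sample_weight(e) for e in avg_elos])
    per_sample = np.asarray(ce_per_sample) + value_weight * np.asarray(mse_per_sample)
    return float((w * per_sample).sum() / w.sum())
```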

4. Training data

Dataset 1 for pretraining contains 2,478,339 samples.

Dataset 2 for finetuning on gen9vgc2026regf contains 86,490 samples.

The complete list of replays used can be found on Hugging Face.

Data Collection and Cleaning Pipeline

1. Replay harvesting

src/data/harvest.py uses Showdown search API pagination (search.json) to download replay logs by format.

Key behavior:

  • Multi-format harvesting (threaded).
  • Dedup via replay ID tracking in replays.txt.
  • Resume support via harvest_state.json checkpoints.
  • Cutoff timestamp to avoid crawling very old formats.
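The dedup bookkeeping can be sketched like this (illustrative helpers, not the actual functions in src/data/harvest.py):

```python
from pathlib import Path

def load_seen(path: Path) -> set[str]:
    """Replay IDs already harvested, one per line (a la replays.txt)."""
    return set(path.read_text().split()) if path.exists() else set()

def record_new(path: Path, replay_ids: list[str]) -> list[str]:
    """Append only unseen IDs to the tracking file and return them in order."""
    seen = load_seen(path)
    fresh = [r for r in replay_ids if r not in seen]
    with path.open("a") as f:
        for r in fresh:
            f.write(r + "\n")
    return fresh
```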

Run directly:

python -m src.data.harvest

2. Metadata filtering and quality checks

data_cleaning.ipynb is used to:

  • Extract format, player ratings, and winner labels.
  • Remove low-support formats.
  • Remove low-quality/abnormal logs.
  • Inspect token length distributions vs context window.
  • Save the exact replay file list used to build datasets.

3. Dataset construction at scale (mmap)

src/data/dataset.py builds a memory-mapped cache instead of keeping tokenized tensors in RAM.

Cache includes:

  • chunks.bin (token chunks)
  • wins.bin (value targets)
  • weights.bin (sample weights)
  • metadata.json

Cache key dimensions:

  • Context length
  • ELO filters
  • Format filter
  • max_files filter

This allows multiple dataset variants to coexist and be reused across runs.
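A minimal sketch of the memmap idea, using illustrative dtypes and shapes rather than the repository's exact on-disk format:

```python
import json
import tempfile
import numpy as np
from pathlib import Path

cache = Path(tempfile.mkdtemp())

# Write token chunks as a flat int32 file plus value targets and a metadata
# sidecar, mirroring the chunks.bin / wins.bin / metadata.json layout above.
chunks = np.arange(2 * 8, dtype=np.int32).reshape(2, 8)   # 2 samples, context 8
chunks.tofile(cache / "chunks.bin")
np.array([1.0, 0.0], dtype=np.float32).tofile(cache / "wins.bin")
(cache / "metadata.json").write_text(json.dumps({"n": 2, "context_len": 8}))

# Reading side: memory-map the file instead of loading every sample into RAM
meta = json.loads((cache / "metadata.json").read_text())
mm = np.memmap(cache / "chunks.bin", dtype=np.int32, mode="r",
               shape=(meta["n"], meta["context_len"]))
```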

Training Pipeline (Main Model)

Script

Use run_pipeline.py.

Default strategy:

  1. Pre-train on broad replay pool.
  2. Fine-tune on stronger/more targeted subset (e.g., VGC format and min ELO).
  3. Evaluate token accuracy + value error.

Phases:

  • --phase 1: pretrain + finetune + eval
  • --phase 2: finetune + eval
  • --phase 3: eval only

Run:

python run_pipeline.py --phase 1

Telegram real-time training updates (optional)

The training pipeline supports Telegram progress messages during pretraining and finetuning.

Set these variables in your local .env:

  • TELEGRAM_BOT_TOKEN
  • TELEGRAM_CHAT_ID

Behavior in run_pipeline.py:

  • Sends an epoch-start message.
  • Edits progress roughly every 1% with ETA and current loss.
  • Sends an end-of-epoch summary (train/val losses).

If these env variables are not set, training runs normally without Telegram notifications.
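The underlying calls can be sketched as below; sendMessage and editMessageText are real Telegram Bot API methods, but this wrapper is illustrative and run_pipeline.py's actual code may differ:

```python
import os

# Telegram Bot API URL template (the method names below are real endpoints)
API = "https://api.telegram.org/bot{token}/{method}"

def start_message(epoch: int) -> tuple[str, dict]:
    """URL and payload for the epoch-start message."""
    url = API.format(token=os.environ["TELEGRAM_BOT_TOKEN"], method="sendMessage")
    return url, {"chat_id": os.environ["TELEGRAM_CHAT_ID"],
                 "text": f"Epoch {epoch}: starting"}

def progress_edit(message_id: int, pct: float, loss: float) -> tuple[str, dict]:
    """Edit the same message in place (~every 1%) instead of spamming the chat."""
    url = API.format(token=os.environ["TELEGRAM_BOT_TOKEN"], method="editMessageText")
    return url, {"chat_id": os.environ["TELEGRAM_CHAT_ID"], "message_id": message_id,
                 "text": f"{pct:.0f}% done, loss {loss:.3f}"}
```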

Current default configuration (from code)

Current values in run_pipeline.py:

  • Model:
    • CONTEXT_LEN=2048
    • D_MODEL=512
    • N_HEAD=8
    • N_LAYERS=6
    • DROPOUT=0.1
  • Training:
    • BATCH_SIZE=64
    • GRAD_ACCUM_STEPS=4 (effective batch size 256)
    • NUM_WORKERS=6
  • Pretrain:
    • PRETRAIN_EPOCHS=3
    • PRETRAIN_LR=3e-4
    • PRETRAIN_MIN_ELO=None
  • Finetune:
    • FINETUNE_EPOCHS=10
    • FINETUNE_LR=3e-5
    • FINETUNE_MIN_ELO=1300
    • FINETUNE_MAX_ELO=None
    • FINETUNE_FORMAT=gen9vgc2026regf

Notes:

  • These are the active defaults, not historical paper values.
  • If you change model width/depth/context, old checkpoints may no longer load.
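A quick sanity check of how the effective batch size above comes about (illustrative arithmetic, not pipeline code):

```python
# With the defaults above, the optimizer steps once per GRAD_ACCUM_STEPS
# micro-batches, so each update averages gradients over 64 * 4 = 256 samples.
BATCH_SIZE, GRAD_ACCUM_STEPS = 64, 4

def updates_per_epoch(num_samples: int) -> int:
    """Optimizer updates per epoch under gradient accumulation."""
    micro_batches = num_samples // BATCH_SIZE
    return micro_batches // GRAD_ACCUM_STEPS

effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS
```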

Checkpoints

Saved in the checkpoints/ directory:

  • alphamon_pretrained_latest.pt
  • alphamon_pretrained_best.pt
  • alphamon_finetuned_latest.pt
  • alphamon_finetuned_best.pt

The pipeline supports resume and safe loading of compiled/non-compiled state dict keys.
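One common way to reconcile compiled and non-compiled checkpoints is to strip the "_orig_mod." prefix that torch.compile adds to state-dict keys; the pipeline's exact handling may differ, but the idea looks like this:

```python
def strip_compiled_prefix(state_dict: dict) -> dict:
    """torch.compile wraps modules so keys gain an '_orig_mod.' prefix;
    normalizing the keys lets both checkpoint flavors load into one model."""
    prefix = "_orig_mod."
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}
```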

Hardware notes

Current defaults are tuned for my GPU (RTX 3090). If you have less VRAM, reduce:

  • BATCH_SIZE
  • GRAD_ACCUM_STEPS
  • CONTEXT_LEN
  • NUM_WORKERS

Inference Stack

1. Search policy layer

src/monte_carlo_search.py wraps model generation with rollout-based ranking.

Supported objectives:

  • TURN: choose best action pair for active turn.
  • TEAM_PREVIEW: choose the best 4 Pokémon from the roster, including the lead/back split.
  • FAINT: choose best forced switch.

Actions are parsed into structured slot-level decisions, aggregated by signature, and ranked with confidence-adjusted scores.
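The aggregate-and-rank step can be sketched as follows; the confidence adjustment used here (mean win rate minus c/sqrt(n), penalizing thinly sampled actions) is illustrative, not the repository's exact formula:

```python
from collections import defaultdict
from math import sqrt

def rank_actions(rollouts, c=1.0):
    """Group rollout outcomes (signature, win) by action signature and rank
    by a confidence-adjusted score so rarely sampled actions are penalized."""
    by_sig = defaultdict(list)
    for signature, win in rollouts:
        by_sig[signature].append(win)
    scored = {s: sum(w) / len(w) - c / sqrt(len(w)) for s, w in by_sig.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```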

2. Flask app

Run:

python server/server.py

Then open http://localhost:5000.

The app:

  • Loads best finetuned checkpoint by default.
  • Tokenizes pasted battle logs.
  • Applies optional team-aware logit bias.
  • Runs Monte Carlo search.
  • Returns ranked candidate actions.

Current inference config and behavior

Current values in server/server.py:

  • Checkpoint path: checkpoints/alphamon_finetuned_best.pt
  • Model config:
    • CONTEXT_LEN=2048
    • D_MODEL=512
    • N_HEAD=8
    • N_LAYERS=6
    • DROPOUT=0.1
  • Sampling/search:
    • temperature=0.7
    • base_rollouts=512
    • rollout_len=128
    • objective-dependent rollout caps are applied dynamically as context grows

Logit bias: what is biased and why

When optional team_data is provided to the server, AlphaMon builds a vocab-sized bias tensor and adds it to logits before sampling.

In current code, bias is applied with a fixed additive weight (bias_value=2.0) to tokens associated with known team information:

  • Pokemon species token (MON:*)
  • Known moves (MOVE:*)
  • Held item (ITEM:*)
  • Tera type (TYPE:*)

This does not force outputs; it nudges generation toward legal/expected team-consistent actions while still allowing search and model probabilities to decide final candidates.
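A sketch of the bias mechanics with toy numbers (the vocab size and token ids below are made up for illustration):

```python
import numpy as np

def team_bias(vocab_size: int, team_token_ids, bias_value: float = 2.0):
    """Vocab-sized additive bias: +bias_value on tokens tied to known team info."""
    bias = np.zeros(vocab_size, dtype=np.float32)
    bias[team_token_ids] = bias_value
    return bias

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

logits = np.zeros(6)                          # toy uniform logits
probs = softmax(logits + team_bias(6, [2, 4]))
# biased tokens gain probability mass, but every token stays sampleable
```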

3. Automated bot mode

server/auto_server.py connects to the Showdown websocket, listens for battle requests, queries the local Flask server, and sends /choose decisions.

Environment variables are defined in .env.example:

  • SHOWDOWN_BOT_NAME
  • SHOWDOWN_PASSWORD
  • SHOWDOWN_TEAM_URL

Basic flow:

  1. Start Flask server.
  2. Configure .env.
  3. Run:
python server/auto_server.py

Installation and Setup

Recommended (manual)

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Hugging Face Checkpoint Distribution

Model files are distributed via Hugging Face Hub instead of GitHub release assets.

Current model repo: davidemodolo/alphamon on the Hugging Face Hub.

Download assets (user workflow)

From project root:

source venv/bin/activate
pip install -U huggingface_hub

hf download davidemodolo/alphamon checkpoints/alphamon_pretrained_best.pt --local-dir .
hf download davidemodolo/alphamon checkpoints/alphamon_finetuned_best.pt --local-dir .
hf download davidemodolo/alphamon release/training_games_list.txt --local-dir .

Downloaded paths should end up as:

  • checkpoints/alphamon_pretrained_best.pt
  • checkpoints/alphamon_finetuned_best.pt
  • release/training_games_list.txt

Then run inference:

python server/server.py

Open http://localhost:5000.

Diffusion Branch

The diffusion variant lives in src/diffusion_model and is documented in src/diffusion_model/README.md.

It is currently experimental and not the primary direction for AlphaMon.

Citation and Acknowledgments

Pokemon data and battle logs are sourced from Pokemon Showdown public endpoints.

If you publish work based on this repository, include:

  • The exact replay selection criteria.
  • The checkpoint identifiers used.
  • Any post-processing or search-time constraints/logit-bias settings.
