Pokemon Red reinforcement learning tooling built around a RAM-verified Gymnasium environment, configurable progress rewards, PPO training, and a real-time web dashboard.
PokemonRL is an educational and research-oriented framework for training agents to play Pokemon Red with reinforcement learning.
It is meant for:
- students who want to learn RL through a classic game
- beginners who want a concrete project to run and tweak
- developers who care about Game Boy emulation and RAM-driven state
Main ideas:
- PPO training via Stable Baselines3
- observations based on actual RAM (inventory, map, HP)
- reward shaping that favors progress over wandering
- a web dashboard for live runs
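The progress-favoring shaping idea can be sketched as a pure function over before/after state snapshots. The term names and coefficients below (`badge_bonus`, `new_map_bonus`, `step_penalty`) are illustrative assumptions, not the project's actual reward profile:

```python
# Hypothetical sketch of progress-favoring reward shaping.
# Coefficients and state keys are illustrative, not the project's real values.
def shaped_reward(prev, curr,
                  badge_bonus=10.0, new_map_bonus=1.0, step_penalty=0.01):
    """Reward progress (new badges, newly visited maps); charge a small
    per-step penalty so aimless wandering is net negative."""
    reward = -step_penalty
    # Large bonus for each badge earned since the previous step.
    reward += badge_bonus * max(0, curr["badges"] - prev["badges"])
    # Small bonus for entering a map not seen before.
    if curr["map_id"] not in prev["visited_maps"]:
        reward += new_map_bonus
    return reward
```

With this shape, an episode that only wanders accumulates negative reward, while discrete progress events dominate the return.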
The latest public runs are here:
If you are new to RL, this project is a good place to start. You can:
- Observe how an agent goes from random actions to basic navigation.
- Edit `config/reward_presets.json` and watch behavior changes.
- Learn how a Game Boy game stores data like position and inventory.
This section aims for minimal guesswork.
This repository does not include:
- a Pokemon Red ROM (`pokemon_red.gb`)
- a PyBoy save state (`.state`)
You must supply both for local training and playback.
Recommended local layout:
```
PokemonRL/
  roms/
    pokemon_red.gb
  saves/
    after_starter.state
```
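Since both files must be supplied by you, a quick preflight check before training avoids confusing emulator errors. This helper is a sketch, not part of the project's API; the default paths match the layout above:

```python
from pathlib import Path

def missing_assets(rom="roms/pokemon_red.gb",
                   state="saves/after_starter.state"):
    """Return the list of required files (ROM, save state) that are
    absent, so the caller can fail fast with a clear message."""
    return [p for p in (Path(rom), Path(state)) if not p.exists()]
```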
Use Python 3.10 or newer.
```shell
python -m venv venv
venv\Scripts\activate
pip install --upgrade pip
pip install -e .
```

Set your ROM and save state paths:

```shell
set POKEMONRL_ROM_PATH=roms\pokemon_red.gb
set POKEMONRL_STATE_PATH=saves\after_starter.state
```

Run the tests and verify RAM reads:

```shell
python -m unittest discover -s tests -p "test_*.py" -v
python tools\verify_ram.py
```

Start training:

```shell
python tools\train_ppo_multimodal.py --reward-profile config\brock_badge1_profile.json --timesteps 1000000 --num-envs 4
```

Launch the web dashboard:

```shell
python tools\map_server.py
```

Then open:
- `PokemonRL/env/`: emulator environment, RAM readers, and Gym wrappers
- `PokemonRL/rewards/`: logic for computing rewards
- `tools/`: scripts for training, watching, and the web dashboard
- `config/`: JSON profiles for reward tuning and event flags
- `docs/`: detailed guides on reward design and verification
For experienced users and researchers:
| Feature | Description |
|---|---|
| Multimodal observations | Combines screen pixels with structured RAM data. |
| Custom reward profiles | Create objectives via JSON without touching code. |
| Path overrides | Control where models, logs, and runs are saved with env vars. |
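A multimodal observation can be assembled as a dict pairing the raw screen with a normalized vector of RAM features. The field names and normalization constants here are illustrative assumptions, not the project's actual observation schema:

```python
import numpy as np

def build_observation(screen_rgb, ram):
    """Illustrative multimodal observation: raw screen pixels plus a
    small normalized feature vector derived from RAM reads.
    Field names and scale factors are hypothetical."""
    features = np.array([
        ram["map_id"] / 255.0,            # current map, byte-scaled
        ram["x"] / 255.0,                 # player tile coordinates
        ram["y"] / 255.0,
        ram["hp"] / max(ram["max_hp"], 1),  # lead Pokemon HP fraction
        ram["badges"] / 8.0,              # badge count out of 8
    ], dtype=np.float32)
    return {"screen": screen_rgb.astype(np.uint8), "ram": features}
```

Keeping the RAM features normalized to roughly [0, 1] lets the policy network consume them alongside scaled pixels without per-feature tuning.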
All path overrides:

| Variable | Default | Purpose |
|---|---|---|
| `POKEMONRL_PROJECT_ROOT` | repo root | Base directory for all relative project paths |
| `POKEMONRL_CONFIG_DIR` | `config/` | Config JSON directory |
| `POKEMONRL_MODELS_DIR` | `models/` | Saved SB3 checkpoints and policies |
| `POKEMONRL_ROMS_DIR` | `roms/` | Local ROM directory |
| `POKEMONRL_SAVES_DIR` | `saves/` | Local save-state directory |
| `POKEMONRL_LOGS_DIR` | `logs/` | TensorBoard and monitor output |
| `POKEMONRL_RUNS_DIR` | `runs/` | Live map and other run-time output |
Contributions are welcome. If you fix a bug, add documentation, or suggest a new reward term, open a PR or issue.
See CONTRIBUTING.md.
This project is licensed under the MIT License.