PokemonGym is a platform that allows AI agents to play Pokemon Red through a server-client architecture. The system includes:
- Evaluator: Evaluation metrics and scoring system for Pokemon Red gameplay
- Server: FastAPI server that controls Pokemon Red emulation and exposes game state via API
- Agents: Implementation of AI and human agents that interact with the evaluator server
- Results: Evaluation results comparing different AI models playing Pokemon Red
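The client-server flow can be sketched as a tiny HTTP helper. Note that the endpoint path (`/action`) and payload fields below are illustrative assumptions for this sketch, not the server's documented API; see the server README for the real endpoints.

```python
# Illustrative sketch of an agent talking to the evaluator server.
# NOTE: the endpoint path and payload schema here are assumptions;
# check the server documentation for the actual API.
from urllib.parse import urljoin

SERVER_URL = "http://localhost:8080"  # server default per this README

def endpoint(path: str, base: str = SERVER_URL) -> str:
    """Build a full URL for a server endpoint."""
    return urljoin(base, path)

def action_payload(button: str) -> dict:
    """Package a button press as a JSON-serializable request body
    (hypothetical schema)."""
    return {"action_type": "press_key", "keys": [button]}

# Usage (requires the `requests` package and a running server):
#   import requests
#   requests.post(endpoint("/action"), json=action_payload("a"))
```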
Install dependencies:

```bash
pip install -r requirements.txt
```
Place Pokemon ROM: put your Pokemon Red ROM file in the root directory and name it `Pokemon_Red.gb`.

Start the evaluator server:

```bash
python -m server.evaluator_server
```
Run an agent:

```bash
# Run AI agent
python agents/demo_agent.py

# OR run human interface
python agents/human_agent.py
```
Requirements:

- Python 3.8+
- PyBoy and its dependencies
- Pokemon Red ROM file (not included)
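A quick preflight check can confirm the interpreter and ROM requirements before starting the server. The helper names below are illustrative, not part of the repository; the ROM filename matches this README:

```python
# Preflight check: Python version and ROM file presence.
import sys
from pathlib import Path

def meets_min_version(version=sys.version_info, minimum=(3, 8)) -> bool:
    """True if the interpreter satisfies the Python 3.8+ requirement."""
    return tuple(version[:2]) >= minimum

def rom_present(root: str = ".") -> bool:
    """True if Pokemon_Red.gb exists in the given project root."""
    return (Path(root) / "Pokemon_Red.gb").is_file()

if not meets_min_version():
    sys.exit("PokemonGym requires Python 3.8 or newer")
```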
Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/benchflow-ai/pokemon-gym
  cd PokemonGym
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Place your Pokemon Red ROM file in the root directory and name it `Pokemon_Red.gb`

- Set API keys for AI agents:

  ```bash
  export ANTHROPIC_API_KEY=your_anthropic_key_here   # For Claude
  export OPENAI_API_KEY=your_openai_key_here         # For GPT-4o
  export OPENROUTER_API_KEY=your_openrouter_key_here # For Llama
  export GOOGLE_API_KEY=your_google_key_here         # For Gemini
  ```

PokemonGym/
├── server/ # Server implementation
│ ├── evaluator_server.py # FastAPI server implementation
│ └── README.md # Server documentation
├── evaluator/ # Evaluation metrics and scoring system
│ ├── evaluate.py # Evaluation metrics implementation
│ ├── milestones.py # Game milestones and scoring definitions
│ └── README.md # Evaluator documentation
├── agents/ # Agent implementations
│ ├── demo_agent.py # AI agent implementations
│ ├── human_agent.py # Human interface agent
│ └── README.md # Agents documentation
├── results/ # Evaluation results and comparisons
│ ├── comparison_plot.png # Visual comparison of model performance
│ └── README.md # Results documentation
├── pokemon_env/ # Environment utilities
├── gameplay_sessions/ # Session data storage
├── evaluate.py # Main evaluation script
├── run.sh # Bash script for running evaluation
└── README.md # Main documentation
Start the evaluation server:

```bash
python -m server.evaluator_server
```

The server will start at http://localhost:8080 by default.
Options:
- `--host`: Host to run the server on (default: 0.0.0.0)
- `--port`: Port to run the server on (default: 8080)
- `--rom`: Path to the Pokemon ROM file (default: Pokemon_Red.gb)
- `--log-file`: Custom CSV filename (optional)
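The server flags above can be modeled with `argparse`. This sketch mirrors only the defaults documented in this README; any behavior beyond that is an assumption:

```python
# Argument parser mirroring the evaluator server's documented flags.
# Illustrative sketch only; the server's real parser may differ.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PokemonGym evaluator server")
    parser.add_argument("--host", default="0.0.0.0",
                        help="Host to run the server on")
    parser.add_argument("--port", type=int, default=8080,
                        help="Port to run the server on")
    parser.add_argument("--rom", default="Pokemon_Red.gb",
                        help="Path to the Pokemon ROM file")
    parser.add_argument("--log-file", default=None,
                        help="Custom CSV filename (optional)")
    return parser

args = build_parser().parse_args([])  # parse defaults, no CLI input
```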
The demo AI agent uses Claude to make decisions based on the game screen:
First, set your Anthropic API key:

```bash
export ANTHROPIC_API_KEY=your_api_key_here
```

Then run the agent:

```bash
python agents/demo_agent.py
```

Options:

- `--server`: Server URL (default: http://localhost:8080)
- `--steps`: Number of steps to run (default: 1000000)
- `--headless`: Run in headless mode
- `--sound`: Enable sound (requires non-headless mode)
- `--provider`: AI provider to use (claude, openai, gemini, openrouter)
- `--model`: Model to use (default depends on provider)
- `--temperature`: Temperature for model generation (default: 1.0)
- `--max-tokens`: Max tokens for response (default: 4000)
- `--log-file`: File to save agent logs (default: agent_log.jsonl)
- `--load-state`: Path to a saved state file to load
- `--load-autosave`: Load the latest autosave
- `--session`: Session ID to continue a previous session
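Which `--provider` value applies depends on which API key is exported. A small helper (illustrative, not part of the repository) can pick a default from the environment:

```python
# Map exported API keys to the --provider values listed above.
# Illustrative helper; the demo agent may choose providers differently.
import os
from typing import Optional

KEY_TO_PROVIDER = {
    "ANTHROPIC_API_KEY": "claude",
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "gemini",
    "OPENROUTER_API_KEY": "openrouter",
}

def default_provider(env=None) -> Optional[str]:
    """Return the first provider whose API key is set, else None."""
    env = os.environ if env is None else env
    for key, provider in KEY_TO_PROVIDER.items():
        if env.get(key):
            return provider
    return None
```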
Play Pokemon Red yourself with keyboard controls:
```bash
python agents/human_agent.py
```

Options:

- `--server`: Server URL (default: http://localhost:8080)
- `--sound`: Enable sound (requires non-headless mode)
- `--load-state`: Path to a saved state file to load
- `--load-autosave`: Load the latest autosave
- `--session`: Session ID to continue a previous session
- Arrow Keys: Move
- Z: A button
- X: B button
- Enter: Start button
- Right Shift: Select button
- Space: Wait (advances a few frames)
- F5: Save current state
- F7: Load last saved state
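The bindings above amount to a key-to-button table. A sketch of that mapping follows; the key names and helper are illustrative, not the human agent's actual constants:

```python
# Keyboard-to-Game Boy mapping from the controls list above.
# Key names are illustrative; the human agent may use different constants.
KEYMAP = {
    "up": "up", "down": "down", "left": "left", "right": "right",
    "z": "a",             # Z -> A button
    "x": "b",             # X -> B button
    "enter": "start",     # Enter -> Start
    "right shift": "select",
    "space": "wait",      # advances a few frames
}

def button_for(key: str) -> str:
    """Look up the Game Boy input for a pressed key (hypothetical helper)."""
    return KEYMAP.get(key.lower(), "none")
```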
To continue from a previous session, specify the session ID:

```bash
# Human agent with session
python agents/human_agent.py --session session_20250404_180209

# AI agent with session
python agents/demo_agent.py --session session_20250404_180209
```

Load from a specific state file:

```bash
python agents/human_agent.py --load-state gameplay_sessions/session_20250404_180209/final_state.state
python agents/demo_agent.py --load-state gameplay_sessions/session_20250404_180209/final_state.state
```

Load the latest autosave:

```bash
python agents/human_agent.py --load-autosave
python agents/demo_agent.py --load-autosave
```

- Evaluator Documentation: Learn about the evaluation metrics and scoring system
- Server Documentation: Details about the API server, endpoints, and state management
- Agents Documentation: Detailed information on the demo AI agent and human interface
- Results Documentation: Evaluation results and model comparisons
