PokemonGym is a platform that allows AI agents to play Pokemon Red through a server-client architecture. The system includes:
- Evaluator: Evaluation metrics and scoring system for Pokemon Red gameplay
- Server: FastAPI server that controls Pokemon Red emulation and exposes game state via API
- Agents: Implementation of AI and human agents that interact with the evaluator server
- Results: Evaluation results comparing different AI models playing Pokemon Red
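The client-server flow can be sketched as a tiny HTTP helper. Note that the endpoint path (`/action`) and payload fields below are illustrative assumptions for this sketch, not the server's documented API; see the server README for the real endpoints.

```python
# Illustrative sketch of an agent talking to the evaluator server.
# NOTE: the endpoint path and payload schema here are assumptions;
# check the server documentation for the actual API.
from urllib.parse import urljoin

SERVER_URL = "http://localhost:8080"  # server default per this README

def endpoint(path: str, base: str = SERVER_URL) -> str:
    """Build a full URL for a server endpoint."""
    return urljoin(base, path)

def action_payload(button: str) -> dict:
    """Package a button press as a JSON-serializable request body
    (hypothetical schema)."""
    return {"action_type": "press_key", "keys": [button]}

# Usage (requires the `requests` package and a running server):
#   import requests
#   requests.post(endpoint("/action"), json=action_payload("a"))
```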
Install dependencies:

```bash
pip install -r requirements.txt
```
Place Pokemon ROM: put your Pokemon Red ROM file in the root directory and name it `Pokemon_Red.gb`.

Start the evaluator server:

```bash
python -m server.evaluator_server
```
Run an agent:

```bash
# Run AI agent
python agents/demo_agent.py

# OR run human interface
python agents/human_agent.py
```
Requirements:

- Python 3.8+
- PyBoy and its dependencies
- Pokemon Red ROM file (not included)
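A quick preflight check can confirm the interpreter and ROM requirements before starting the server. The helper names below are illustrative, not part of the repository; the ROM filename matches this README:

```python
# Preflight check: Python version and ROM file presence.
import sys
from pathlib import Path

def meets_min_version(version=sys.version_info, minimum=(3, 8)) -> bool:
    """True if the interpreter satisfies the Python 3.8+ requirement."""
    return tuple(version[:2]) >= minimum

def rom_present(root: str = ".") -> bool:
    """True if Pokemon_Red.gb exists in the given project root."""
    return (Path(root) / "Pokemon_Red.gb").is_file()

if not meets_min_version():
    sys.exit("PokemonGym requires Python 3.8 or newer")
```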
Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/benchflow-ai/pokemon-gym
  cd PokemonGym
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Place your Pokemon Red ROM file in the root directory and name it `Pokemon_Red.gb`

- Set API keys for AI agents:

  ```bash
  export ANTHROPIC_API_KEY=your_anthropic_key_here   # For Claude
  export OPENAI_API_KEY=your_openai_key_here         # For GPT-4o
  export OPENROUTER_API_KEY=your_openrouter_key_here # For Llama
  export GOOGLE_API_KEY=your_google_key_here         # For Gemini
  ```

PokemonGym/
├── server/ # Server implementation
│ ├── evaluator_server.py # FastAPI server implementation
│ └── README.md # Server documentation
├── evaluator/ # Evaluation metrics and scoring system
│ ├── evaluate.py # Evaluation metrics implementation
│ ├── milestones.py # Game milestones and scoring definitions
│ └── README.md # Evaluator documentation
├── agents/ # Agent implementations
│ ├── demo_agent.py # AI agent implementations
│ ├── human_agent.py # Human interface agent
│ └── README.md # Agents documentation
├── results/ # Evaluation results and comparisons
│ ├── comparison_plot.png # Visual comparison of model performance
│ └── README.md # Results documentation
├── pokemon_env/ # Environment utilities
├── gameplay_sessions/ # Session data storage
├── evaluate.py # Main evaluation script
├── run.sh # Bash script for running evaluation
└── README.md # Main documentation
Start the evaluation server:

```bash
python -m server.evaluator_server
```

The server will start at http://localhost:8080 by default.
Options:
- `--host`: Host to run the server on (default: 0.0.0.0)
- `--port`: Port to run the server on (default: 8080)
- `--rom`: Path to the Pokemon ROM file (default: Pokemon_Red.gb)
- `--log-file`: Custom CSV filename (optional)
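The server flags above can be modeled with `argparse`. This sketch mirrors only the defaults documented in this README; any behavior beyond that is an assumption:

```python
# Argument parser mirroring the evaluator server's documented flags.
# Illustrative sketch only; the server's real parser may differ.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PokemonGym evaluator server")
    parser.add_argument("--host", default="0.0.0.0",
                        help="Host to run the server on")
    parser.add_argument("--port", type=int, default=8080,
                        help="Port to run the server on")
    parser.add_argument("--rom", default="Pokemon_Red.gb",
                        help="Path to the Pokemon ROM file")
    parser.add_argument("--log-file", default=None,
                        help="Custom CSV filename (optional)")
    return parser

args = build_parser().parse_args([])  # parse defaults, no CLI input
```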
The demo AI agent uses Claude to make decisions based on the game screen:
First, set your Anthropic API key:

```bash
export ANTHROPIC_API_KEY=your_api_key_here
```

Then run the agent:

```bash
python agents/demo_agent.py
```

Options:

- `--server`: Server URL (default: http://localhost:8080)
- `--steps`: Number of steps to run (default: 1000000)
- `--headless`: Run in headless mode
- `--sound`: Enable sound (requires non-headless mode)
- `--provider`: AI provider to use (claude, openai, gemini, openrouter)
- `--model`: Model to use (default depends on provider)
- `--temperature`: Temperature for model generation (default: 1.0)
- `--max-tokens`: Max tokens for response (default: 4000)
- `--log-file`: File to save agent logs (default: agent_log.jsonl)
- `--load-state`: Path to a saved state file to load
- `--load-autosave`: Load the latest autosave
- `--session`: Session ID to continue a previous session
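Which `--provider` value applies depends on which API key is exported. A small helper (illustrative, not part of the repository) can pick a default from the environment:

```python
# Map exported API keys to the --provider values listed above.
# Illustrative helper; the demo agent may choose providers differently.
import os
from typing import Optional

KEY_TO_PROVIDER = {
    "ANTHROPIC_API_KEY": "claude",
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "gemini",
    "OPENROUTER_API_KEY": "openrouter",
}

def default_provider(env=None) -> Optional[str]:
    """Return the first provider whose API key is set, else None."""
    env = os.environ if env is None else env
    for key, provider in KEY_TO_PROVIDER.items():
        if env.get(key):
            return provider
    return None
```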
Play Pokemon Red yourself with keyboard controls:
```bash
python agents/human_agent.py
```

Options:

- `--server`: Server URL (default: http://localhost:8080)
- `--sound`: Enable sound (requires non-headless mode)
- `--load-state`: Path to a saved state file to load
- `--load-autosave`: Load the latest autosave
- `--session`: Session ID to continue a previous session
- Arrow Keys: Move
- Z: A button
- X: B button
- Enter: Start button
- Right Shift: Select button
- Space: Wait (advances a few frames)
- F5: Save current state
- F7: Load last saved state
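The bindings above amount to a key-to-button table. A sketch of that mapping follows; the key names and helper are illustrative, not the human agent's actual constants:

```python
# Keyboard-to-Game Boy mapping from the controls list above.
# Key names are illustrative; the human agent may use different constants.
KEYMAP = {
    "up": "up", "down": "down", "left": "left", "right": "right",
    "z": "a",             # Z -> A button
    "x": "b",             # X -> B button
    "enter": "start",     # Enter -> Start
    "right shift": "select",
    "space": "wait",      # advances a few frames
}

def button_for(key: str) -> str:
    """Look up the Game Boy input for a pressed key (hypothetical helper)."""
    return KEYMAP.get(key.lower(), "none")
```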
To continue from a previous session, specify the session ID:

```bash
# Human agent with session
python agents/human_agent.py --session session_20250404_180209

# AI agent with session
python agents/demo_agent.py --session session_20250404_180209
```

Load from a specific state file:

```bash
python agents/human_agent.py --load-state gameplay_sessions/session_20250404_180209/final_state.state
python agents/demo_agent.py --load-state gameplay_sessions/session_20250404_180209/final_state.state
```

Load the latest autosave:

```bash
python agents/human_agent.py --load-autosave
python agents/demo_agent.py --load-autosave
```

- Evaluator Documentation: Learn about the evaluation metrics and scoring system
- Server Documentation: Details about the API server, endpoints, and state management
- Agents Documentation: Detailed information on the demo AI agent and human interface
- Results Documentation: Evaluation results and model comparisons
