Skip to content

benchflow-ai/pokemon-gym

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

56 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PokemonGym

Join our Discord

A server for evaluating AI agents on Pokemon Red gameplay.

Group 1 (1)

Overview

PokemonGym is a platform that allows AI agents to play Pokemon Red through a server-client architecture. The system includes:

  • Evaluator: Evaluation metrics and scoring system for Pokemon Red gameplay
  • Server: FastAPI server that controls Pokemon Red emulation and exposes game state via API
  • Agents: Implementation of AI and human agents that interact with the evaluator server
  • Results: Evaluation results comparing different AI models playing Pokemon Red

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Place Pokemon ROM: Place your Pokemon Red ROM file in the root directory and name it Pokemon_Red.gb

  3. Start the evaluator server:

    python -m server.evaluator_server
  4. Run an agent:

    # Run AI agent
    python agents/demo_agent.py
    
    # OR run human interface
    python agents/human_agent.py

Installation

Prerequisites

  • Python 3.8+
  • PyBoy and its dependencies
  • Pokemon Red ROM file (not included)

Setup

  1. Clone the repository:
git clone https://github.com/benchflow-ai/pokemon-gym
cd PokemonGym
  1. Install dependencies:
pip install -r requirements.txt
  1. Place your Pokemon Red ROM file in the root directory and name it Pokemon_Red.gb

  2. Set API keys for AI agents:

export ANTHROPIC_API_KEY=your_anthropic_key_here  # For Claude
export OPENAI_API_KEY=your_openai_key_here        # For GPT-4o
export OPENROUTER_API_KEY=your_openrouter_key_here  # For Llama
export GOOGLE_API_KEY=your_google_key_here        # For Gemini

Repository Structure

PokemonGym/
├── server/                # Server implementation
│   ├── evaluator_server.py  # FastAPI server implementation
│   └── README.md            # Server documentation
├── evaluator/             # Evaluation metrics and scoring system
│   ├── evaluate.py          # Evaluation metrics implementation
│   ├── milestones.py        # Game milestones and scoring definitions
│   └── README.md            # Evaluator documentation
├── agents/                # Agent implementations
│   ├── demo_agent.py        # AI agent implementations
│   ├── human_agent.py       # Human interface agent
│   └── README.md            # Agents documentation
├── results/               # Evaluation results and comparisons
│   ├── comparison_plot.png  # Visual comparison of model performance
│   └── README.md            # Results documentation
├── pokemon_env/           # Environment utilities
├── gameplay_sessions/     # Session data storage
├── evaluate.py            # Main evaluation script
├── run.sh                 # Bash script for running evaluation
└── README.md              # Main documentation

Running the Evaluator Server

Start the evaluation server:

python -m server.evaluator_server

The server will start at http://localhost:8080 by default.

Options:

  • --host: Host to run the server on (default: 0.0.0.0)
  • --port: Port to run the server on (default: 8080)
  • --rom: Path to the Pokemon ROM file (default: Pokemon_Red.gb)
  • --log-file: Custom CSV filename (optional)

Running Agents

AI Agent

The demo AI agent uses Claude to make decisions based on the game screen:

python agents/demo_agent.py

First, set your Anthropic API key:

export ANTHROPIC_API_KEY=your_api_key_here

Options:

  • --server: Server URL (default: http://localhost:8080)
  • --steps: Number of steps to run (default: 1000000)
  • --headless: Run in headless mode
  • --sound: Enable sound (requires non-headless mode)
  • --provider: AI provider to use (claude, openai, gemini, openrouter)
  • --model: Model to use (default depends on provider)
  • --temperature: Temperature for model generation (default: 1.0)
  • --max-tokens: Max tokens for response (default: 4000)
  • --log-file: File to save agent logs (default: agent_log.jsonl)
  • --load-state: Path to a saved state file to load
  • --load-autosave: Load the latest autosave
  • --session: Session ID to continue a previous session

Human Agent

Play Pokemon Red yourself with keyboard controls:

python agents/human_agent.py

Options:

  • --server: Server URL (default: http://localhost:8080)
  • --sound: Enable sound (requires non-headless mode)
  • --load-state: Path to a saved state file to load
  • --load-autosave: Load the latest autosave
  • --session: Session ID to continue a previous session

Controls

  • Arrow Keys: Move
  • Z: A button
  • X: B button
  • Enter: Start button
  • Right Shift: Select button
  • Space: Wait (advances a few frames)
  • F5: Save current state
  • F7: Load last saved state

Game State Management

Continuing Sessions

To continue from a previous session by specifying the session ID:

# Human agent with session
python agents/human_agent.py --session session_20250404_180209

# AI agent with session
python agents/demo_agent.py --session session_20250404_180209

Loading Specific States

Load from a specific state file:

python agents/human_agent.py --load-state gameplay_sessions/session_20250404_180209/final_state.state
python agents/demo_agent.py --load-state gameplay_sessions/session_20250404_180209/final_state.state

Load the latest autosave:

python agents/human_agent.py --load-autosave
python agents/demo_agent.py --load-autosave

Component Documentation

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages