
Mario OpenEnv: Multi-Approach Reinforcement Learning for Super Mario Bros

A comprehensive research framework implementing multiple reinforcement learning approaches for training agents to play Super Mario Bros Level 1-1. This project combines traditional deep RL methods with cutting-edge LLM-based techniques through a unified OpenEnv-compatible interface.

Overview

This repository contains four complementary approaches to solving Super Mario Bros through reinforcement learning:

Core Components

mario_env/ - OpenEnv-Compatible Environment Wrapper

  • OpenEnv Protocol: Standardized HTTP-based environment interface
  • Rich RAM Features: Detailed enemy tracking, obstacle detection, powerup analysis
  • Multiple Action Sets: Simple (7), complex (12), and right-only (5) action spaces
  • Advanced Preprocessing: Frame stacking, grayscale conversion, downsampling
  • Docker Deployment: Containerized environment server for distributed training
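The preprocessing bullets above (frame stacking in particular) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the wrapper's actual code; the class and function names are hypothetical:

```python
from collections import deque

def to_grayscale(rgb_frame):
    """Convert an RGB frame (nested lists of (r, g, b) triples) to
    grayscale using the standard luminance weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_frame]

class FrameStack:
    """Keep the last `k` frames so the agent can infer velocity from a
    single observation.

    On reset the initial frame is repeated k times; each step pushes the
    newest frame and drops the oldest.
    """
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        for _ in range(self.k):
            self.frames.append(frame)
        return list(self.frames)

    def step(self, frame):
        self.frames.append(frame)
        return list(self.frames)
```

Stacking four 84×84 grayscale frames yields the 4×84×84 observation shape commonly fed to the CNN policy.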

mario_ppo/ - Traditional PPO Implementation

  • Convolutional Neural Networks: Visual policy learning from pixel observations
  • Parallel Environment Execution: 16+ parallel environments for efficient training
  • Stable Training: Proximal Policy Optimization with Generalized Advantage Estimation
  • Real-time Inference: 1000+ FPS execution speed
  • Scalable Data Collection: Learns from millions of gameplay frames

mario_grpo/ - LLM-Based GRPO Training

  • Code Generation as Policy: LLMs generate Python strategies instead of neural policies
  • Interpretable Strategies: Human-readable code with reasoning
  • Long-term Planning: Strategic decision-making beyond reactive control
  • Parallel Strategy Evaluation: Multiple strategies tested simultaneously
  • Transfer Learning: Leverages pre-trained language model knowledge

mario_baseline/ - Random Agent Baseline

  • Performance Reference: Establishes minimum performance thresholds
  • Statistical Analysis: Comprehensive evaluation metrics
  • Video Recording: Qualitative gameplay analysis
  • Reproducibility: Seeded random action selection for deterministic runs

Quick Start

Prerequisites

  • Python 3.12+
  • CUDA-compatible GPU (recommended for training)
  • Docker (for environment deployment)

Installation

# Clone repository
git clone https://github.com/3xCaffeine/mario-openenv.git
cd mario-openenv

# Install with uv (recommended)
uv sync

# For GPU support
uv sync --extra gpu

Basic Usage

Environment Server

# Start Docker environment
cd mario_env
docker build -t mario-env .
docker run -p 8000:8000 mario-env

# Or run locally
uv run python -m mario_env.server

PPO Training

cd mario_ppo
uv run python train.py --world 1 --stage 1

GRPO Training

cd mario_grpo
uv run python train.py

Baseline Evaluation

cd mario_baseline
uv run python mario_random.py --episodes 100

Architecture

Environment Interface

┌─────────────────┐    HTTP    ┌──────────────────┐
│   RL Agent      │◄──────────►│  Mario Env       │
│  (PPO/GRPO)     │            │  Server          │
└─────────────────┘            └──────────────────┘
                                        │
                                        ▼
                               ┌──────────────────┐
                               │ Super Mario Bros │
                               │   (NES Emulator) │
                               └──────────────────┘
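Since the interface is plain HTTP, any client that can POST JSON can drive the environment. The sketch below assumes a `/step` endpoint that accepts `{"action": n}` and returns observation, reward, and done fields; the endpoint path and field names are assumptions, not the server's documented API:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # matches the Docker port mapping above

def build_step_request(action):
    """Serialize an action index into a JSON request body
    (field name `action` is an assumption)."""
    return json.dumps({"action": int(action)}).encode("utf-8")

def parse_step_response(body):
    """Extract observation, reward, and done flag from a JSON response
    (field names are assumptions)."""
    data = json.loads(body)
    return data["observation"], data["reward"], data["done"]

def step(action):
    """POST one action to the environment server and return the transition."""
    req = request.Request(
        BASE_URL + "/step",
        data=build_step_request(action),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return parse_step_response(resp.read())
```

Keeping the transport this thin is what lets the PPO, GRPO, and baseline agents share one environment server.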

Training Approaches

Traditional RL Pipeline

  1. Visual Input → CNN Feature Extraction
  2. Policy Network → Action Probabilities
  3. Value Network → State Value Estimation
  4. PPO Optimization → Policy Improvement
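Step 4 relies on Generalized Advantage Estimation to compute the policy-gradient targets. A minimal pure-Python sketch of GAE (not the repository's implementation):

```python
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards[t] is the reward after step t, values[t] the critic's
    estimate for state t, and last_value the bootstrap value for the
    state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        gae = delta + gamma * lam * gae                      # discounted sum of TD errors
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]    # targets for the value loss
    return advantages, returns
```

With 16+ parallel environments, this runs once per rollout segment before each batch of PPO updates.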

LLM-Based Pipeline

  1. Game State → Structured Observation
  2. Language Model → Python Strategy Generation
  3. Code Execution → Strategy Evaluation
  4. GRPO Optimization → Strategy Improvement
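Steps 2–3 above (code generation, then execution for scoring) can be sketched as follows. The `act(obs)` contract and the observation fields are illustrative assumptions, not the repository's actual interface, and a real system should sandbox generated code rather than use bare `exec`:

```python
def load_strategy(code):
    """Compile LLM-generated Python into a callable policy.

    Assumes the generated code defines a function `act(obs)` returning
    an action index. WARNING: bare exec is unsafe for untrusted code;
    this is for illustration only.
    """
    namespace = {}
    exec(code, namespace)
    return namespace["act"]

def evaluate_strategy(policy, episode):
    """Score a policy against recorded (obs, correct_action) pairs."""
    return sum(1 for obs, correct in episode if policy(obs) == correct)

# A toy example of what a generated strategy might look like.
GENERATED = """
def act(obs):
    # Jump (hypothetical action 2) when an enemy is close,
    # otherwise keep running right (hypothetical action 1).
    return 2 if obs["enemy_distance"] < 30 else 1
"""
```

Because the policy is ordinary Python, the strategies stay human-readable, which is what enables the interpretability studies described below.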

Configuration

Environment Variables

# Game settings
export MARIO_LEVEL="SuperMarioBros-1-1-Vanilla"
export MARIO_ACTION_SET="simple"  # simple/complex/right_only

# Observation settings
export MARIO_OBS_MODE="downsampled"  # rgb/grayscale/downsampled
export MARIO_OBS_SIZE="84"
export MARIO_FRAME_STACK="4"

# Training settings
export MARIO_REWARD_X_POS="true"
export MARIO_EPISODIC_LIFE="true"
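A sketch of how these variables might be read into a config dict. Defaults mirror the documented examples; the repository's actual parsing may differ:

```python
import os

def load_mario_config(env=os.environ):
    """Read the environment variables above with the documented defaults."""
    return {
        "level": env.get("MARIO_LEVEL", "SuperMarioBros-1-1-Vanilla"),
        "action_set": env.get("MARIO_ACTION_SET", "simple"),
        "obs_mode": env.get("MARIO_OBS_MODE", "downsampled"),
        "obs_size": int(env.get("MARIO_OBS_SIZE", "84")),
        "frame_stack": int(env.get("MARIO_FRAME_STACK", "4")),
        "reward_x_pos": env.get("MARIO_REWARD_X_POS", "true").lower() == "true",
        "episodic_life": env.get("MARIO_EPISODIC_LIFE", "true").lower() == "true",
    }
```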

Model Configuration

  • PPO: Custom CNN with 32 filters, 512 hidden units
  • GRPO: Qwen2.5-Coder-3B-Instruct with LoRA fine-tuning
  • Training: Mixed precision, gradient accumulation, distributed execution

Game Features

Observation Space

  • Visual: 84×84 grayscale/downsampled RGB frames
  • RAM Features: Enemy positions, obstacle detection, powerup tracking
  • Player State: Position, velocity, power-up status, lives
  • Game State: Score, coins, time, world/stage progression

Action Space

  • Simple (7 actions): Basic movement + jump combinations
  • Complex (12 actions): Full NES controller including up/down
  • Right-Only (5 actions): Forward-only movement for easier learning
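The three counts match the button-combination lists defined in the gym-super-mario-bros package, which the sets below mirror (each action is one combination of held NES buttons):

```python
# Button combinations matching the action counts above; these mirror the
# RIGHT_ONLY / SIMPLE_MOVEMENT / COMPLEX_MOVEMENT sets from the
# gym-super-mario-bros package.
RIGHT_ONLY = [
    ["NOOP"], ["right"], ["right", "A"], ["right", "B"], ["right", "A", "B"],
]
SIMPLE_MOVEMENT = RIGHT_ONLY + [["A"], ["left"]]
COMPLEX_MOVEMENT = SIMPLE_MOVEMENT + [
    ["left", "A"], ["left", "B"], ["left", "A", "B"], ["down"], ["up"],
]
```

Smaller action spaces (right-only) speed up early learning at the cost of ruling out maneuvers that require moving left.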

Reward Structure

  • Primary: Score progression and level completion
  • Auxiliary: X-position advancement, enemy defeat, coin collection
  • Penalties: Time expiration, life loss, backward movement
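An illustrative shaping function combining the components above; the weights and the death-penalty magnitude are assumptions, not the repository's tuned values:

```python
def shaped_reward(dx, score_delta, clock_delta, died):
    """Combine the reward components listed above (weights are
    hypothetical)."""
    reward = 1.0 * dx               # x-position advancement; negative when moving left
    reward += 0.01 * score_delta    # score progress from enemies, coins, etc.
    reward += 1.0 * clock_delta     # clock_delta <= 0 as the timer counts down
    if died:
        reward -= 15.0              # life-loss penalty (hypothetical magnitude)
    return reward
```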

Research Applications

Algorithm Comparison

  • Traditional vs LLM-based RL: Performance and efficiency trade-offs
  • Sample Efficiency: Frames vs episodes required for learning
  • Generalization: Transfer across levels and game variants

Interpretability Studies

  • Strategy Analysis: Understanding LLM-generated gameplay logic
  • Decision Trees: Extracting rules from trained neural policies
  • Human-AI Collaboration: Combining human expertise with learned strategies

Environment Research

  • RAM Feature Impact: Effect of auxiliary observations on learning
  • Reward Engineering: Optimal reward shaping for complex games
  • Curriculum Learning: Progressive difficulty for stable training

Contributing

Development Setup

# Install development dependencies
uv sync --extra gpu

Project Structure

mario-openenv/
├── mario_env/          # OpenEnv wrapper
├── mario_ppo/          # PPO implementation
├── mario_grpo/         # GRPO training
├── mario_baseline/     # Random baseline
├── tests/              # Test suite
└── pyproject.toml      # Project configuration

Documentation

Acknowledgments

This project builds upon several key open-source implementations and research frameworks:

Core Dependencies and Forks

Research Frameworks and Libraries

  • PyTorch: Deep learning framework for neural network implementations
  • Transformers: Hugging Face library for LLM model handling
  • TRL (Transformer Reinforcement Learning): Library for training transformer-based RL models
  • Gymnasium: Modern reinforcement learning environments (successor to OpenAI Gym)
  • Modal: Cloud platform for scalable ML training and deployment
  • FastAPI: Modern web framework for the environment server
  • OpenCV: Computer vision library for image processing

Additional Acknowledgments

  • OpenAI Gym Super Mario Bros: Original environment implementation that inspired this work
  • NES emulator community: For maintaining and improving NES emulation technology
  • Reinforcement learning research community: For developing the algorithms and methodologies used

License

This project is open source and available under the MIT License.


Built for reinforcement learning research on classic games