This repository contains the implementation of a Reinforcement Learning agent to solve the BipedalWalker-v3 environment as part of the AISF Application. For report reviewing, look here.
bipedal_walker_rl.ipynb: Main notebook with implementation, training loop, evaluation, and writeup.report.md: Summary/report notes for experiments.OBSTACLES_README.md: Notes related to obstacle settings and observations.notes/: Misc notes, e.g.Note.md.logs/: Training logs, checkpoints, and run outputs.videos/: Evaluation and demo videos.
Box2D requires swig. Install steps differ by platform:
Run these cells at the top of the notebook:
# System deps
apt-get update && apt-get install -y swig
# PyTorch with CUDA (adjust cu version if needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# RL and utilities
pip install "gymnasium[box2d]" numpy matplotlib ipython moviepy tqdm imageio[ffmpeg]# System dep
brew install swig
# Optional: create a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# PyTorch with MPS support (from PyPI)
pip install torch torchvision torchaudio
# RL and utilities
pip install "gymnasium[box2d]" numpy matplotlib ipython moviepy tqdm imageio[ffmpeg]If gymnasium[box2d] fails, ensure swig is installed, then try:
pip install box2d-pyThe code automatically detects and uses the available hardware acceleration:
- macOS (Apple Silicon): Uses Metal Performance Shaders (
mps) for GPU acceleration. - NVIDIA GPUs: Uses CUDA (
cuda) if available. - CPU: Fallback if no GPU is detected.
The notebook automatically detects hardware and configures parallel environments:
# Device auto-detection (cell 3)
if torch.backends.mps.is_available():
device = torch.device("mps") # macOS Apple Silicon
elif torch.cuda.is_available():
device = torch.device("cuda") # NVIDIA GPU
else:
device = torch.device("cpu")
# Parallel environment setup (cell 15)
NUM_ENVS = 32 # Colab with CUDA: 32 | macOS with MPS: 4 | Rendering: 1| Platform | Hardware | NUM_ENVS | Notes |
|---|---|---|---|
| Google Colab | T4 GPU (CUDA) | 32 | Maximize sample collection |
| macOS | Apple Silicon (MPS) | 4 | Balance speed and stability |
| Rendering/Video | Any | 1 | Single env for video recording |
The training loop uses gym.vector.AsyncVectorEnv internally to run multiple environments in parallel, significantly speeding up data collection while decorrelating experience for the replay buffer.
- Open
bipedal_walker_rl.ipynb - Install dependencies (first two cells):
- For Colab:
!pip install swigand!pip install gymnasium[box2d], then mount Google Drive - For macOS:
brew install swig, thenpip install gymnasium[box2d]
- For Colab:
- Configure training parameters (cell 15):
MAX_EPISODES = 2000 NUM_ENVS = 32 # Use 32 on Colab (CUDA), 4 on macOS (MPS) LEARNING_RATE = 1e-4 RENDER_FREQ = 0 # Set to 0 during training
- Choose training mode:
- With Obstacles (default): Train agent to navigate platforms, gaps, and slopes
- Baseline: Comment out obstacles section and uncomment baseline section
- Run training cells to train and save checkpoints
- Visualize results with training curves and comparison plots
- Generate videos from saved checkpoints
- Training: Uses vectorized environments (32 on Colab CUDA, 4 on macOS MPS) with rendering disabled (
RENDER_FREQ=0) - Checkpoints: Saved every 10 episodes to
logs/<run_id>/checkpoint_ep<N>.pth - Evaluation: Load checkpoint and run with
ObstacleBipedalWrapperto generate videos with obstacles visible - Videos: Saved to
videos/directory with.mp4format
# Load checkpoint and generate video with obstacles
checkpoint_path = "logs/<run_id>/checkpoint_ep990.pth"
eval_env = gym.make("BipedalWalker-v3", render_mode="rgb_array")
eval_env = ObstacleBipedalWrapper(eval_env, difficulty=0.7, seed=42)
# Initialize agent and load weights
eval_agent = SACAgent(state_dim, action_dim, device=device)
checkpoint = torch.load(checkpoint_path, map_location=device)
eval_agent.actor.load_state_dict(checkpoint['actor_state_dict'])
# Generate and save video (see final cell in notebook)- Box2D build errors: Ensure
swigis installed (apt-get install -y swigon Colab,brew install swigon macOS) - PyTorch device: Confirm
deviceresolves tocudaon Colab ormpson macOS; otherwise training will run on CPU - Slow or choppy rendering: Use
NUM_ENVS=1and avoid rendering during training (RENDER_FREQ=0) - Obstacles not visible in video: Ensure
ObstacleBipedalWrapperis applied when creating the evaluation environment
The notebook implements the Soft Actor-Critic (SAC) algorithm with the following components:
- Actor Network: 2-layer MLP (state_dim → 256 → 256 → action_dim) with Gaussian policy
- Twin Critic Networks: Two Q-networks to mitigate overestimation bias
- Replay Buffer: Stores 1M transitions for off-policy learning
- Target Networks: Soft-updated with τ=0.005 (Polyak averaging)
- Automatic Entropy Tuning: Learns temperature parameter α dynamically
- ObstacleBipedalWrapper: Procedurally generates platforms, gaps, and slopes on terrain
- Configurable difficulty: 0.0 (easy) to 1.0 (hard), default 0.7
- Terrain modifications: Adds ramps, pits, and stepping stones to increase challenge
- Parallel Environments: 32 async environments on Colab, 4 on macOS
- Learning Rate: 1e-4 (stable for long training)
- Batch Size: 256 samples per update
- Updates per Step: 1 gradient update per environment step
- Start Steps: 10,000 random exploration steps before learning