An implementation of Proximal Policy Optimization (PPO) for OpenAI Gymnasium. This repository works out of the box on `BipedalWalker-v3` and `Humanoid-v5`, and its modular structure makes it easy to add new Gym environments.
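For context, the objective PPO optimizes is the clipped surrogate loss. The snippet below is a minimal, repo-independent sketch of that loss; the function and tensor names are illustrative and are not taken from this codebase.

```python
# Minimal sketch of PPO's clipped surrogate loss (illustrative, not this repo's code).
import torch

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    ratio = torch.exp(new_log_prob - old_log_prob)                # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)  # keep the ratio near 1
    # Take the pessimistic (min) objective and negate it, since optimizers minimize.
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```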
BipedalWalker-v3
A bipedal agent learning to walk on uneven terrain.

Humanoid-v5
A humanoid robot learning basic locomotion.

Step 1: Create and activate a new conda environment named `gym`:

```bash
conda create -n gym python=3.10
conda activate gym
```
Step 2: Clone this repository:

```bash
git clone https://github.com/jianglanwei/PPO-OpenAI-Gym
cd PPO-OpenAI-Gym
```
Step 3: Install dependencies:

```bash
pip install -r requirements.txt
```
To train a `BipedalWalker-v3` policy from scratch, run:

```bash
python3 train.py --env BipedalWalker-v3
```
- Checkpoints are saved to `policy_ckpt/BipedalWalker-v3/<train_start_time>`. Only the top 5 checkpoints are retained per run.
- Training hyperparameters can be customized in `config/BipedalWalker-v3.yaml`.
- Real-time training metrics are logged to Weights & Biases (`wandb`).
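As an illustration of the top-5 retention described above, here is a minimal sketch of how checkpoint pruning by reward could look. It assumes checkpoints are tracked as `(reward, path)` pairs, which may not match the repo's actual bookkeeping in `train.py`.

```python
# Hedged sketch of "keep only the top 5 checkpoints" retention; names are illustrative.
import os

def prune_checkpoints(ckpts, keep=5):
    """ckpts: list of (mean_reward, file_path). Deletes all but the `keep` best files."""
    ckpts.sort(key=lambda x: x[0], reverse=True)   # best reward first
    for _, path in ckpts[keep:]:
        os.remove(path)                            # drop the lower-reward checkpoints
    return ckpts[:keep]
```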
Use the `--resume_run` flag to load a checkpoint from a previous session and continue training:

```bash
python3 train.py --env BipedalWalker-v3 --resume_run <train_start_time>
```
This repository includes pretrained `BipedalWalker-v3` checkpoints from session `06-05-25_03:09:03` (located in `policy_ckpt/BipedalWalker-v3/06-05-25_03:09:03`).
To resume training from the highest-reward checkpoint of that session, use:

```bash
python3 train.py --env BipedalWalker-v3 --resume_run 06-05-25_03:09:03
```

To resume from a specific epoch, add the `--load_epoch` flag:

```bash
python3 train.py --env BipedalWalker-v3 --resume_run 06-05-25_03:09:03 --load_epoch 990
```
Use `play.py` to render a trained agent. The script can render in real time (`human` mode, the default) or generate GIF files (`rgb_array` mode, suitable for headless execution). The general command is:

```bash
python3 play.py --env BipedalWalker-v3 --run <train_start_time> --epoch <epoch_number> --render_mode <human|rgb_array>
```
For example, to visualize `BipedalWalker-v3` session `06-05-25_03:09:03` in a Gymnasium window, run:

```bash
python3 play.py --env BipedalWalker-v3 --run 06-05-25_03:09:03 --render_mode human
```
By default, this loads the checkpoint with the highest reward. Use `--epoch` to target a specific checkpoint:

```bash
python3 play.py --env BipedalWalker-v3 --run 06-05-25_03:09:03 --epoch 990 --render_mode human
```
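For reference, this is roughly what `rgb_array` rendering amounts to: frames collected from a Gymnasium environment and written to a GIF. The sketch below uses a random policy and the `imageio` library; it is an assumption about the mechanism, not `play.py`'s actual implementation.

```python
# Hedged sketch of rgb_array rendering: roll out an episode and save the frames as a GIF.
import gymnasium as gym
import imageio.v2 as imageio

env = gym.make("BipedalWalker-v3", render_mode="rgb_array")
obs, info = env.reset(seed=0)
frames = []
for _ in range(200):
    action = env.action_space.sample()           # replace with the trained policy's action
    obs, reward, terminated, truncated, info = env.step(action)
    frames.append(env.render())                  # rgb_array mode returns an HxWx3 uint8 array
    if terminated or truncated:
        break
env.close()
imageio.mimsave("rollout.gif", frames)           # write the collected frames to a GIF
```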
The `Humanoid-v5` environment:

This repository also includes tuned hyperparameters and pretrained checkpoints for `Humanoid-v5`.

Train:

```bash
python3 train.py --env Humanoid-v5
```

Visualize the pretrained policy:

```bash
python3 play.py --env Humanoid-v5 --run 06-06-25_19:08:43
```
This repository is designed to be easily extensible. To train a PPO agent on a new OpenAI Gym environment:
Create a new YAML file in the `config/` directory named exactly after your target environment ID (e.g., `LunarLander-v2.yaml`). Copy an existing config file (`BipedalWalker-v3.yaml` or `Humanoid-v5.yaml`) as a template. This file defines all hyperparameters, such as the learning rate, batch size, and the actor-critic network.
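Before writing the config, it can help to confirm that the environment ID resolves in your Gymnasium install and to inspect its observation/action spaces, so the network sizes in the YAML match. This is an optional, hypothetical check rather than part of the repository's workflow; the `LunarLander` ID is only the example used above, and some Gymnasium releases ship it as `LunarLander-v3` instead of `-v2`.

```python
# Optional sanity check: inspect the new environment's spaces before filling in the config.
import gymnasium as gym

env = gym.make("LunarLander-v2")
print(env.observation_space)   # e.g. Box(8,)     -> observation size fed to the actor-critic
print(env.action_space)        # e.g. Discrete(4) -> action head size / distribution type
env.close()
```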
If your environment requires a specialized neural network (e.g., a CNN for pixel-based inputs), see the sketch after this list:

- Add a new Actor-Critic class in `module.py`. Refer to the existing classes in that file to ensure the input/output interface matches.
- Update the `actor_critic` field in the environment's config file (from Section 2.1) to the name of your new class.
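Here is a hedged sketch of what such a pixel-input Actor-Critic might look like, assuming the repository's networks are PyTorch `nn.Module`s. The class name, constructor arguments, and return signature are illustrative only; match the interface of the existing classes in `module.py` instead.

```python
# Illustrative CNN Actor-Critic sketch (assumes PyTorch); not the interface defined in module.py.
import torch
import torch.nn as nn

class CNNActorCritic(nn.Module):
    def __init__(self, in_channels, act_dim):
        super().__init__()
        self.trunk = nn.Sequential(                        # shared convolutional feature extractor
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(),
        )
        self.actor_mean = nn.Linear(512, act_dim)          # mean of a Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std
        self.critic = nn.Linear(512, 1)                    # state-value head

    def forward(self, obs):
        h = self.trunk(obs)
        return self.actor_mean(h), self.log_std.exp(), self.critic(h)
```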
Start a new training session:

```bash
python3 train.py --env <env_name>
```
Resume from a previous session:

```bash
python3 train.py --env <env_name> --resume_run <train_start_time> --load_epoch <epoch_number>
```
Render the agent's performance, either live or saved as a GIF:

```bash
python3 play.py --env <env_name> --run <train_start_time> --epoch <epoch_number> --render_mode <human|rgb_array>
```