The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. It balances Upkie using wheels only. Training uses the UpkieGroundVelocity environment and the PPO implementation from Stable Baselines3.
For an overview of the training pipeline, check out the video: Sim-to-real RL pipeline for Upkie wheeled bipeds.
First, install pixi.
To test the last trained agent, first start a simulation process:
```
./start_simulation.sh
```

Then run the agent with:

```
pixi run agent
```
To train a new policy, first check that training progresses properly one rollout at a time:
```
pixi run show_training
```

Once this works, we can train for real with more environments and no GUI:
```
pixi run train <nb_envs>
```

Adjust the number `nb_envs` of parallel environments based on the `time/fps` series, which is reported to the command line (or to TensorBoard if you configure `UPKIE_TRAINING_PATH` as detailed below). Increase or decrease the number of environments until you find the sweet spot that maximizes FPS on your machine.
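A common starting guess before sweeping, which is an assumption rather than a rule from this pipeline, is one environment per CPU core:

```shell
# Heuristic starting point for nb_envs: one environment per CPU core.
# This only seeds the search; the time/fps series decides in the end.
NB_ENVS="$(nproc)"
echo "Starting guess: pixi run train ${NB_ENVS}"
```

From there, increase or decrease `nb_envs` while watching the `time/fps` series.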
The repository comes with a training directory that will store logs each time a new policy is learned. Set the UPKIE_TRAINING_PATH environment variable to enable this:
```
export UPKIE_TRAINING_PATH="${HOME}/src/ppo_balancer/training"
```

Trainings will be grouped automatically by day. You can start TensorBoard for today with:
```
pixi run tensorboard
```

The PPO balancer uses pixi-pack to pack a standalone Python environment to run policies on your Upkie. First, create `environment.tar` on your machine and upload it by:
```
make pack_env
make upload
```

Then, unpack the remote environment:
```
$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_env
```

To run the deployed policy on your Upkie:
```
make run_agent
```

Here we assume the pi3hat spine is already up and running.
To run a policy saved to a custom path, use for instance:
```
python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```
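To see which saved policies are available to pass to `--policy`, you can list archives under the training path. This is a sketch assuming the day-grouped layout described above:

```shell
# List saved policy archives under the training path, grouped by day.
# Falls back to the default path from this README when the
# UPKIE_TRAINING_PATH variable is unset.
TRAINING_PATH="${UPKIE_TRAINING_PATH:-$HOME/src/ppo_balancer/training}"
find "$TRAINING_PATH" -name '*.zip' 2>/dev/null | sort
```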