generated from upkie/new_agent

Train a balancing policy for Upkie by reinforcement learning

upkie/ppo_balancer

PPO balancer

The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. It balances Upkie using wheels only. Training uses the UpkieGroundVelocity environment and the PPO implementation from Stable Baselines3.

An overview of the training pipeline is given in this video: Sim-to-real RL pipeline for Upkie wheeled bipeds.

Getting started

First, install pixi.

To test the most recently trained agent, first start a simulation process:

./start_simulation.sh

Then run the agent with:

pixi run agent

Training

Before training a new policy, first check that training progresses properly one rollout at a time:

pixi run show_training

Once this works, we can train for real with more environments and no GUI:

pixi run train <nb_envs>

Adjust the number nb_envs of parallel environments based on the time/fps series. The series is reported to the command line (or to TensorBoard if you configure UPKIE_TRAINING_PATH as detailed below). Increase or decrease the number of environments until you find the sweet spot that maximizes FPS on your machine.
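
The search for the sweet spot can be done by hand. As a minimal sketch, assuming you noted the steady-state time/fps value for a few values of nb_envs, the following picks the best one. The function name `best_nb_envs` and the measurements are hypothetical and not part of this repository.

```python
def best_nb_envs(fps_by_nb_envs):
    """Return the number of environments with the highest measured FPS.

    Ties are broken in favor of the smaller number of environments.
    """
    if not fps_by_nb_envs:
        raise ValueError("no measurements provided")
    # Sort key: highest FPS first, then smallest number of environments
    return min(fps_by_nb_envs, key=lambda n: (-fps_by_nb_envs[n], n))


# Hypothetical time/fps readings, one per `pixi run train <nb_envs>` trial
measurements = {1: 900.0, 2: 1700.0, 4: 3100.0, 8: 3600.0, 16: 3300.0}
print(best_nb_envs(measurements))
```

With these made-up readings, FPS peaks at 8 environments and drops again at 16, so 8 would be the value to keep.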

TensorBoard

The repository comes with a training directory that will store logs each time a new policy is learned. Set the UPKIE_TRAINING_PATH environment variable to enable this:

export UPKIE_TRAINING_PATH="${HOME}/src/ppo_balancer/training"

Trainings are grouped automatically by day. To start TensorBoard on today's trainings, run:

pixi run tensorboard
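
As a minimal sketch of what grouping by day could look like, assuming one sub-directory per day named by ISO date (the layout suggested by paths such as training/2023-11-15/final.zip), the helper below computes a day's log directory from UPKIE_TRAINING_PATH. The function `training_dir_for` is hypothetical and not part of the repository; the actual training script may organize its logs differently.

```python
import datetime
import os
from pathlib import Path


def training_dir_for(day=None, environ=None):
    """Return the log directory for a given day, or None if logging is off.

    Assumption: logs for a day live in <UPKIE_TRAINING_PATH>/<YYYY-MM-DD>.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("UPKIE_TRAINING_PATH")
    if root is None:
        return None  # variable not set: no training directory configured
    day = datetime.date.today() if day is None else day
    return Path(root) / day.isoformat()


# Prints today's directory, or None if UPKIE_TRAINING_PATH is not set
print(training_dir_for())
```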

Real-robot execution

The PPO balancer uses pixi-pack to pack a standalone Python environment to run policies on your Upkie. First, create environment.tar on your machine and upload it to the robot:

make pack_env
make upload

Then, unpack the remote environment:

$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_env

Usage

To run the deployed policy on your Upkie:

make run_agent

This assumes the pi3hat spine is already up and running.

Advanced usage

To run a policy saved to a custom path, use for instance:

python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
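
To avoid typing dated paths by hand, a small helper can look up the most recent policy archive. This is a sketch under assumptions: it supposes that each day directory is named YYYY-MM-DD and directly contains a final.zip, as in the example path above. `latest_policy` is a hypothetical helper, not part of the repository.

```python
import datetime
from pathlib import Path


def latest_policy(training_dir):
    """Return the path to final.zip in the most recent day directory.

    Returns None if the directory is missing or holds no such policy.
    """
    root = Path(training_dir)
    if not root.is_dir():
        return None
    candidates = []
    for day_dir in root.iterdir():
        try:
            day = datetime.date.fromisoformat(day_dir.name)
        except ValueError:
            continue  # skip entries that are not named like a date
        policy = day_dir / "final.zip"
        if policy.is_file():
            candidates.append((day, policy))
    if not candidates:
        return None
    # Keep the policy with the latest date
    return max(candidates, key=lambda item: item[0])[1]


print(latest_policy("ppo_balancer/training"))
```

The returned path, if any, can then be passed to the --policy option shown above.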
