The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. It balances Upkie using wheels only. Training uses the UpkieGroundVelocity environment and the PPO implementation from Stable Baselines3.
For an overview of the training pipeline, check out the video: Sim-to-real RL pipeline for Upkie wheeled bipeds.
First, install pixi.
To test the last trained agent, first start a simulation process:
```
./start_simulation.sh
```

Then run the agent with:

```
pixi run agent
```
To train a new policy, first check that training progresses properly one rollout at a time:
```
pixi run show_training
```

Once this works, we can train for real with more environments and no GUI:
```
pixi run train <nb_envs>
```

Adjust the number `nb_envs` of parallel environments based on the `time/fps` series, which is reported to the command line (or to TensorBoard if you configure `UPKIE_TRAINING_PATH` as detailed below). Increase or decrease the number of environments until you find the sweet spot that maximizes FPS on your machine.
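A common starting guess before sweeping, which is an assumption rather than a rule from this pipeline, is one environment per CPU core:

```shell
# Heuristic starting point for nb_envs: one environment per CPU core.
# This only seeds the search; the time/fps series decides in the end.
NB_ENVS="$(nproc)"
echo "Starting guess: pixi run train ${NB_ENVS}"
```

From there, increase or decrease `nb_envs` while watching the `time/fps` series.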
The repository comes with a training directory that will store logs each time a new policy is learned. Set the UPKIE_TRAINING_PATH environment variable to enable this:
```
export UPKIE_TRAINING_PATH="${HOME}/src/ppo_balancer/training"
```

Trainings will be grouped automatically by day. You can start TensorBoard for today with:
```
pixi run tensorboard
```

The PPO balancer uses pixi-pack to pack a standalone Python environment to run policies on your Upkie. First, create `environment.tar` on your machine and upload it by:
```
make pack_env
make upload
```

Then, unpack the remote environment:
```
$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_env
```

To run the deployed policy on your Upkie:
```
make run_agent
```

Here we assume the pi3hat spine is already up and running.
To run a policy saved to a custom path, use for instance:
```
python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```
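To see which saved policies are available to pass to `--policy`, you can list archives under the training path. This is a sketch assuming the day-grouped layout described above:

```shell
# List saved policy archives under the training path, grouped by day.
# Falls back to the default path from this README when the
# UPKIE_TRAINING_PATH variable is unset.
TRAINING_PATH="${UPKIE_TRAINING_PATH:-$HOME/src/ppo_balancer/training}"
find "$TRAINING_PATH" -name '*.zip' 2>/dev/null | sort
```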