Teaching a Cheetah Robot to Run: Solving Continuous Control in Simulated Locomotion with Proximal Policy Optimization
This project tackles the challenge of training a simulated Cheetah robot to run efficiently with reinforcement learning, focusing on the complexities of continuous action control. It employs Proximal Policy Optimization (PPO), a stable and effective policy gradient algorithm, implemented in PyTorch, with MuJoCo providing realistic physics simulation and Gymnasium serving as the environment framework. The policy and value networks are multi-layer perceptrons; the policy models actions with a Gaussian distribution and is refined through iterative training with a clipped surrogate objective and Generalized Advantage Estimation (GAE) for stability. Key hyperparameters are tuned to optimize performance, and a deterministic policy (taking the mean action) is adopted during evaluation to ensure consistency in the deterministic MuJoCo setting. The result is a Cheetah robot that achieves stable, agile locomotion with steadily rising rewards, offering insights into continuous control and potential applications to real-world robotics.
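The two ingredients named above, GAE for advantage estimation and the clipped surrogate objective, can be sketched in a few lines. This is a minimal NumPy illustration of the math, not the project's actual PyTorch implementation; function names and default coefficients (`gamma=0.99`, `lam=0.95`, `clip_eps=0.2`) are assumptions chosen for illustration.

```python
import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    `values` has length T+1: the last entry is the bootstrap value
    of the state after the rollout. `dones` marks episode ends."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of future TD errors
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (returned as a loss to minimize).
    Clipping the probability ratio keeps each update close to the
    old policy, which is the source of PPO's stability."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

In the real training loop these operations run on PyTorch tensors so gradients flow through `logp_new`; the NumPy version here only mirrors the arithmetic.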
Explore this notebook to watch the Cheetah robot in action and to understand how PPO controls the robot.
The reward climbs steadily over the course of the Cheetah robot's training.
Here, the Cheetah robot's task is to run forward as fast and as far as possible without stumbling or slipping, which would disrupt its gait, as illustrated in the following GIF.
Following the learned deterministic PPO policy, the Cheetah robot runs largely stably in the simulated environment.
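The deterministic evaluation mentioned above amounts to taking the mean of the Gaussian action distribution instead of sampling from it. The toy policy below sketches that distinction; the linear mean head and fixed log-std are stand-ins for the project's MLP networks, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianPolicy:
    """Toy diagonal-Gaussian policy: a linear mean head plus a
    state-independent log-std (stand-in for the MLP policy network)."""
    def __init__(self, obs_dim, act_dim):
        self.W = np.zeros((act_dim, obs_dim))   # mean head weights
        self.log_std = np.full(act_dim, -0.5)   # learned in training; fixed here

    def act(self, obs, deterministic=False):
        mean = self.W @ obs
        if deterministic:
            # Evaluation: the mean action gives repeatable rollouts
            # in the deterministic MuJoCo setting.
            return mean
        # Training: sample around the mean for exploration.
        std = np.exp(self.log_std)
        return mean + std * rng.standard_normal(mean.shape)
```

With `deterministic=True`, repeated calls on the same observation return identical actions, which is why evaluation rewards are consistent run to run.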