
Commit 897d01d

Update PyBullet example (#2049)
1 parent 9836692 commit 897d01d

2 files changed (+39 lines, -41 lines)

docs/guide/examples.rst

Lines changed: 37 additions & 40 deletions
@@ -397,14 +397,14 @@ PyBullet: Normalizing input features
 ------------------------------------
 
 Normalizing input features may be essential to successful training of an RL agent
-(by default, images are scaled but not other types of input),
-for instance when training on `PyBullet <https://github.com/bulletphysics/bullet3/>`__ environments. For that, a wrapper exists and
-will compute a running average and standard deviation of input features (it can do the same for rewards).
+(by default, images are scaled, but other types of input are not),
+for instance when training on `PyBullet <https://github.com/bulletphysics/bullet3/>`__ environments.
+For this, there is a wrapper ``VecNormalize`` that will compute a running average and standard deviation of the input features (it can do the same for rewards).
 
 
 .. note::
 
-  you need to install pybullet with ``pip install pybullet``
+  you need to install pybullet envs with ``pip install pybullet_envs_gymnasium``
 
 
 .. image:: ../_static/img/colab-badge.svg
@@ -413,44 +413,41 @@ will compute a running average and standard deviation of input features (it can
 
 .. code-block:: python
 
-  import os
-  import gymnasium as gym
-  import pybullet_envs
+  from pathlib import Path
 
-  from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
-  from stable_baselines3 import PPO
+  import pybullet_envs_gymnasium
+
+  from stable_baselines3.common.vec_env import VecNormalize
+  from stable_baselines3.common.env_util import make_vec_env
+  from stable_baselines3 import PPO
+
+  # Alternatively, you can use the MuJoCo equivalent "HalfCheetah-v4"
+  vec_env = make_vec_env("HalfCheetahBulletEnv-v0", n_envs=1)
+  # Automatically normalize the input features and reward
+  vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True, clip_obs=10.0)
+
+  model = PPO("MlpPolicy", vec_env)
+  model.learn(total_timesteps=2000)
+
+  # Don't forget to save the VecNormalize statistics when saving the agent
+  log_dir = Path("/tmp/")
+  model.save(log_dir / "ppo_halfcheetah")
+  stats_path = log_dir / "vec_normalize.pkl"
+  vec_env.save(stats_path)
+
+  # To demonstrate loading
+  del model, vec_env
+
+  # Load the saved statistics
+  vec_env = make_vec_env("HalfCheetahBulletEnv-v0", n_envs=1)
+  vec_env = VecNormalize.load(stats_path, vec_env)
+  # do not update them at test time
+  vec_env.training = False
+  # reward normalization is not needed at test time
+  vec_env.norm_reward = False
 
-  # Note: pybullet is not compatible yet with Gymnasium
-  # you might need to use `import rl_zoo3.gym_patches`
-  # and use gym (not Gymnasium) to instantiate the env
-  # Alternatively, you can use the MuJoCo equivalent "HalfCheetah-v4"
-  vec_env = DummyVecEnv([lambda: gym.make("HalfCheetahBulletEnv-v0")])
-  # Automatically normalize the input features and reward
-  vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True,
-                         clip_obs=10.)
-
-  model = PPO("MlpPolicy", vec_env)
-  model.learn(total_timesteps=2000)
-
-  # Don't forget to save the VecNormalize statistics when saving the agent
-  log_dir = "/tmp/"
-  model.save(log_dir + "ppo_halfcheetah")
-  stats_path = os.path.join(log_dir, "vec_normalize.pkl")
-  vec_env.save(stats_path)
-
-  # To demonstrate loading
-  del model, vec_env
-
-  # Load the saved statistics
-  vec_env = DummyVecEnv([lambda: gym.make("HalfCheetahBulletEnv-v0")])
-  vec_env = VecNormalize.load(stats_path, vec_env)
-  # do not update them at test time
-  vec_env.training = False
-  # reward normalization is not needed at test time
-  vec_env.norm_reward = False
-
-  # Load the agent
-  model = PPO.load(log_dir + "ppo_halfcheetah", env=vec_env)
+  # Load the agent
+  model = PPO.load(log_dir / "ppo_halfcheetah", env=vec_env)
 
 
 Hindsight Experience Replay (HER)
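For context, the core of what ``VecNormalize`` does for observations is maintain running per-feature statistics and whiten (then clip) each incoming value. Below is a minimal, self-contained sketch of that idea using Welford's online algorithm. This is not SB3's actual implementation; the ``RunningNormalizer`` class and its details are illustrative only.

```python
class RunningNormalizer:
    """Tracks a running mean/variance per feature and normalizes inputs.

    Illustrative sketch of running-statistics normalization (Welford's
    online algorithm); not the real VecNormalize code.
    """

    def __init__(self, n_features, clip=10.0, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations
        self.clip = clip
        self.eps = eps
        self.training = True  # mirrors vec_env.training in the example above

    def update(self, obs):
        # Welford's online update of mean and m2 for each feature
        self.n += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        # In training mode, fold the new observation into the statistics first
        if self.training:
            self.update(obs)
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.n if self.n > 1 else 1.0
            z = (x - self.mean[i]) / ((var + self.eps) ** 0.5)
            # clip to [-clip, clip], as clip_obs=10.0 does in the example
            out.append(max(-self.clip, min(self.clip, z)))
        return out
```

Setting ``training = False`` freezes the statistics, which is the same reason the example above sets ``vec_env.training = False`` before evaluation: test-time observations should be normalized with the statistics gathered during training, not keep shifting them.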

docs/misc/changelog.rst

Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,8 @@ Others:
 
 Documentation:
 ^^^^^^^^^^^^^^
-Added Decisions and Dragons to resources. (@jmacglashan)
+- Added Decisions and Dragons to resources. (@jmacglashan)
+- Updated PyBullet example, now compatible with Gymnasium
 
 Release 2.4.0 (2024-11-18)
 --------------------------
