Commit faa96df

Provides PBT training cfg example for Isaac-Dexsuite-Kuka-Allegro-Lift-v0 env for rl_games (#3553)
# Description

This PR provides a built-in PBT training example for the Isaac-Dexsuite-Kuka-Allegro-Lift-v0 environment. Although we had an introduction and explanation of how to run PBT, we didn't have a built-in example. This makes PBT easier for users to adopt.

Fixes # (issue)

## Type of change

- New feature (non-breaking change which adds functionality)

## Screenshots

Please attach before and after screenshots of the change if applicable.

## Checklist

- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
1 parent 34729a3 commit faa96df

4 files changed: +67 -5 lines changed

docs/source/features/population_based_training.rst

Lines changed: 32 additions & 4 deletions
@@ -80,9 +80,8 @@ You must start **one process per policy** and point them to the **same workspace
 Minimal flags you need:
 
 * ``agent.pbt.enabled=True``
-* ``agent.pbt.workspace=<path/to/shared_folder>``
+* ``agent.pbt.directory=<path/to/shared_folder>``
 * ``agent.pbt.policy_idx=<0..num_policies-1>``
-* ``agent.pbt.num_policies=<N>``
 
 .. note::
    All processes must use the same ``agent.pbt.workspace`` so they can see each other's checkpoints.
@@ -93,8 +92,37 @@ Minimal flags you need:
 Tips
 ----
 
-* Keep checkpoints fast: reduce ``interval_steps`` only if you really need tighter PBT cadence.
-* It is recommended to run 6+ workers to see benefit of pbt
+* Keep checkpoints reasonable: reduce ``interval_steps`` only if you really need tighter PBT cadence.
+* Use larger ``threshold_std`` and ``threshold_abs`` for greater population diversity.
+* It is recommended to run 6+ workers to see benefit of pbt.
+
+
+Training Example
+----------------
+
+We provide a reference PPO config here for task:
+`Isaac-Dexsuite-Kuka-Allegro-Lift-v0 <https://github.com/isaac-sim/IsaacLab/blob/main/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/config/kuka_allegro/agents/rl_games_ppo_cfg.yaml>`_.
+For the best logging experience, we recommend using wandb for the logging in the script.
+
+Launch *N* workers, where *n* indicates each worker index:
+
+.. code-block:: bash
+
+   # Run this once per worker (n = 0..N-1), all pointing to the same directory/workspace
+   ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py \
+     --seed=<n> \
+     --task=Isaac-Dexsuite-Kuka-Allegro-Lift-v0 \
+     --num_envs=8192 \
+     --headless \
+     --track \
+     --wandb-name=idx<n> \
+     --wandb-entity=<**entity**> \
+     --wandb-project-name=<**project**>
+     agent.pbt.enabled=True \
+     agent.pbt.num_policies=<N> \
+     agent.pbt.policy_idx=<n> \
+     agent.pbt.workspace=<**pbt_workspace_name**> \
+     agent.pbt.directory=<**/path/to/shared_folder**> \
 
 
 References
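
The launch command added to the docs has to be repeated once per worker, with only the index changing. As committed, the snippet has no line continuation after `--wandb-project-name=...` and a trailing one after the last override, so pasting it verbatim into a shell would split the command. The sketch below sidesteps that by building each worker's argument list in Python. It only assembles and prints the commands; the helper names `build_worker_command` and `build_all_commands` are hypothetical, and the wandb flags are left out for brevity (append `--track` and the `--wandb-*` flags from the docs if needed).

```python
import shlex

TRAIN_SCRIPT = "scripts/reinforcement_learning/rl_games/train.py"
TASK = "Isaac-Dexsuite-Kuka-Allegro-Lift-v0"


def build_worker_command(idx, num_policies, directory, workspace, num_envs=8192):
    """Return the argv list for PBT worker `idx` (0-based). Hypothetical helper."""
    if not 0 <= idx < num_policies:
        raise ValueError(f"policy index {idx} is outside 0..{num_policies - 1}")
    return [
        "./isaaclab.sh", "-p", TRAIN_SCRIPT,
        f"--seed={idx}",
        f"--task={TASK}",
        f"--num_envs={num_envs}",
        "--headless",
        # Overrides taken from the docs example in this commit.
        "agent.pbt.enabled=True",
        f"agent.pbt.num_policies={num_policies}",
        f"agent.pbt.policy_idx={idx}",
        f"agent.pbt.workspace={workspace}",
        f"agent.pbt.directory={directory}",
    ]


def build_all_commands(num_policies, directory, workspace):
    """Build one command per worker; every worker shares directory and workspace."""
    return [
        build_worker_command(i, num_policies, directory, workspace)
        for i in range(num_policies)
    ]


if __name__ == "__main__":
    # Placeholder path and workspace name; replace with your own.
    for argv in build_all_commands(8, "/path/to/shared_folder", "pbt_workspace"):
        print(shlex.join(argv))
```

Each printed line can be run in its own terminal or handed to a job scheduler; how the processes are actually started is left open here.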

source/isaaclab_tasks/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]
 
 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.11.0"
+version = "0.11.1"
 
 # Description
 title = "Isaac Lab Environments"

source/isaaclab_tasks/docs/CHANGELOG.rst

Lines changed: 9 additions & 0 deletions
@@ -1,6 +1,15 @@
 Changelog
 ---------
 
+0.11.1 (2025-09-24)
+~~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added dextrous lifting pbt configuration example cfg for rl_games.
+
+
 0.11.0 (2025-09-07)
 ~~~~~~~~~~~~~~~~~~~~
 

source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/dexsuite/config/kuka_allegro/agents/rl_games_ppo_cfg.yaml

Lines changed: 25 additions & 0 deletions
@@ -84,3 +84,28 @@ params:
     clip_actions: False
     seq_len: 4
     bounds_loss_coef: 0.0001
+
+pbt:
+  enabled: False
+  policy_idx: 0 # policy index in a population
+  num_policies: 8 # total number of policies in the population
+  directory: .
+  workspace: "pbt_workspace" # suffix of the workspace dir name inside train_dir
+  objective: episode.Curriculum/adr
+
+  # PBT hyperparams
+  interval_steps: 50000000
+  threshold_std: 0.1
+  threshold_abs: 0.025
+  mutation_rate: 0.25
+  change_range: [1.1, 2.0]
+  mutation:
+
+    agent.params.config.learning_rate: "mutate_float"
+    agent.params.config.grad_norm: "mutate_float"
+    agent.params.config.entropy_coef: "mutate_float"
+    agent.params.config.critic_coef: "mutate_float"
+    agent.params.config.bounds_loss_coef: "mutate_float"
+    agent.params.config.kl_threshold: "mutate_float"
+    agent.params.config.gamma: "mutate_discount"
+    agent.params.config.tau: "mutate_discount"
