
Commit cee5027

ooctipus and kellyguo11 authored
Adds new curriculum MDP terms that allow modification of any environment parameter (#2777)
# Description

This PR adds two curriculum MDP terms that can change any parameter of the env instance: `modify_term_cfg` and `modify_env_param`. `modify_env_param` is the more general version: it can override any value that belongs to the env, but it requires the user to know the full path to the value. `modify_term_cfg` only works with manager terms, but it is a more user-friendly version that simplifies path specification; for example, instead of writing "observation_manager.cfg.policy.joint_pos.noise", you write "observations.policy.joint_pos.noise", consistent with Hydra override style.

Besides the path to the value, a `modify_fn` and `modify_params` are also needed to tell the term how to modify it.

Demo 1: difficulty-adaptive modification for all Python native data types

```
# iv -> initial value, fv -> final value
def initial_final_interpolate_fn(env: ManagerBasedRLEnv, env_id, data, iv, fv, get_fraction):
    iv_, fv_ = torch.tensor(iv, device=env.device), torch.tensor(fv, device=env.device)
    fraction = eval(get_fraction)
    new_val = fraction * (fv_ - iv_) + iv_
    if isinstance(data, float):
        return new_val.item()
    elif isinstance(data, int):
        return int(new_val.item())
    elif isinstance(data, (tuple, list)):
        raw = new_val.tolist()
        # assume iv is a sequence of all ints or all floats
        is_int = isinstance(iv[0], int)
        casted = [int(x) if is_int else float(x) for x in raw]
        return tuple(casted) if isinstance(data, tuple) else casted
    else:
        raise TypeError(f"Does not support the type {type(data)}")
```

(float)

```
joint_pos_unoise_min_adr = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "observations.policy.joint_pos.noise.n_min",
        "modify_fn": initial_final_interpolate_fn,
        "modify_params": {"iv": 0.0, "fv": -0.1, "get_fraction": 'env.command_manager.get_command("difficulty")'},
    },
)
```

(tuple or list)

```
command_object_pose_xrange_adr = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "commands.object_pose.ranges.pos_x",
        "modify_fn": initial_final_interpolate_fn,
        "modify_params": {"iv": (-0.5, -0.5), "fv": (-0.75, -0.25), "get_fraction": 'env.command_manager.get_command("difficulty")'},
    },
)
```

Demo 3: overriding an entire term based on the env step counter rather than adaptively

```
def value_override(env: ManagerBasedRLEnv, env_id, data, new_val, num_steps):
    if env.common_step_counter > num_steps:
        return new_val
    return mdp.modify_term_cfg.NO_CHANGE

object_pos_curriculum = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "commands.object_pose",
        "modify_fn": value_override,
        "modify_params": {"new_val": <new_observation_term>, "num_steps": 120000},
    },
)
```

Demo 4: overriding a Tensor field inside some arbitrary class that is not visible from a term_cfg (you can see that the 'address' is not as nice as with mdp.modify_term_cfg)

```
def resample_bucket_range(env: ManagerBasedRLEnv, env_id, data, static_friction_range, dynamic_friction_range, restitution_range, num_steps):
    if env.common_step_counter > num_steps:
        range_list = [static_friction_range, dynamic_friction_range, restitution_range]
        ranges = torch.tensor(range_list, device="cpu")
        new_buckets = math_utils.sample_uniform(ranges[:, 0], ranges[:, 1], (len(data), 3), device="cpu")
        return new_buckets
    return mdp.modify_env_param.NO_CHANGE

object_physics_material_curriculum = CurrTerm(
    func=mdp.modify_env_param,
    params={
        "address": "event_manager.cfg.object_physics_material.func.material_buckets",
        "modify_fn": resample_bucket_range,
        "modify_params": {
            "static_friction_range": [0.5, 1.0],
            "dynamic_friction_range": [0.3, 1.0],
            "restitution_range": [0.0, 0.5],
            "num_steps": 120000,
        },
    },
)
```
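For quick reference, the address simplification that `modify_term_cfg` applies is a textual substitution of the first "s." with "_manager.cfg." (this is what the merged implementation does); the snippet below just spells out the two mappings used in the demos above.

```
>>> "observations.policy.joint_pos.noise.n_min".replace("s.", "_manager.cfg.", 1)
'observation_manager.cfg.policy.joint_pos.noise.n_min'
>>> "commands.object_pose.ranges.pos_x".replace("s.", "_manager.cfg.", 1)
'command_manager.cfg.object_pose.ranges.pos_x'
```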
## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: ooctipus <[email protected]>
Signed-off-by: Kelly Guo <[email protected]>
Co-authored-by: Kelly Guo <[email protected]>
1 parent 9df117c commit cee5027

File tree

6 files changed: +441 -1 lines changed

docs/source/how-to/curriculums.rst

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
Curriculum Utilities
====================

.. currentmodule:: isaaclab.managers

This guide walks through the common curriculum helper functions and terms that can be used to create flexible curricula
for RL environments in Isaac Lab. These utilities can be passed to a :class:`~isaaclab.managers.CurriculumTermCfg`
object to enable dynamic modification of reward weights and environment parameters during training.

.. note::

   We cover three utilities in this guide:

   - :func:`modify_reward_weight` -- a simple function that modifies the weight of a reward term
   - :class:`modify_env_param` -- a term that modifies any environment parameter
   - :class:`modify_term_cfg` -- a term that modifies a manager term's configuration

.. dropdown:: Full source for curriculum utilities
   :icon: code

   .. literalinclude:: ../../../source/isaaclab/isaaclab/envs/mdp/curriculums.py
      :language: python

Modifying Reward Weights
------------------------

The function :func:`modify_reward_weight` updates the weight of a reward term after a specified number of simulation
steps. This can be passed directly as the ``func`` in a ``CurriculumTermCfg``.

.. literalinclude:: ../../../source/isaaclab/isaaclab/envs/mdp/curriculums.py
   :language: python
   :pyobject: modify_reward_weight

**Usage example**:

.. code-block:: python

   from isaaclab.managers import CurriculumTermCfg
   import isaaclab.envs.mdp as mdp

   # After 100k steps, set the "sparse_reward" term weight to 0.5
   sparse_reward_schedule = CurriculumTermCfg(
       func=mdp.modify_reward_weight,
       params={
           "term_name": "sparse_reward",
           "weight": 0.5,
           "num_steps": 100_000,
       }
   )

Dynamically Modifying Environment Parameters
--------------------------------------------

The class :class:`modify_env_param` is a :class:`~isaaclab.managers.ManagerTermBase` subclass that lets you target any
dotted attribute path in the environment and apply a user-supplied function to compute a new value at runtime. It
handles nested attributes, dictionary keys, list or tuple indexing, and respects a ``NO_CHANGE`` sentinel if no update
is desired.

.. literalinclude:: ../../../source/isaaclab/isaaclab/envs/mdp/curriculums.py
   :language: python
   :pyobject: modify_env_param

**Usage example**:

.. code-block:: python

   import torch

   from isaaclab.managers import CurriculumTermCfg
   import isaaclab.envs.mdp as mdp

   def resample_friction(env, env_ids, old_value, low, high, num_steps):
       # After num_steps, sample a new friction coefficient uniformly
       if env.common_step_counter > num_steps:
           return torch.empty((len(env_ids),), device="cpu").uniform_(low, high)
       return mdp.modify_env_param.NO_CHANGE

   friction_curriculum = CurriculumTermCfg(
       func=mdp.modify_env_param,
       params={
           "address": "event_manager.cfg.object_physics_material.func.material_buckets",
           "modify_fn": resample_friction,
           "modify_params": {
               "low": 0.3,
               "high": 1.0,
               "num_steps": 120_000,
           }
       }
   )

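Addresses can also index into sequences using the ``name[idx]`` syntax (e.g. ``foo.bar[2].baz``). The snippet below is
a minimal sketch of this pattern; the dotted path is a hypothetical placeholder and should be pointed at a real
attribute of your environment. Note that in this sketch the indexed segment sits in the middle of the path and the
final component is a plain attribute.

.. code-block:: python

   def scale_value(env, env_ids, old_value, factor, num_steps):
       # After num_steps, scale the targeted value; otherwise leave it untouched.
       if env.common_step_counter > num_steps:
           return old_value * factor
       return mdp.modify_env_param.NO_CHANGE

   # "command_manager.cfg.my_command.waypoints[0].radius" is a placeholder address that
   # indexes into a sequence; replace it with a path that exists in your environment.
   indexed_entry_curriculum = CurriculumTermCfg(
       func=mdp.modify_env_param,
       params={
           "address": "command_manager.cfg.my_command.waypoints[0].radius",
           "modify_fn": scale_value,
           "modify_params": {"factor": 2.0, "num_steps": 50_000},
       }
   )
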
Modify Term Configuration
-------------------------

The subclass :class:`modify_term_cfg` provides a more concise address syntax that is consistent with Hydra's
configuration override style. It otherwise behaves identically to :class:`modify_env_param`.

.. literalinclude:: ../../../source/isaaclab/isaaclab/envs/mdp/curriculums.py
   :language: python
   :pyobject: modify_term_cfg

**Usage example**:

.. code-block:: python

   def override_command_range(env, env_ids, old_value, value, num_steps):
       # Override after num_steps
       if env.common_step_counter > num_steps:
           return value
       return mdp.modify_term_cfg.NO_CHANGE

   range_override = CurriculumTermCfg(
       func=mdp.modify_term_cfg,
       params={
           "address": "commands.object_pose.ranges.pos_x",
           "modify_fn": override_command_range,
           "modify_params": {
               "value": (-0.75, -0.25),
               "num_steps": 12_000,
           }
       }
   )
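
As a point of comparison, the same override can be written with :class:`modify_env_param` by spelling out the full
path that the simplified address expands to (the first "s." is replaced with "_manager.cfg."). This is a minimal
sketch and assumes the command term is registered as ``object_pose``:

.. code-block:: python

   # "commands.object_pose.ranges.pos_x" expands to "command_manager.cfg.object_pose.ranges.pos_x"
   range_override_full_path = CurriculumTermCfg(
       func=mdp.modify_env_param,
       params={
           "address": "command_manager.cfg.object_pose.ranges.pos_x",
           "modify_fn": override_command_range,
           "modify_params": {"value": (-0.75, -0.25), "num_steps": 12_000},
       }
   )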

docs/source/how-to/index.rst

Lines changed: 13 additions & 0 deletions
@@ -112,6 +112,19 @@ This guide explains how to record an animation and video in Isaac Lab.
   record_animation
   record_video


Dynamically Modifying Environment Parameters With CurriculumTerm
----------------------------------------------------------------

This guide explains how to dynamically modify environment parameters during training in Isaac Lab.
It covers the use of curriculum utilities to change environment parameters at runtime.

.. toctree::
   :maxdepth: 1

   curriculums


Mastering Omniverse
-------------------

source/isaaclab/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
[package]

# Note: Semantic Versioning is used: https://semver.org/
-version = "0.40.20"
+version = "0.40.21"

# Description
title = "Isaac Lab framework for Robot Learning"

source/isaaclab/docs/CHANGELOG.rst

Lines changed: 12 additions & 0 deletions
@@ -1,6 +1,17 @@
Changelog
---------

0.40.21 (2025-06-25)
~~~~~~~~~~~~~~~~~~~~

Added
^^^^^

* Added new curriculum MDP terms :func:`~isaaclab.envs.mdp.curriculums.modify_env_param` and
  :func:`~isaaclab.envs.mdp.curriculums.modify_term_cfg` that enable flexible changes to any configuration in the
  env instance.


0.40.20 (2025-07-11)
~~~~~~~~~~~~~~~~~~~~

@@ -178,6 +189,7 @@ Changed
* Renamed :func:`~isaaclab.utils.noise.NoiseModel.apply` method to :func:`~isaaclab.utils.noise.NoiseModel.__call__`.


0.40.6 (2025-06-12)
~~~~~~~~~~~~~~~~~~~

source/isaaclab/isaaclab/envs/mdp/curriculums.py

Lines changed: 189 additions & 0 deletions
@@ -11,9 +11,12 @@
from __future__ import annotations

import re
from collections.abc import Sequence
from typing import TYPE_CHECKING

from isaaclab.managers import ManagerTermBase

if TYPE_CHECKING:
    from isaaclab.envs import ManagerBasedRLEnv

@@ -34,3 +37,189 @@ def modify_reward_weight(env: ManagerBasedRLEnv, env_ids: Sequence[int], term_na
    # update term settings
    term_cfg.weight = weight
    env.reward_manager.set_term_cfg(term_name, term_cfg)


class modify_env_param(ManagerTermBase):
    """Curriculum term for dynamically modifying a single environment parameter at runtime.

    This term compiles getter/setter accessors for a target attribute (specified by
    `cfg.params["address"]`) the first time it is called. On each invocation it then
    reads the current value, applies the user-provided `modify_fn`, and writes back
    the result. Since None can itself be a desirable value to write, the class uses a
    token, NO_CHANGE, as the non-modification signal; see the usage below.

    Usage:
        .. code-block:: python

            def resample_bucket_range(
                env, env_id, data, static_friction_range, dynamic_friction_range, restitution_range, num_steps
            ):
                if env.common_step_counter > num_steps:
                    range_list = [static_friction_range, dynamic_friction_range, restitution_range]
                    ranges = torch.tensor(range_list, device="cpu")
                    new_buckets = math_utils.sample_uniform(ranges[:, 0], ranges[:, 1], (len(data), 3), device="cpu")
                    return new_buckets
                return mdp.modify_env_param.NO_CHANGE

            object_physics_material_curriculum = CurrTerm(
                func=mdp.modify_env_param,
                params={
                    "address": "event_manager.cfg.object_physics_material.func.material_buckets",
                    "modify_fn": resample_bucket_range,
                    "modify_params": {
                        "static_friction_range": [.5, 1.],
                        "dynamic_friction_range": [.3, 1.],
                        "restitution_range": [0.0, 0.5],
                        "num_steps": 120000
                    }
                }
            )
    """

    NO_CHANGE = object()

    def __init__(self, cfg, env):
        """Initialize the modify_env_param term.

        Args:
            cfg: A CurriculumTermCfg whose `params` dict must contain:
                - "address" (str): dotted path into the env where the parameter lives.
            env: The ManagerBasedRLEnv instance this term will act upon.
        """
        super().__init__(cfg, env)
        self._INDEX_RE = re.compile(r"^(\w+)\[(\d+)\]$")
        self.get_fn: callable = None
        self.set_fn: callable = None
        self.address: str = self.cfg.params.get("address")

    def __call__(
        self,
        env: ManagerBasedRLEnv,
        env_ids: Sequence[int],
        address: str,
        modify_fn: callable,
        modify_params: dict = {},
    ):
        """Apply one curriculum step to the target parameter.

        On the first call, compiles and caches the getter and setter accessors.
        Then, retrieves the current value, passes it through `modify_fn`, and
        writes back the new value.

        Args:
            env: The learning environment.
            env_ids: Sub-environment indices (unused by default).
            address: Dotted path of the value retrieved from the env.
            modify_fn: Function with signature `fn(env, env_ids, old_value, **modify_params) -> new_value`.
            modify_params: Extra keyword arguments for `modify_fn`.
        """
        if not self.get_fn:
            self.get_fn, self.set_fn = self._compile_accessors(self._env, self.address)

        data = self.get_fn()
        new_val = modify_fn(self._env, env_ids, data, **modify_params)
        if new_val is not self.NO_CHANGE:  # if modify_fn returns the NO_CHANGE signal, do not invoke self.set_fn
            self.set_fn(new_val)

    def _compile_accessors(self, root, path: str):
        """Build and return (getter, setter) functions for a dotted attribute path.

        Supports nested attributes, dict keys, and sequence indexing via "name[idx]".

        Args:
            root: Base object (usually `self._env`) from which to resolve `path`.
            path: Dotted path string, e.g. "foo.bar[2].baz".

        Returns:
            tuple:
                - getter: () -> current value
                - setter: (new_value) -> None (writes new_value back into the object)
        """
        # Turn "a.b[2].c" into ["a", ("b", 2), "c"] and store in parts
        parts = []
        for part in path.split("."):
            m = self._INDEX_RE.match(part)
            if m:
                parts.append((m.group(1), int(m.group(2))))
            else:
                parts.append(part)

        cur = root
        for p in parts[:-1]:
            if isinstance(p, tuple):
                name, idx = p
                container = cur[name] if isinstance(cur, dict) else getattr(cur, name)
                cur = container[idx]
            else:
                cur = cur[p] if isinstance(cur, dict) else getattr(cur, p)

        self.container = cur
        self.last = parts[-1]
        # build the getter and setter
        if isinstance(self.container, tuple):
            getter = lambda: self.container[self.last]  # noqa: E731

            def setter(val):
                tuple_list = list(self.container)
                tuple_list[self.last] = val
                self.container = tuple(tuple_list)

        elif isinstance(self.container, (list, dict)):
            getter = lambda: self.container[self.last]  # noqa: E731

            def setter(val):
                self.container[self.last] = val

        elif isinstance(self.container, object):
            getter = lambda: getattr(self.container, self.last)  # noqa: E731

            def setter(val):
                setattr(self.container, self.last, val)

        else:
            raise TypeError(f"getter does not recognize the type {type(self.container)}")

        return getter, setter


class modify_term_cfg(modify_env_param):
    """Subclass of modify_env_param that maps a simplified "s."-style address to the full manager path.
    This is a more natural style for writing configurations.

    Reads `cfg.params["address"]`, replaces only the first occurrence of "s."
    with "_manager.cfg.", and then behaves identically to modify_env_param,
    for example: "commands.object_pose.ranges.pos_x" -> "command_manager.cfg.object_pose.ranges.pos_x".

    Usage:
        .. code-block:: python

            def override_value(env, env_ids, data, value, num_steps):
                if env.common_step_counter > num_steps:
                    return value
                return mdp.modify_term_cfg.NO_CHANGE

            command_object_pose_xrange_adr = CurrTerm(
                func=mdp.modify_term_cfg,
                params={
                    "address": "commands.object_pose.ranges.pos_x",  # note that `_manager.cfg` is omitted
                    "modify_fn": override_value,
                    "modify_params": {"value": (-.75, -.25), "num_steps": 12000}
                }
            )
    """

    def __init__(self, cfg, env):
        """Initialize the modify_term_cfg term.

        Args:
            cfg: A CurriculumTermCfg whose `params["address"]` is a simplified path, e.g. instead of
                writing "observation_manager.cfg", one writes "observations".
            env: The ManagerBasedRLEnv instance this term will act upon.
        """
        super().__init__(cfg, env)
        input_address: str = self.cfg.params.get("address")
        self.address = input_address.replace("s.", "_manager.cfg.", 1)

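For readers skimming the diff, the accessor compilation in `_compile_accessors` is the core trick: the dotted address is parsed once into attribute, key, and index steps, and getter/setter closures over the parent container are cached for reuse on every curriculum step. The following is a small self-contained sketch of that idea; it is a simplified re-implementation for illustration only, not the merged code.

```
import re
from types import SimpleNamespace

INDEX_RE = re.compile(r"^(\w+)\[(\d+)\]$")

def compile_accessors(root, path):
    # Parse "a.b[2].c" into ["a", ("b", 2), "c"].
    parts = []
    for part in path.split("."):
        m = INDEX_RE.match(part)
        parts.append((m.group(1), int(m.group(2))) if m else part)
    # Walk to the parent of the final component.
    cur = root
    for p in parts[:-1]:
        if isinstance(p, tuple):
            name, idx = p
            container = cur[name] if isinstance(cur, dict) else getattr(cur, name)
            cur = container[idx]
        else:
            cur = cur[p] if isinstance(cur, dict) else getattr(cur, p)
    last = parts[-1]
    # Return closures that read and write the final component.
    if isinstance(cur, (list, dict)):
        return (lambda: cur[last]), (lambda val: cur.__setitem__(last, val))
    return (lambda: getattr(cur, last)), (lambda val: setattr(cur, last, val))

# Toy object standing in for an env: a command config stored inside a list.
env = SimpleNamespace(commands=SimpleNamespace(terms=[SimpleNamespace(scale=1.0)]))
get_fn, set_fn = compile_accessors(env, "commands.terms[0].scale")
print(get_fn())  # 1.0
set_fn(2.5)
print(get_fn())  # 2.5
```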
0 commit comments
