Commit c8e25d7

vmoens and cursoragent committed

[Doc] Add IsaacLab integration guide and setup script (#3486)

- knowledge_base/ISAACLAB.md: comprehensive guide covering import order, TiledCamera pixel observations, pre-vectorized env gotchas, 2-GPU async pipeline, replay buffer sizing, and 20+ documented pitfalls
- setup-and-run.sh: idempotent cluster setup script for running Dreamer with IsaacLab in Docker containers

Co-authored-by: Cursor <[email protected]>
ghstack-source-id: 6371f14
Pull-Request: #3486
Co-authored-by: Cursor <[email protected]>
1 parent 461f6bb commit c8e25d7

File tree: 4 files changed, +643 -0 lines changed


docs/source/reference/envs.rst
Lines changed: 1 addition & 0 deletions

@@ -60,3 +60,4 @@ Documentation Sections
     envs_multiagent
     envs_libraries
     envs_recorders
+    isaaclab

docs/source/reference/isaaclab.rst
Lines changed: 219 additions & 0 deletions

@@ -0,0 +1,219 @@
.. currentmodule:: torchrl

IsaacLab Integration
====================

.. _ref_isaaclab:

This guide covers how to use TorchRL components with
`IsaacLab <https://isaac-sim.github.io/IsaacLab/v2.3.0/>`_
(NVIDIA's GPU-accelerated robotics simulation platform).

For general IsaacLab installation and cluster setup (not specific to TorchRL), see the
`knowledge_base/ISAACLAB.md <https://github.com/pytorch/rl/blob/main/knowledge_base/ISAACLAB.md>`_ file.

IsaacLabWrapper
---------------

Use :class:`~torchrl.envs.libs.isaac_lab.IsaacLabWrapper` to wrap a gymnasium
IsaacLab environment into a TorchRL-compatible :class:`~torchrl.envs.EnvBase`:

.. code-block:: python

    import gymnasium as gym
    from torchrl.envs.libs.isaac_lab import IsaacLabWrapper

    env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
    env = IsaacLabWrapper(env)

Key defaults:

- ``device=cuda:0``
- ``allow_done_after_reset=True`` (IsaacLab can report done immediately after reset)
- ``convert_actions_to_numpy=False`` (actions stay as tensors)

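These defaults can be overridden at construction time. As a hedged sketch
(the keyword names mirror the defaults listed above; check them against your
TorchRL version):

.. code-block:: python

    env = IsaacLabWrapper(
        gym.make("Isaac-Ant-v0", cfg=env_cfg),
        device="cuda:0",                 # device of the wrapped, GPU-native env
        allow_done_after_reset=True,     # IsaacLab may report done right after reset
        convert_actions_to_numpy=False,  # keep actions as torch tensors
    )
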
.. note::

    IsaacLab modifies ``terminated`` and ``truncated`` tensors in-place.
    ``IsaacLabWrapper`` clones these tensors to prevent data corruption.

.. note::

    Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
    ``(4096, obs_dim)``). Use ``*_spec_unbatched`` properties when you need
    per-env shapes.

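For instance, a quick way to inspect both forms (a sketch that follows the
``*_spec_unbatched`` naming mentioned above; shapes are illustrative):

.. code-block:: python

    # Specs carry the full vectorized batch
    print(env.batch_size)              # e.g. torch.Size([4096])
    print(env.action_spec.shape)       # e.g. torch.Size([4096, act_dim])

    # Per-env view, with the batch dimension stripped
    print(env.action_spec_unbatched.shape)   # e.g. torch.Size([act_dim])
    print(env.observation_spec_unbatched)    # per-env observation specs
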
.. note::

    Reward shape: IsaacLab rewards are ``(num_envs,)``. The wrapper
    unsqueezes to ``(num_envs, 1)`` for TorchRL compatibility.

Collector
---------

Because IsaacLab environments are **pre-vectorized** (a single ``gym.make``
creates ~4096 parallel environments on the GPU), use a single
:class:`~torchrl.collectors.Collector` — there is no need for
``ParallelEnv`` or ``MultiCollector``:

.. code-block:: python

    from torchrl.collectors import Collector

    collector = Collector(
        create_env_fn=env,
        policy=policy,
        frames_per_batch=40960,  # 10 env steps * 4096 envs
        storing_device="cpu",
        no_cuda_sync=True,  # IMPORTANT for CUDA envs
    )

- ``no_cuda_sync=True``: avoids unnecessary CUDA synchronisation that can
  cause hangs with GPU-native environments.
- ``storing_device="cpu"``: moves collected data to CPU for the replay buffer.

2-GPU Async Pipeline
~~~~~~~~~~~~~~~~~~~~

For maximum throughput, use two GPUs with a background collection thread:

- **GPU 0 (``sim_device``)**: IsaacLab simulation + collection policy
  inference
- **GPU 1 (``train_device``)**: Model training (world model, actor, value
  gradients)

.. code-block:: python

    import copy, threading
    from tensordict import TensorDict

    # Deep copy policy to sim_device for collection
    # (build the collector with this copy so inference runs on GPU 0)
    collector_policy = copy.deepcopy(policy).to(sim_device)

    # Background thread for continuous collection
    def collect_loop(collector, replay_buffer, stop_event):
        for data in collector:
            replay_buffer.extend(data)
            if stop_event.is_set():
                break

    stop_event = threading.Event()
    threading.Thread(
        target=collect_loop, args=(collector, replay_buffer, stop_event), daemon=True
    ).start()

    # Main thread: train on train_device
    for optim_step in range(total_steps):
        batch = replay_buffer.sample()
        train(batch)  # all on cuda:1
        # Periodic weight sync: training policy -> collector policy
        if optim_step % sync_every == 0:
            weights = TensorDict.from_module(policy)
            collector.update_policy_weights_(weights)

    stop_event.set()  # stop the collection thread once training is done

Key points:

- CUDA operations release the GIL, so collection and training genuinely
  overlap across the two threads.
- Pass ``TensorDict.from_module(policy)`` to
  ``update_policy_weights_()``, not the module itself.
- Set ``CUDA_VISIBLE_DEVICES=0,1`` to expose 2 GPUs (IsaacLab defaults to
  only GPU 0).
- The pipeline falls back gracefully to single-GPU if only 1 GPU is available.

RayCollector (alternative)
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need distributed collection across multiple GPUs/nodes, use
:class:`~torchrl.collectors.distributed.RayCollector`:

.. code-block:: python

    from torchrl.collectors.distributed import RayCollector

    collector = RayCollector(
        [make_env] * num_collectors,
        policy,
        frames_per_batch=8192,
        collector_kwargs={
            "trust_policy": True,
            "no_cuda_sync": True,
        },
    )

Replay Buffer
-------------

The :class:`~torchrl.data.SliceSampler` needs enough sequential data. With
``batch_length=50``, you need at least 50 time steps per trajectory before
sampling::

    init_random_frames >= batch_length * num_envs
                        = 50 * 4096
                        = 204,800

For GPU-resident replay buffers, use
:class:`~torchrl.data.LazyTensorStorage` with the target CUDA device.
This avoids CPU→GPU transfer at sample time (but adds it at extend time).

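As a minimal sketch (capacity and batch sizes are illustrative assumptions,
and the ``("collector", "traj_ids")`` key assumes data produced by a TorchRL
collector):

.. code-block:: python

    import torch
    from torchrl.data import LazyTensorStorage, ReplayBuffer, SliceSampler

    batch_length = 50  # time steps per sampled slice

    replay_buffer = ReplayBuffer(
        # GPU-resident storage (e.g. the training device in the 2-GPU setup)
        storage=LazyTensorStorage(1_000_000, device=torch.device("cuda:1")),
        # sample contiguous slices of batch_length steps within a trajectory
        sampler=SliceSampler(
            slice_len=batch_length,
            traj_key=("collector", "traj_ids"),
            strict_length=False,
        ),
        batch_size=16 * batch_length,  # 16 slices of 50 steps per sample
    )
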
TorchRL-Specific Gotchas
------------------------

1. **``no_cuda_sync=True``**: Always set this for collectors with CUDA
   environments. Without it, you get mysterious hangs.

2. **Installing torchrl in the Isaac container**: Use
   ``--no-build-isolation --no-deps`` to avoid conflicts with Isaac's
   pre-installed torch/numpy.

3. **``TensorDictPrimer`` ``expand_specs``**: When adding primers (e.g.,
   ``state``, ``belief``) to a pre-vectorized env, you MUST pass
   ``expand_specs=True`` to :class:`~torchrl.envs.TensorDictPrimer`.
   Otherwise the primer shapes ``()`` conflict with the env's ``batch_size``
   ``(4096,)``.

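   A minimal sketch (``state_dim`` and ``belief_dim`` are placeholder sizes
   and the spec class is an assumption; adapt to your model):

   .. code-block:: python

       from torchrl.data import Unbounded
       from torchrl.envs import TensorDictPrimer, TransformedEnv

       env = TransformedEnv(
           env,
           TensorDictPrimer(
               state=Unbounded(shape=(state_dim,)),
               belief=Unbounded(shape=(belief_dim,)),
               # expand the () primer specs to the env's (4096,) batch_size
               expand_specs=True,
           ),
       )
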
4. **Model-based env spec double-batching**:
   ``model_based_env.set_specs_from_env(batched_env)`` copies specs with batch
   dims baked in. The model-based env then double-batches actions during
   sampling (e.g., ``(4096, 4096, 8)`` instead of ``(4096, 8)``).

   **Fix**: unbatch the model-based env's specs after copying:

   .. code-block:: python

       model_based_env.set_specs_from_env(test_env)
       if test_env.batch_size:
           idx = (0,) * len(test_env.batch_size)
           model_based_env.__dict__["_output_spec"] = (
               model_based_env.__dict__["_output_spec"][idx]
           )
           model_based_env.__dict__["_input_spec"] = (
               model_based_env.__dict__["_input_spec"][idx]
           )
           model_based_env.empty_cache()

5. **``torch.compile`` with TensorDict**: Compiling full loss modules crashes
   because dynamo traces through TensorDict internals. **Fix**: compile
   individual MLP sub-modules (encoder, decoder, reward_model, value_model)
   with ``torch._dynamo.config.suppress_errors = True``. Do NOT compile the
   RSSM (sequential, shared with the collector) or the loss modules (heavy
   TensorDict use).

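   A hedged sketch (module names follow the list above; adapt to your model):

   .. code-block:: python

       import torch

       # let dynamo fall back to eager instead of raising on unsupported ops
       torch._dynamo.config.suppress_errors = True

       # compile only plain MLP sub-modules; leave the RSSM and losses eager
       encoder = torch.compile(encoder)
       decoder = torch.compile(decoder)
       reward_model = torch.compile(reward_model)
       value_model = torch.compile(value_model)
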
6. **``SliceSampler`` with ``strict_length=False``**: The sampler may return
   fewer elements than ``batch_size``. This causes
   ``reshape(-1, batch_length)`` to fail.

   **Fix**: truncate the sample:

   .. code-block:: python

       sample = replay_buffer.sample()
       numel = sample.numel()
       usable = (numel // batch_length) * batch_length
       if usable < numel:
           sample = sample[:usable]
       sample = sample.reshape(-1, batch_length)

7. **``frames_per_batch`` vs ``batch_length``**: Each collection adds
   ``frames_per_batch / num_envs`` time steps per env. The
   ``SliceSampler`` needs contiguous sequences of at least ``batch_length``
   steps within a single trajectory. Ensure
   ``frames_per_batch >= batch_length * num_envs`` for the initial collection,
   or that ``init_random_frames >= batch_length * num_envs``.

8. **``TD_GET_DEFAULTS_TO_NONE``**: Set this environment variable to ``1``
   when running inside the Isaac container to ensure correct TensorDict
   default behavior.
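
   For example, from Python (a sketch; setting it before ``tensordict`` and
   ``torchrl`` are imported is the safe assumption):

   .. code-block:: python

       import os

       # set before tensordict/torchrl are imported
       os.environ["TD_GET_DEFAULTS_TO_NONE"] = "1"
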
