.. currentmodule:: torchrl

IsaacLab Integration
====================

.. _ref_isaaclab:

This guide covers how to use TorchRL components with
`IsaacLab <https://isaac-sim.github.io/IsaacLab/v2.3.0/>`_
(NVIDIA's GPU-accelerated robotics simulation platform).

For general IsaacLab installation and cluster setup (not specific to TorchRL), see the
`knowledge_base/ISAACLAB.md <https://github.com/pytorch/rl/blob/main/knowledge_base/ISAACLAB.md>`_ file.

IsaacLabWrapper
---------------

Use :class:`~torchrl.envs.libs.isaac_lab.IsaacLabWrapper` to wrap a gymnasium
IsaacLab environment into a TorchRL-compatible :class:`~torchrl.envs.EnvBase`:

.. code-block:: python

    import gymnasium as gym
    from torchrl.envs.libs.isaac_lab import IsaacLabWrapper

    # env_cfg is an IsaacLab task configuration object for the chosen task
    # (built with IsaacLab's own config utilities, not shown here)
    env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
    env = IsaacLabWrapper(env)

Key defaults:

- ``device=cuda:0``
- ``allow_done_after_reset=True`` (IsaacLab can report done immediately after reset)
- ``convert_actions_to_numpy=False`` (actions stay as tensors)

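These defaults can be overridden at construction time. For example (a minimal
sketch; ``device`` is a standard TorchRL environment keyword argument):

.. code-block:: python

    env = IsaacLabWrapper(gym.make("Isaac-Ant-v0", cfg=env_cfg), device="cuda:0")
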
.. note::

    IsaacLab modifies ``terminated`` and ``truncated`` tensors in-place.
    ``IsaacLabWrapper`` clones these tensors to prevent data corruption.

.. note::

    Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
    ``(4096, obs_dim)``). Use ``*_spec_unbatched`` properties when you need
    per-env shapes.

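A quick way to see the difference (the shapes below assume 4096 parallel
environments and an 8-dimensional action space, as in ``Isaac-Ant-v0``):

.. code-block:: python

    print(env.batch_size)                   # torch.Size([4096])
    print(env.action_spec.shape)            # torch.Size([4096, 8])
    print(env.action_spec_unbatched.shape)  # torch.Size([8])
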
.. note::

    Reward shape: IsaacLab rewards are ``(num_envs,)``. The wrapper
    unsqueezes to ``(num_envs, 1)`` for TorchRL compatibility.

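For example, after a short rollout (again assuming 4096 environments), the reward
stored under the ``"next"`` entry carries a trailing singleton dimension:

.. code-block:: python

    td = env.rollout(3)
    print(td["next", "reward"].shape)  # torch.Size([4096, 3, 1])
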
Collector
---------

Because IsaacLab environments are **pre-vectorized** (a single ``gym.make``
creates ~4096 parallel environments on the GPU), use a single
:class:`~torchrl.collectors.Collector` — there is no need for
``ParallelEnv`` or ``MultiCollector``:

.. code-block:: python

    from torchrl.collectors import Collector

    collector = Collector(
        create_env_fn=env,
        policy=policy,
        frames_per_batch=40960,  # 10 env steps * 4096 envs
        storing_device="cpu",
        no_cuda_sync=True,  # IMPORTANT for CUDA envs
    )

- ``no_cuda_sync=True``: avoids unnecessary CUDA synchronisation that can
  cause hangs with GPU-native environments.
- ``storing_device="cpu"``: moves collected data to CPU for the replay buffer.

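A minimal consumption loop for this single-collector setup (``replay_buffer`` and
``train_step`` stand in for your own buffer and optimization step):

.. code-block:: python

    for data in collector:
        replay_buffer.extend(data)          # data lands on storing_device ("cpu")
        loss = train_step(replay_buffer.sample())
        collector.update_policy_weights_()  # refresh the collection policy
    collector.shutdown()
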
2-GPU Async Pipeline
~~~~~~~~~~~~~~~~~~~~

For maximum throughput, use two GPUs with a background collection thread:

- **GPU 0 (``sim_device``)**: IsaacLab simulation + collection policy
  inference
- **GPU 1 (``train_device``)**: Model training (world model, actor, value
  gradients)

.. code-block:: python

    import copy, threading
    from tensordict import TensorDict

    # Deep copy the policy to sim_device and use the copy as the collector's policy
    collector_policy = copy.deepcopy(policy).to(sim_device)

    # Background thread for continuous collection
    def collect_loop(collector, replay_buffer, stop_event):
        for data in collector:
            replay_buffer.extend(data)
            if stop_event.is_set():
                break

    stop_event = threading.Event()
    collection_thread = threading.Thread(
        target=collect_loop, args=(collector, replay_buffer, stop_event), daemon=True
    )
    collection_thread.start()

    # Main thread: train on train_device
    for optim_step in range(total_steps):
        batch = replay_buffer.sample()
        train(batch)  # all on cuda:1
        # Periodic weight sync: training policy -> collector policy
        if optim_step % sync_every == 0:
            weights = TensorDict.from_module(policy)
            collector.update_policy_weights_(weights)

    # Stop the collection thread once training is done
    stop_event.set()
    collection_thread.join()

Key points:

- CUDA operations release the GIL, so the collection thread and the training
  loop genuinely overlap.
- You must pass ``TensorDict.from_module(policy)`` to
  ``update_policy_weights_()``, not the module itself.
- Set ``CUDA_VISIBLE_DEVICES=0,1`` to expose 2 GPUs (IsaacLab defaults to
  only GPU 0).
- The pipeline falls back gracefully to a single GPU if only one is available.

RayCollector (alternative)
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need distributed collection across multiple GPUs/nodes, use
:class:`~torchrl.collectors.distributed.RayCollector`:

.. code-block:: python

    from torchrl.collectors.distributed import RayCollector

    # make_env: a factory that builds and wraps one IsaacLab env per remote collector
    collector = RayCollector(
        [make_env] * num_collectors,
        policy,
        frames_per_batch=8192,
        collector_kwargs={
            "trust_policy": True,
            "no_cuda_sync": True,
        },
    )

Replay Buffer
-------------

The :class:`~torchrl.data.SliceSampler` needs enough sequential data. With
``batch_length=50``, you need at least 50 time steps per trajectory before
sampling::

    init_random_frames >= batch_length * num_envs
                        = 50 * 4096
                        = 204,800

For GPU-resident replay buffers, use
:class:`~torchrl.data.LazyTensorStorage` with the target CUDA device.
This avoids CPU→GPU transfer at sample time (but adds it at extend time).

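A sketch of such a buffer, assuming the training device is ``cuda:1`` and 50-step
slices; the capacity, batch size, and exact sampler configuration are illustrative
and depend on how collected batches are laid out when extending the buffer:

.. code-block:: python

    from torchrl.data import LazyTensorStorage, ReplayBuffer, SliceSampler

    replay_buffer = ReplayBuffer(
        storage=LazyTensorStorage(1_000_000, device="cuda:1"),  # GPU-resident storage
        sampler=SliceSampler(slice_len=50, strict_length=False),
        batch_size=50 * 64,  # 64 slices of 50 steps each
    )
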
TorchRL-Specific Gotchas
------------------------

1. **``no_cuda_sync=True``**: Always set this for collectors with CUDA
   environments. Without it, implicit CUDA synchronisation can cause
   hard-to-diagnose hangs.

2. **Installing torchrl in the Isaac container**: Pass
   ``--no-build-isolation --no-deps`` to ``pip install`` to avoid conflicts
   with Isaac's pre-installed torch/numpy.

3. **``TensorDictPrimer`` ``expand_specs``**: When adding primers (e.g.,
   ``state``, ``belief``) to a pre-vectorized env, you MUST pass
   ``expand_specs=True`` to :class:`~torchrl.envs.TensorDictPrimer`.
   Otherwise the primer shapes ``()`` conflict with the env's ``batch_size``
   ``(4096,)``.

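   A minimal sketch (the ``state``/``belief`` shapes are illustrative, e.g. for a
   recurrent world-model agent):

   .. code-block:: python

       from torchrl.data import Unbounded
       from torchrl.envs import TensorDictPrimer, TransformedEnv

       primer = TensorDictPrimer(
           state=Unbounded(shape=(30,)),
           belief=Unbounded(shape=(200,)),
           expand_specs=True,  # expand per-env shapes to the env batch size (4096,)
       )
       env = TransformedEnv(env, primer)
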
4. **Model-based env spec double-batching**:
   ``model_based_env.set_specs_from_env(batched_env)`` copies specs with batch
   dims baked in. The model-based env then double-batches actions during
   sampling (e.g., ``(4096, 4096, 8)`` instead of ``(4096, 8)``).

   **Fix**: unbatch the model-based env's specs after copying:

   .. code-block:: python

       model_based_env.set_specs_from_env(test_env)
       if test_env.batch_size:
           # Index the copied specs at a single env to recover per-env shapes
           idx = (0,) * len(test_env.batch_size)
           model_based_env.__dict__["_output_spec"] = (
               model_based_env.__dict__["_output_spec"][idx]
           )
           model_based_env.__dict__["_input_spec"] = (
               model_based_env.__dict__["_input_spec"][idx]
           )
           model_based_env.empty_cache()

5. **``torch.compile`` with TensorDict**: Compiling full loss modules crashes
   because dynamo traces through TensorDict internals. **Fix**: compile
   individual MLP sub-modules (encoder, decoder, reward_model, value_model)
   with ``torch._dynamo.config.suppress_errors = True``. Do NOT compile RSSM
   (sequential, shared with collector) or loss modules (heavy TensorDict use).

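   A sketch of that workaround (the module attribute names are illustrative):

   .. code-block:: python

       import torch

       torch._dynamo.config.suppress_errors = True

       # Compile only the dense sub-modules; keep the RSSM and loss modules eager
       world_model.encoder = torch.compile(world_model.encoder)
       world_model.decoder = torch.compile(world_model.decoder)
       world_model.reward_model = torch.compile(world_model.reward_model)
       value_model = torch.compile(value_model)
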
6. **``SliceSampler`` with ``strict_length=False``**: The sampler may return
   fewer elements than ``batch_size``. This causes
   ``reshape(-1, batch_length)`` to fail.

   **Fix**: truncate the sample:

   .. code-block:: python

       sample = replay_buffer.sample()
       numel = sample.numel()
       # Drop the trailing partial slice so the batch divides evenly into
       # sequences of length batch_length
       usable = (numel // batch_length) * batch_length
       if usable < numel:
           sample = sample[:usable]
       sample = sample.reshape(-1, batch_length)

7. **``frames_per_batch`` vs ``batch_length``**: Each collection adds
   ``frames_per_batch / num_envs`` time steps per env. The
   ``SliceSampler`` needs contiguous sequences of at least ``batch_length``
   steps within a single trajectory. Ensure
   ``frames_per_batch >= batch_length * num_envs`` for the initial collection,
   or that ``init_random_frames >= batch_length * num_envs``.

8. **``TD_GET_DEFAULTS_TO_NONE``**: Set this environment variable to ``1``
   when running inside the Isaac container so that ``TensorDict.get`` returns
   ``None`` for missing keys, the default behavior TorchRL expects.

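   One way to do this is at the very top of the training script, before
   ``tensordict`` is imported (a shell ``export`` before launching works too):

   .. code-block:: python

       import os

       os.environ["TD_GET_DEFAULTS_TO_NONE"] = "1"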