Several issues regarding termination and truncation of signal processing #427

Wangzai-hub · 2026-04-21T04:39:37Z

When _time_limit_bootstrap is set to True, the processing logic is similar to:

# compute values
with torch.autocast(device_type=self._device_type, enabled=self._mixed_precision):
    values, _, _ = self.value.act({"states": self._state_preprocessor(states)}, role="value")
    values = self._value_preprocessor(values, inverse=True)

# time-limit (truncation) bootstrapping
if self._time_limit_bootstrap:
    rewards += self._discount_factor * values * truncated

This code adds the discounted value of the current state to the reward. Should it instead add the discounted value of the next state?
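To make the difference concrete, here is a minimal single-environment sketch using plain Python lists instead of tensors. The function names and the numbers are made up for illustration, and it assumes that V(s_{t+1}) for the pre-reset next state is available, which may not be the case in practice.

```python
def bootstrap_with_current_value(rewards, values, truncated, discount_factor):
    """Mirror of the quoted snippet: rewards += discount_factor * values * truncated."""
    return [r + discount_factor * v * float(t) for r, v, t in zip(rewards, values, truncated)]


def bootstrap_with_next_value(rewards, next_values, truncated, discount_factor):
    """Hypothetical alternative: bootstrap from V(s_{t+1}) of the pre-reset next state."""
    return [r + discount_factor * nv * float(t) for r, nv, t in zip(rewards, next_values, truncated)]


rewards = [1.0, 1.0, 1.0]
values = [2.0, 4.0, 6.0]        # V(s_t), illustrative numbers
next_values = [4.0, 6.0, 8.0]   # V(s_{t+1}) before any reset (assumed to be available)
truncated = [False, False, True]
discount_factor = 0.5

current_based = bootstrap_with_current_value(rewards, values, truncated, discount_factor)
next_based = bootstrap_with_next_value(rewards, next_values, truncated, discount_factor)

print(current_based)  # [1.0, 1.0, 4.0]
print(next_based)     # [1.0, 1.0, 5.0]
```

Only the truncated step differs: 1.0 + 0.5 * 6.0 with the current value versus 1.0 + 0.5 * 8.0 with the next value.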

With _time_limit_bootstrap set to True, consider a step where terminated = False and truncated = True. In version 2.0, does the GAE calculation then add next_values a second time?

advantage = 0
advantages = torch.zeros_like(rewards)
not_terminated = terminated.logical_not()
memory_size = rewards.shape[0]

# advantages computation
for i in reversed(range(memory_size)):
    next_values = values[i + 1] if i < memory_size - 1 else next_values
    advantage = (
        rewards[i]
        - values[i]
        + discount_factor * not_terminated[i] * (next_values + lambda_coefficient * advantage)
    )
    advantages[i] = advantage

Should we pass 'dones=self.memory.get_tensor_by_name("terminated") | self.memory.get_tensor_by_name("truncated")' here, as we did in version 1.4.3?
Meanwhile, in some environment implementations, such as Isaac Lab (see the code at the bottom), when an episode ends (terminated or truncated = True) the reset function is called and the state returned is the one after the reset, not the one after the action was executed. Could passing 'dones=self.memory.get_tensor_by_name("terminated") | self.memory.get_tensor_by_name("truncated")' avoid some of these logical issues?

returns, advantages = compute_gae(
    rewards=self.memory.get_tensor_by_name("rewards"),
    terminated=self.memory.get_tensor_by_name("terminated"),
    values=values,
    next_values=last_values,
    discount_factor=self.cfg.discount_factor,
    lambda_coefficient=self.cfg.gae_lambda,
)
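To illustrate the two GAE questions above, here is a small scalar sketch that mirrors the quoted loop. It is not the library's implementation: the not_done argument is a name I introduce so the same loop can be run with either mask, and all numbers are made up. Step 0 is truncated (not terminated), so values[1] belongs to the first state of the next episode.

```python
def compute_gae_sketch(rewards, not_done, values, next_value, discount_factor, lambda_coefficient):
    """Scalar, single-environment version of the quoted advantage loop."""
    advantage = 0.0
    advantages = [0.0] * len(rewards)
    for i in reversed(range(len(rewards))):
        next_v = values[i + 1] if i < len(rewards) - 1 else next_value
        advantage = (
            rewards[i]
            - values[i]
            + discount_factor * not_done[i] * (next_v + lambda_coefficient * advantage)
        )
        advantages[i] = advantage
    return advantages


discount_factor, lambda_coefficient = 0.5, 0.5
raw_rewards = [1.0, 1.0]
values = [2.0, 3.0]            # values[1] is the value of the post-reset state
terminated = [False, False]
truncated = [True, False]      # step 0 hits the time limit

# time-limit bootstrapping as in the first snippet of this post
rewards = [r + discount_factor * v * float(t) for r, v, t in zip(raw_rewards, values, truncated)]

# mask A: only `terminated` stops the recursion
mask_terminated = [0.0 if te else 1.0 for te in terminated]
# mask B: `terminated | truncated` stops the recursion (the 1.4.3-style `dones`)
mask_dones = [0.0 if (te or tr) else 1.0 for te, tr in zip(terminated, truncated)]

adv_terminated_only = compute_gae_sketch(rewards, mask_terminated, values, 6.0, discount_factor, lambda_coefficient)
adv_dones = compute_gae_sketch(rewards, mask_dones, values, 6.0, discount_factor, lambda_coefficient)

print(adv_terminated_only)  # [1.75, 1.0]
print(adv_dones)            # [0.0, 1.0]
```

In this toy setup, with the terminated-only mask the truncated step receives both the bootstrapped reward (0.5 * values[0]) and the discounted values[1] plus the next episode's advantage. With the combined mask only the bootstrapped reward remains.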

self.reset_terminated[:], self.reset_time_outs[:] = self._get_dones()
self.reset_buf = self.reset_terminated | self.reset_time_outs
self.reward_buf = self._get_rewards()

# -- reset envs that terminated/timed-out and log the episode information
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    self._reset_idx(reset_env_ids)
    # if sensors are added to the scene, make sure we render to reflect changes in reset
    if self.sim.has_rtx_sensors() and self.cfg.num_rerenders_on_reset > 0:
        for _ in range(self.cfg.num_rerenders_on_reset):
            self.sim.render()

# post-step: step interval event
if self.cfg.events:
    if "interval" in self.event_manager.available_modes:
        self.event_manager.apply(mode="interval", dt=self.step_dt)

# update observations
self.obs_buf = self._get_observations()
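The ordering in the snippet above (dones, rewards, reset, then observations) can be reproduced with a toy environment. This is a hypothetical class written only for illustration, not Isaac Lab code. It shows that on a truncated step the returned observation is already the post-reset state.

```python
class ToyAutoResetEnv:
    """Hypothetical counter environment that resets itself inside step()."""

    def __init__(self, max_episode_length=3):
        self.max_episode_length = max_episode_length
        self.state = 0
        self.episode_length = 0

    def step(self, action):
        # apply the action
        self.state += action
        self.episode_length += 1

        # compute dones and reward from the post-action state
        terminated = self.state >= 100
        truncated = self.episode_length >= self.max_episode_length
        reward = float(action)

        # reset before computing the observation, as in the snippet above
        if terminated or truncated:
            self.state = 0
            self.episode_length = 0

        # the observation reflects the (possibly reset) state
        obs = self.state
        return obs, reward, terminated, truncated


env = ToyAutoResetEnv(max_episode_length=3)
observations, truncations = [], []
for _ in range(3):
    obs, reward, terminated, truncated = env.step(1)
    observations.append(obs)
    truncations.append(truncated)

print(observations)  # [1, 2, 0]
print(truncations)   # [False, False, True]
```

The post-action state on the last step would have been 3, but the agent sees 0, so a value computed from that observation belongs to the start of a new episode.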
