I am an undergraduate student attempting to replicate the LEAP training environment from mujoco_playground to implement a dexterous hand RL task. I have been stuck on this for a month, so any insights would be greatly appreciated!
1. The Problem:
Initially, I designed a simple “reaching” reward defined as the negative of the distance to the target (R=−d), without using the reward.tolerance utility.
However, as training progressed, the agent learned to move away from the target. Strangely, the reported reward values were increasing during this process.
2. The Fix:
After mimicking the LEAP environment implementation, I switched to using the reward.tolerance function (which seems to create a bounded/capped reward). With this change, the agent behaved correctly and learned to reach the target successfully.
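For reference, here is my rough understanding of what the linear tolerance shaping computes with my parameters; this is my own approximation for illustration, not the actual mujoco_playground reward.tolerance implementation:

```python
# My own approximation of the bounded "linear" tolerance shaping, for
# illustration only -- not the actual mujoco_playground implementation.
import numpy as np

def linear_tolerance(x, lower=-0.05, upper=0.0, margin=0.5, value_at_margin=0.0):
    """1.0 inside [lower, upper]; outside, the value falls off linearly over
    `margin` from 1.0 at the bound down to `value_at_margin`, then clips at 0."""
    # Normalized distance outside the bounds (0 when x is inside them).
    d = np.maximum(np.maximum(lower - x, x - upper), 0.0) / margin
    return np.clip(1.0 - (1.0 - value_at_margin) * d, 0.0, 1.0)
```

If this reading is right, the term stays in [0, 1] no matter how large palm_dist gets, whereas -palm_dist is unbounded below.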
3. My Question:
Does using reward.tolerance fundamentally change the gradient flow or the optimization landscape compared to a raw linear negative distance? I suspect this might be related to how gradients are calculated or the handling of unbounded values, but I have hit a wall trying to understand the root cause.
Below is my Python code; palm_dist is always less than 0.5. The two assignments are the two variants I compared:

```python
# Variant 1: bounded tolerance reward (the fix, following the LEAP environment).
reward_reach_palm = reward.tolerance(
    palm_dist,
    bounds=(-0.05, 0.0),
    margin=0.5,
    sigmoid='linear',
    value_at_margin=0.0,
)

# Variant 2: raw negative distance (the original reward that misbehaved).
reward_reach_palm = -palm_dist
```
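For concreteness, here are the rough values the two formulations give at a few palm distances (the tolerance column uses my linear approximation from above, not the real reward.tolerance output):

```python
# Hypothetical side-by-side of the two reward signals at a few palm distances,
# using the linear approximation sketched earlier (bounds=(-0.05, 0.0),
# margin=0.5, value_at_margin=0.0).
for d in (0.05, 0.2, 0.45):
    bounded = max(0.0, 1.0 - d / 0.5)  # stays in [0, 1]
    print(f"palm_dist={d:.2f}: tolerance ~ {bounded:.2f}, negative distance = {-d:.2f}")
```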