Fix Numerical issues in MaskPPO Training #302

Sarthak-Dayal · 2025-07-20T17:16:12Z

Description

High-level description: Improve numerical stability by mirroring the logic used by TorchRL, which masks invalid actions with -inf rather than a large negative finite value and recomputes the masked logits from the original logits each time.

Details of changes:

Use -inf instead of -1e6 as the logit value for masked actions
In __init__, store a "pristine" copy of the original logits; apply_masking then recomputes the masked logits from that copy on every call
Remove the manual self.probs updates, since this is handled internally
Rewrite entropy to avoid NaN values, mirroring the logic from TorchRL: clamp invalid logits to min_real (the most negative finite value of the logits' dtype), re-normalize, and then compute entropy only over valid entries.
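
The changes above can be illustrated with a minimal, framework-free sketch in plain Python. It is not the code from this PR or from TorchRL: the class name `MaskedCategorical`, the helper `log_softmax`, and the attribute `_original_logits` are hypothetical names chosen for illustration, and `MIN_REAL` stands in for the min_real value of the logits' dtype. The sketch assumes at least one action stays valid.

```python
import math
import sys

NEG_INF = float("-inf")
# Most negative finite float; stands in for min_real of the logits' dtype (assumption).
MIN_REAL = -sys.float_info.max


def log_softmax(logits):
    """Turn logits into log-probabilities using a max-shifted log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


class MaskedCategorical:
    """Hypothetical stand-in for a maskable categorical distribution."""

    def __init__(self, logits):
        # Pristine copy of the caller's logits; never overwritten afterwards.
        self._original_logits = list(logits)
        self.mask = [True] * len(self._original_logits)
        self.logits = log_softmax(self._original_logits)

    def apply_masking(self, mask):
        # Always start from the pristine logits so successive masks do not compound.
        self.mask = list(mask)
        masked = [
            x if keep else NEG_INF
            for x, keep in zip(self._original_logits, self.mask)
        ]
        self.logits = log_softmax(masked)

    def entropy(self):
        # A naive -sum(p * log p) hits 0.0 * -inf = nan on masked entries.
        # Clamp masked logits to a finite value, re-normalize, and sum only
        # over the valid entries.
        clamped = [x if keep else MIN_REAL for x, keep in zip(self.logits, self.mask)]
        log_p = log_softmax(clamped)
        return -sum(
            math.exp(lp) * lp for lp, keep in zip(log_p, self.mask) if keep
        )


dist = MaskedCategorical([1.0, 2.0, 3.0])
dist.apply_masking([True, False, True])   # action 1 becomes impossible
dist.apply_masking([True, True, False])   # action 1 is valid again, action 2 is not
masked_entropy = dist.entropy()           # finite, computed over actions 0 and 1
```

Because `apply_masking` reads from `_original_logits` rather than from the already-masked `logits`, the second call above fully restores action 1 instead of leaving it at -inf.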

Context

closes #81

I have raised an issue to propose this change (required)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

Sarthak-Dayal and others added 4 commits July 20, 2025 01:42

Refactor maskable categoricals and improve logit updates

e536f79

Update changelog with bug fix

d698af5

Fix formatting and linting issues

fc8601f

Merge branch 'master' into master

712db3e
