Sarthak-Dayal

Description

High-level description: Improve numerical stability by mirroring the logic used by TorchRL, which masks with -inf rather than a large negative finite value and handles masking more robustly.

Details of changes:

  1. Use -inf instead of -1e6 as the logit value for masked (invalid) actions.
  2. In __init__, store the original "pristine" logits; in apply_masking, recompute the masked logits from that pristine copy each time a mask is applied.
  3. Remove the manual manipulation of self.probs, since this is handled internally by the underlying distribution.
  4. Rewrite entropy so it cannot produce NaNs, mirroring the TorchRL logic: clamp invalid logits to min_real (the minimum finite value of the logits' dtype), re-normalize, and compute the entropy only over valid entries (a sketch combining these changes follows this list).
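
Below is a minimal sketch of how the four points fit together. It is not the repository's actual code: the class name MaskedCategoricalSketch and the attribute _original_logits are illustrative, and the constructor/apply_masking signatures are assumptions; only the masking and entropy logic described above is what this PR is about.

```python
# Minimal sketch, not the actual implementation: names and signatures are assumed.
import torch
from torch.distributions import Categorical


class MaskedCategoricalSketch(Categorical):
    def __init__(self, logits: torch.Tensor, masks: torch.Tensor = None):
        # (2) Keep an untouched "pristine" copy so masks can be re-applied
        # later without compounding earlier masking.
        self._original_logits = logits
        super().__init__(logits=logits)
        self.masks = None
        if masks is not None:
            self.apply_masking(masks)

    def apply_masking(self, masks: torch.Tensor) -> None:
        # (1) Invalid actions get -inf, so their probability is exactly 0
        # after the softmax, rather than the "almost zero" given by -1e6.
        self.masks = masks.reshape(self._original_logits.shape).bool()
        neg_inf = torch.full_like(self._original_logits, -float("inf"))
        masked_logits = torch.where(self.masks, self._original_logits, neg_inf)
        # (3) Re-initialize the parent distribution with the masked logits;
        # Categorical normalizes them and derives self.probs internally,
        # so no manual fiddling with self.probs is needed.
        super().__init__(logits=masked_logits)

    def entropy(self) -> torch.Tensor:
        if self.masks is None:
            return super().entropy()
        # (4) TorchRL-style entropy: clamp -inf logits to the most negative
        # finite value of the dtype so that 0 * (-inf) never yields NaN, then
        # sum -p * log(p) over valid entries only.
        min_real = torch.finfo(self.logits.dtype).min
        logits = torch.clamp(self.logits, min=min_real)
        p_log_p = logits * self.probs
        p_log_p = torch.where(self.masks, p_log_p, torch.zeros_like(p_log_p))
        return -p_log_p.sum(-1)
```

With -inf masking, sampling and log_prob give masked actions exactly zero probability, which is the intended improvement over the -1e6 approximation.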

Context

closes #81 ([Bug] An error in MaskPPO training)

  • I have raised an issue to propose this change (required)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have included an example of using the feature (required for new features).
  • I have included baseline results (required for new training algorithms or training-related features).
  • I have updated the documentation accordingly.
  • I have updated the changelog accordingly (required).
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
