Open
Labels
duplicate (This issue or pull request already exists), enhancement (New feature or request)
Description
Motivation
MaskablePPO works well for large discrete action spaces in which many actions are invalid at each step, while RecurrentPPO gives the agent a memory of previous observations and actions, which improves its decision making. Right now we have to choose between these two algorithms and cannot combine their features. Having both would greatly improve training in environments where action masking and sequence processing are both helpful.
Feature
MaskableRecurrentPPO: an algorithm that combines MaskablePPO and RecurrentPPO. Alternatively, integrate action masking into PPO and RecurrentPPO.
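To make the request concrete, here is a minimal, library-free sketch of the two ideas combined in a single policy step. It is only an illustration and not the sb3-contrib API: `masked_action_probs`, `toy_policy_step`, and the `decay` parameter are hypothetical names, and the exponential moving average is a toy stand-in for an LSTM cell, with the hidden state used directly as action logits.

```python
import math
from typing import List, Sequence, Tuple


def masked_action_probs(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits where invalid actions (mask False) get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_valid = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - max_valid) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def toy_policy_step(
    hidden: Sequence[float],
    obs: Sequence[float],
    mask: Sequence[bool],
    decay: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """One step of a toy recurrent, maskable policy (hypothetical, for illustration).

    The hidden state is an exponential moving average of observations
    (an assumption standing in for an LSTM), and it doubles as the logits.
    """
    new_hidden = [decay * h + (1.0 - decay) * o for h, o in zip(hidden, obs)]
    probs = masked_action_probs(new_hidden, mask)
    return new_hidden, probs


if __name__ == "__main__":
    hidden = [0.0, 0.0, 0.0]
    # The memory (hidden state) is carried across steps while the mask changes per step.
    for obs, mask in [([1.0, 2.0, 3.0], [True, False, True]),
                      ([3.0, 2.0, 1.0], [True, True, False])]:
        hidden, probs = toy_policy_step(hidden, obs, mask)
        print(hidden, probs)
```

The point of the sketch is that the two features are orthogonal: the recurrent state carries information across steps, while the mask only changes how the final action distribution is normalized at each step.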