[Ready to merge] Add Parallel Q Network (PQN) cleanrl example and single-discrete action support#227
Conversation
PQN.mp4

Quick test: realtime video, training in the editor. The params used should mostly be the defaults from CleanRL, except for the timesteps:

```python
# Algorithm specific arguments
env_id: str = "CartPole-v1"
"""the id of the environment"""
total_timesteps: int = 1_000_000
"""total timesteps of the experiments"""
learning_rate: float = 2.5e-4
"""the learning rate of the optimizer"""
num_envs: int = 4
"""the number of parallel game environments"""
num_steps: int = 128
"""the number of steps to run for each environment per update"""
num_minibatches: int = 4
"""the number of mini-batches"""
update_epochs: int = 4
"""the K epochs to update the policy"""
anneal_lr: bool = True
"""Toggle learning rate annealing"""
gamma: float = 0.99
"""the discount factor gamma"""
start_e: float = 1
"""the starting epsilon for exploration"""
end_e: float = 0.05
"""the ending epsilon for exploration"""
exploration_fraction: float = 0.5
"""the fraction of `total_timesteps` it takes from start_e to end_e"""
max_grad_norm: float = 10.0
"""the maximum norm for the gradient clipping"""
q_lambda: float = 0.65
"""the lambda for Q(lambda)"""
```

The env was modified to use discrete actions, but it might also have other modifications, as it's a local test version.
I've done some quick tests now with the SB3 discrete, multi-discrete, and continuous versions of BallChase. I also briefly started training with Sample Factory (WSL). Notes:
|
Another test: the test env is not released, but it is a modification of the completed tutorial env with discrete actions, frame stacking of 8 for the raycast and vector-to-goal observations, and a one-hot encoded previous action (not stacked) included in the obs.

RandomizedPositions.mp4
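As a rough sketch of the obs layout described above (all sizes here are hypothetical placeholders, not the actual test env's dimensions):

```python
import numpy as np

def one_hot(action: int, num_actions: int) -> np.ndarray:
    """Encode a discrete action index as a one-hot float vector."""
    vec = np.zeros(num_actions, dtype=np.float32)
    vec[action] = 1.0
    return vec

# hypothetical sizes: 8 stacked frames of a 12-dim raycast + vector-to-goal obs,
# with the previous action's one-hot appended once (not stacked)
num_actions = 5
prev_action = 2
stacked = np.zeros((8, 12), dtype=np.float32).flatten()
obs = np.concatenate([stacked, one_hot(prev_action, num_actions)])
print(obs.shape)  # (101,)
```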

Adds basic Parallel Q Network (PQN) support, based on the CleanRL script, with changes similar to those applied to the PPO example.
https://docs.cleanrl.dev/rl-algorithms/pqn/
This is well suited for GDRL as we usually use multiple parallel agents.
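For context, PQN trains against Q(λ) targets computed backward over the parallel rollout (which is why the `q_lambda` param appears in the config). A simplified numpy sketch, with names and shapes chosen for illustration rather than taken from the PR's actual code:

```python
import numpy as np

def q_lambda_returns(rewards, nonterminal, max_next_q, gamma=0.99, lam=0.65):
    """Backward Q(lambda) recursion over a rollout.

    rewards[t]:     reward at step t, shape (T, num_envs)
    nonterminal[t]: 1.0 minus the done flag after step t, shape (T, num_envs)
    max_next_q[t]:  max_a Q(s_{t+1}, a), shape (T, num_envs)
    """
    T = rewards.shape[0]
    returns = np.zeros_like(rewards)
    # last step bootstraps from the greedy next-state value only
    returns[-1] = rewards[-1] + gamma * nonterminal[-1] * max_next_q[-1]
    for t in reversed(range(T - 1)):
        # blend the lambda-return from t+1 with the one-step greedy estimate
        blended = lam * returns[t + 1] + (1.0 - lam) * max_next_q[t]
        returns[t] = rewards[t] + gamma * nonterminal[t] * blended
    return returns
```

With `lam=0` this collapses to one-step Q-learning targets; with `lam=1` it becomes a Monte-Carlo-style return, so `q_lambda=0.65` trades off bias and variance between the two.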
Modifies our action preprocessor to support a single discrete action (this part needs more testing).
For now, the algorithm supports a single obs space and a single action space.
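A minimal sketch of what single-discrete action selection can look like on the Q-network side (a hypothetical helper for illustration, not the PR's actual preprocessor code):

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy_discrete(q_values: np.ndarray, epsilon: float) -> int:
    """Select one action index from the Q-values of a single Discrete space."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # uniform random exploration
    return int(np.argmax(q_values))              # greedy action

# e.g. 3 discrete actions, fully greedy:
action = epsilon_greedy_discrete(np.array([0.1, 0.9, 0.2]), epsilon=0.0)
print(action)  # 1
```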
TODO:
TODO (optional, might also be in a future PR)