[Ready to merge] Add Parallel Q Network (PQN) cleanrl example and single-discrete action support #227

Merged
Ivan-267 merged 13 commits into main from AddPQNSupport
Mar 14, 2025

Conversation


@Ivan-267 Ivan-267 commented Mar 5, 2025

Adds basic Parallel Q Network (PQN) support, based on the CleanRL script, with changes like those applied to the PPO example.

https://docs.cleanrl.dev/rl-algorithms/pqn/

PQN is a parallelized version of the Deep Q-learning algorithm. It is designed to be more efficient than DQN by using multiple agents to interact with the environment in parallel. PQN can be thought of as DQN (1) without replay buffer and target networks, and (2) with layer normalizations and parallel environments.

This is well suited for GDRL as we usually use multiple parallel agents.

Modifies our action preprocessor to support a single discrete action (this part needs more testing).

The algorithm will support a single obs space and single action space for now.
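The key architectural change PQN makes, LayerNorm after each hidden layer in place of a target network, can be sketched as follows. This is a minimal illustration in PyTorch, not the network from this PR; the dimensions and names are made up:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of a PQN-style Q-network: an MLP with LayerNorm after each
    hidden linear layer. Sizes are illustrative only."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 120):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.LayerNorm(hidden),  # normalization stabilizes training without target networks
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per discrete action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

With a single discrete action space, acting greedily is just an argmax over the output, which is why the action preprocessor change above is needed.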

TODO:

  • Fix failing tests
  • Add most of the standard arguments (to start with, only timesteps, env path, n_parallel, speedup, and viz)
  • Test SB3 to confirm it's not affected (automated tests should cover the basic SF/Rllib cases).

TODO (optional, might also be in a future PR)

  • Add rewards being reported as in PPO script
  • Add onnx export


Ivan-267 commented Mar 5, 2025

PQN.mp4

Quick test - realtime video, training in editor.

Params used: mostly the CleanRL defaults, except the timesteps:

    # Algorithm specific arguments
    env_id: str = "CartPole-v1"
    """the id of the environment"""
    total_timesteps: int = 1_000_000
    """total timesteps of the experiments"""
    learning_rate: float = 2.5e-4
    """the learning rate of the optimizer"""
    num_envs: int = 4
    """the number of parallel game environments"""
    num_steps: int = 128
    """the number of steps to run for each environment per update"""
    num_minibatches: int = 4
    """the number of mini-batches"""
    update_epochs: int = 4
    """the K epochs to update the policy"""
    anneal_lr: bool = True
    """Toggle learning rate annealing"""
    gamma: float = 0.99
    """the discount factor gamma"""
    start_e: float = 1
    """the starting epsilon for exploration"""
    end_e: float = 0.05
    """the ending epsilon for exploration"""
    exploration_fraction: float = 0.5
    """the fraction of `total_timesteps` it takes from start_e to end_e"""
    max_grad_norm: float = 10.0
    """the maximum norm for the gradient clipping"""
    q_lambda: float = 0.65
    """the lambda for Q(lambda)"""

The env was modified to use discrete actions but might also have other modifications as it's a local test version.
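For reference, the exploration schedule implied by `start_e`, `end_e`, and `exploration_fraction` above is a simple linear anneal of epsilon over a fraction of total timesteps. A sketch along the lines of the CleanRL helper (function name and wiring here are illustrative):

```python
def linear_schedule(start_e: float, end_e: float, duration: int, t: int) -> float:
    """Linearly anneal epsilon from start_e down to end_e over `duration`
    steps, then hold it at end_e."""
    slope = (end_e - start_e) / duration
    return max(slope * t + start_e, end_e)

# With the defaults above, epsilon decays from 1.0 to 0.05 over the
# first half of training:
#   duration = exploration_fraction * total_timesteps = 0.5 * 1_000_000
```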


Ivan-267 commented Mar 6, 2025

Added onnx export support. The agent can move in 4 directions only and can sometimes get stuck, but this can happen with other algorithms too, so it's not PQN-related; something like frame stacking, recurrence, or other techniques might help there.

onnx.mp4

The exported onnx shown in Netron: (image: exported_onnx)


Ivan-267 commented Mar 6, 2025

I've done some quick tests now with SB3 discrete, multi-discrete, and continuous versions of BallChase. I briefly started training on Sample Factory (WSL) as well.

Notes: `pip install tyro` will be needed, as tyro is used by the CleanRL script to manage arguments. If we add a tutorial/doc for this algorithm in the future, we can mention it there.

@Ivan-267 Ivan-267 changed the title [WIP] Add Parallel Q Network (PQN) cleanrl example and single-discrete action support Add Parallel Q Network (PQN) cleanrl example and single-discrete action support Mar 6, 2025
@Ivan-267 Ivan-267 requested a review from edbeeching March 6, 2025 16:42
@Ivan-267 Ivan-267 changed the title Add Parallel Q Network (PQN) cleanrl example and single-discrete action support [Ready to merge] Add Parallel Q Network (PQN) cleanrl example and single-discrete action support Mar 6, 2025

Ivan-267 commented Mar 11, 2025

Another test with n_parallel = 4 and:

    """the id of the environment"""
    total_timesteps: int = 5_000_000
    """total timesteps of the experiments"""
    learning_rate: float = 2.5e-4
    """the learning rate of the optimizer [note: automatically set]"""
    num_envs: int = 4
    """the number of parallel game environments"""
    num_steps: int = 32
    """the number of steps to run for each environment per update"""
    num_minibatches: int = 1
    """the number of mini-batches"""
    update_epochs: int = 32
    """the K epochs to update the policy"""
    anneal_lr: bool = False
    """Toggle learning rate annealing"""
    gamma: float = 0.99
    """the discount factor gamma"""
    start_e: float = 1.0
    """the starting epsilon for exploration"""
    end_e: float = 0.05
    """the ending epsilon for exploration"""
    exploration_fraction: float = 0.75
    """the fraction of `total_timesteps` it takes from start_e to end_e"""
    max_grad_norm: float = 10.0
    """the maximum norm for the gradient clipping"""
    q_lambda: float = 0.65
    """the lambda for Q(lambda)"""

The test env is not released, but it is a modification of the completed version of the tutorial env, with a discrete action, frame stacking of 8 for the raycast and vector-to-goal observations, and a one-hot encoded previous action (not stacked) included in the obs.
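The `q_lambda` parameter above controls how the Q(lambda) targets blend the one-step greedy bootstrap with the multi-step return. A simplified numpy sketch of the backward pass over a rollout, in the spirit of the CleanRL PQN update (function name and shapes are illustrative, not the script's actual code):

```python
import numpy as np

def q_lambda_returns(rewards, next_q_max, dones, gamma=0.99, q_lambda=0.65):
    """Sketch of Q(lambda) targets computed backwards over a rollout.

    rewards, next_q_max, dones: arrays of shape (num_steps,), where
    next_q_max[t] = max_a Q(s_{t+1}, a) under the online network.
    """
    T = len(rewards)
    returns = np.zeros(T)
    for t in reversed(range(T)):
        if t == T - 1:
            # last step: plain one-step bootstrap from the greedy Q-value
            returns[t] = rewards[t] + gamma * (1.0 - dones[t]) * next_q_max[t]
        else:
            # blend the one-step bootstrap with the lambda-return from t+1
            blended = (1.0 - q_lambda) * next_q_max[t] + q_lambda * returns[t + 1]
            returns[t] = rewards[t] + gamma * (1.0 - dones[t]) * blended
    return returns
```

With `q_lambda = 0` this reduces to the one-step DQN target; with `q_lambda = 1` it approaches a Monte Carlo return with a final bootstrap.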

RandomizedPositions.mp4

@edbeeching edbeeching (Owner) left a comment


LGTM

@Ivan-267 Ivan-267 merged commit 12ab0d4 into main Mar 14, 2025
13 checks passed
@Ivan-267 Ivan-267 deleted the AddPQNSupport branch March 14, 2025 04:41