Add Prioritized Approximation Loss feature #2166
base: master
Conversation
Previous experiments deleted; see the more recent experiments in the latest comment.
Has anyone else been able to get this to work and provide a working example with the code they used? @bilelsgh does not appear to be active anymore. I've been getting a RuntimeError (element 0 does not require grad or have a grad_fn) when using this implementation.
Hi, I'm a bit busy these days. I didn't have this error when I ran the code.
The code runs properly. There is no significant improvement in the reward for CartPole. As detailed in the original PER paper, PER does not always lead to better performance, particularly in environments with low variance in TD errors and a limited number of rare or informative transitions.
However, the reward is substantially better on LunarLander with PAL, showing its effectiveness. Feel free to evaluate the PR directly, or refer to the experiments presented in the paper used as the basis for this implementation.
Here is the code used for the evaluation:

```python
import gymnasium as gym

from stable_baselines3 import DQN
from stable_baselines3.common.buffers import PrioritizedReplayBuffer

env_names = ["CartPole-v1", "LunarLander-v3"]

for env_name in env_names:
    # Compare the default uniform replay buffer against the prioritized one (PAL)
    for buffer in [None, PrioritizedReplayBuffer]:
        log_name = f"{env_name}_classic" if not buffer else f"{env_name}_PAL"
        env = gym.make(env_name)
        model = DQN(
            "MlpPolicy",
            env,
            replay_buffer_class=buffer,
            tensorboard_log="./pe_board",
            verbose=1,
        )
        model.learn(total_timesteps=100_000, log_interval=4, tb_log_name=log_name)
```


Feature overview
Implementation of Prioritized Experience Replay (PER) with Prioritized Approximation Loss (PAL) (linked to #1622).
A NeurIPS 2020 paper shows that using PER is equivalent to adapting the loss function while using uniform experience replay.
This means we can avoid managing a priority-sorted buffer and the associated complexity, while still converging to the same expected gradient.
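For readers who want to see what this looks like concretely, below is a minimal sketch of a PAL-style loss in PyTorch, following the formulation in the referenced NeurIPS 2020 paper: a Huber-like quadratic region for small TD errors and a polynomial region whose exponent mimics prioritized sampling with priority |δ|^α. The function name, default hyperparameters, and exact normalization here are illustrative assumptions and may differ from the code actually added in this PR.

```python
import torch


def pal_loss(td_errors: torch.Tensor, alpha: float = 0.4, min_priority: float = 1.0) -> torch.Tensor:
    """Illustrative PAL-style loss over uniformly sampled TD errors (not the PR's exact code)."""
    abs_td = td_errors.abs()
    # Huber-like quadratic region below the minimum-priority threshold
    quadratic = (min_priority ** alpha) * 0.5 * td_errors.pow(2)
    # Polynomial region: |delta|^(1 + alpha) mimics sampling with priority |delta|^alpha
    polynomial = min_priority * abs_td.pow(1.0 + alpha) / (1.0 + alpha)
    loss = torch.where(abs_td <= min_priority, quadratic, polynomial)
    # Normalize by the (detached) mean priority so the gradient scale matches PER
    mean_priority = abs_td.clamp(min=min_priority).pow(alpha).mean().detach()
    return loss.mean() / mean_priority
```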
Description
I've added a new loss function, which adapts the Huber loss by incorporating priority, as described in the referenced paper. The buffer itself performs uniform sampling (ReplayBuffer). Additionally, I implemented a PrioritizedReplayBuffer to initialize the parameters alpha and beta (following the PAL and PER papers) and to properly handle the case where the PAL loss is applied within the training method.
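To illustrate how these pieces could fit together, here is a hypothetical sketch of the loss dispatch inside the training step. The helper name `compute_loss`, the `alpha` attribute on the buffer, and the reuse of the `pal_loss` sketch above are assumptions for illustration only, not the PR's actual API.

```python
import torch
import torch.nn.functional as F

from stable_baselines3.common.buffers import PrioritizedReplayBuffer


def compute_loss(current_q: torch.Tensor, target_q: torch.Tensor, replay_buffer) -> torch.Tensor:
    # Hypothetical helper: pick the loss depending on which replay buffer is in use.
    if isinstance(replay_buffer, PrioritizedReplayBuffer):
        # Uniform sampling + PAL loss approximates PER + Huber loss in expectation.
        # `replay_buffer.alpha` and `pal_loss` (from the sketch above) are assumed names.
        return pal_loss(current_q - target_q, alpha=replay_buffer.alpha)
    # Default DQN behaviour in SB3: Huber (smooth L1) loss
    return F.smooth_l1_loss(current_q, target_q)
```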
Motivation and Context
In accordance with @AlexPasqua's PR Prioritized experience replay #1622 (and the corresponding issue Prioritized Experience Replay for DQN #1242) (👋 @araffin)
Types of changes
Checklist
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass (required)
- `make doc` (required)

Note: You can run most of the checks using `make commit-checks`.
Note: we are using a maximum length of 127 characters per line.