Parameter Space Noise for Exploration

A JAX/Flax implementation of Parameter Space Noise for Exploration (Plappert et al., 2017) built on top of CleanRL's DDPG algorithm.

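Instead of adding noise to actions (as in standard DDPG), parameter space noise perturbs the policy network's weights directly. Because the perturbation is held fixed while the agent acts, the same state always yields the same action, which makes exploration consistent and state-dependent.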

Overview

This project provides two DDPG implementations for continuous control tasks:

Standard DDPG (ddpg_continuous_action_jax.py) — baseline with action-space exploration noise
DDPG + Parameter Space Noise (ddpg_continuous_action_param_noise_jax.py) — exploration via adaptive parameter perturbation

Key features

Adaptive noise scaling that maintains a target action-space distance between perturbed and unperturbed policies
LayerNorm in the parameter-noise variant (excluded from perturbation)
Experiment tracking with Weights & Biases and TensorBoard
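
The adaptive scaling rule can be sketched in a few lines. This is a minimal illustration of the multiplicative update described by Plappert et al., not the code in parameter_noise_jax.py; the function name adapt_param_std and the coefficient argument are hypothetical, and 1.01 is the scaling factor used in the paper.

```python
def adapt_param_std(param_std, distance, target_action_std, coefficient=1.01):
    """Return the noise scale after one adaptation step.

    `coefficient` is a hypothetical name; 1.01 is the value used in the paper.
    """
    if distance > target_action_std:
        # Perturbed actions drifted too far from the unperturbed ones: shrink the noise.
        return param_std / coefficient
    # Perturbed actions are still within the target distance: grow the noise.
    return param_std * coefficient


# Toy loop: pretend the measured action-space distance is twice the noise scale.
param_std = 0.01
for _ in range(1000):
    distance = 2.0 * param_std  # stand-in for a distance measured on real actions
    param_std = adapt_param_std(param_std, distance, target_action_std=0.2)
```

In this toy setting the noise scale grows until the stand-in distance reaches the target and then oscillates around that point (near 0.1 here), which is the behavior the adaptive scheme relies on.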

Project Structure

parameter_space_noise/
├── ddpg_continuous_action_jax.py               # Standard DDPG baseline
├── ddpg_continuous_action_param_noise_jax.py   # DDPG with parameter space noise
└── parameter_noise_jax.py                      # Noise adaptation and perturbation utilities

Installation

Requires Python 3.10+.

poetry install

You'll also need JAX installed with your preferred backend (CPU/GPU). See the JAX installation guide.

Usage

Run the parameter-noise variant:

python parameter_space_noise/ddpg_continuous_action_param_noise_jax.py --env-id HalfCheetah-v4

Run the standard DDPG baseline:

python parameter_space_noise/ddpg_continuous_action_jax.py --env-id Hopper-v4

All hyperparameters are configurable via CLI flags (powered by tyro). Use --help to see available options.

How It Works

Perturb — At the start of each episode, Gaussian noise scaled by param_std is added to the actor's parameters (excluding LayerNorm layers)
Collect — The perturbed actor interacts with the environment, collecting transitions
Adapt — Every adaptation_frequency steps, the distance between perturbed and unperturbed actions is measured. If the distance exceeds target_action_std, noise is decreased; otherwise it is increased
Train — Standard DDPG updates are applied to the actor and critic
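
The Perturb step and the distance used in the Adapt step can be sketched as follows. This is an assumption-laden illustration rather than the repository's implementation: perturb_params and action_distance are hypothetical names, the parameters are a plain nested dict shaped like a Flax parameter tree, and LayerNorm leaves are identified by matching "LayerNorm" in the parameter path. The distance is the root-mean-square difference between action batches, as used for DDPG in Plappert et al.

```python
import jax
import jax.numpy as jnp


def perturb_params(params, key, param_std):
    """Add N(0, param_std^2) noise to every leaf whose path does not mention LayerNorm."""
    leaves_with_path, treedef = jax.tree_util.tree_flatten_with_path(params)
    keys = jax.random.split(key, len(leaves_with_path))
    new_leaves = []
    for (path, leaf), leaf_key in zip(leaves_with_path, keys):
        if "LayerNorm" in jax.tree_util.keystr(path):
            # Assumption: normalization layers are skipped by name.
            new_leaves.append(leaf)
        else:
            noise = param_std * jax.random.normal(leaf_key, leaf.shape, leaf.dtype)
            new_leaves.append(leaf + noise)
    return jax.tree_util.tree_unflatten(treedef, new_leaves)


def action_distance(actions, perturbed_actions):
    """Root-mean-square difference between two batches of actions."""
    return jnp.sqrt(jnp.mean(jnp.square(actions - perturbed_actions)))


# Toy parameter tree with one dense layer and one LayerNorm layer.
params = {
    "Dense_0": {"kernel": jnp.ones((3, 4)), "bias": jnp.zeros(4)},
    "LayerNorm_0": {"scale": jnp.ones(4), "bias": jnp.zeros(4)},
}
noisy = perturb_params(params, jax.random.PRNGKey(0), param_std=0.1)
```

Each leaf gets its own PRNG key so that the noise is independent across layers, and the LayerNorm leaves in noisy are identical to those in params.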

References

Plappert, M., et al. "Parameter Space Noise for Exploration." arXiv:1706.01905, 2017.
Huang, S., et al. "CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms." JMLR, 2022.

License

MIT
