A JAX/Flax implementation of Parameter Space Noise for Exploration (Plappert et al., 2017) built on top of CleanRL's DDPG algorithm.
Instead of adding noise to actions (as in standard DDPG), parameter space noise perturbs the policy network's weights directly, producing more consistent and state-dependent exploration.
This project provides two DDPG implementations for continuous control tasks:
- Standard DDPG (ddpg_continuous_action_jax.py): baseline with action-space exploration noise
- DDPG + Parameter Space Noise (ddpg_continuous_action_param_noise_jax.py): exploration via adaptive parameter perturbation
- Adaptive noise scaling that maintains a target action-space distance between perturbed and unperturbed policies
- LayerNorm in the parameter-noise variant (excluded from perturbation)
- Experiment tracking with Weights & Biases and TensorBoard
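The LayerNorm exclusion listed above can be illustrated with a minimal sketch. It is not the code in parameter_noise_jax.py: the function name `perturb_params`, the `skip_substring` argument, and the toy parameter tree are hypothetical, and it assumes LayerNorm parameters can be recognised by the substring "LayerNorm" in their pytree path, which matches Flax's default module naming.

```python
import jax
import jax.numpy as jnp


def perturb_params(params, key, param_std, skip_substring="LayerNorm"):
    """Add N(0, param_std^2) noise to every leaf except those under LayerNorm.

    Hypothetical helper: leaves whose pytree path contains `skip_substring`
    are returned unchanged.
    """
    leaves_with_path, treedef = jax.tree_util.tree_flatten_with_path(params)
    keys = jax.random.split(key, len(leaves_with_path))
    new_leaves = []
    for (path, leaf), leaf_key in zip(leaves_with_path, keys):
        if skip_substring in jax.tree_util.keystr(path):
            new_leaves.append(leaf)  # LayerNorm parameters stay untouched
        else:
            noise = param_std * jax.random.normal(leaf_key, leaf.shape, leaf.dtype)
            new_leaves.append(leaf + noise)
    return jax.tree_util.tree_unflatten(treedef, new_leaves)


# Toy parameter tree standing in for an actor's parameters (assumed layout).
params = {
    "Dense_0": {"kernel": jnp.ones((3, 4)), "bias": jnp.zeros((4,))},
    "LayerNorm_0": {"scale": jnp.ones((4,)), "bias": jnp.zeros((4,))},
}
perturbed = perturb_params(params, jax.random.PRNGKey(0), param_std=0.1)
```

Each leaf gets its own PRNG key, so the noise is independent across layers, and the unperturbed `params` are left intact for the training updates.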
parameter_space_noise/
├── ddpg_continuous_action_jax.py # Standard DDPG baseline
├── ddpg_continuous_action_param_noise_jax.py # DDPG with parameter space noise
└── parameter_noise_jax.py # Noise adaptation and perturbation utilities
Requires Python 3.10+.
poetry install
You'll also need JAX installed with your preferred backend (CPU/GPU). See the JAX installation instructions.
Run the parameter-noise variant:
python parameter_space_noise/ddpg_continuous_action_param_noise_jax.py --env-id HalfCheetah-v4
Run the standard DDPG baseline:
python parameter_space_noise/ddpg_continuous_action_jax.py --env-id Hopper-v4
All hyperparameters are configurable via CLI flags (powered by tyro). Use --help to see available options.
- Perturb: At the start of each episode, Gaussian noise scaled by param_std is added to the actor's parameters (excluding LayerNorm layers).
- Collect: The perturbed actor interacts with the environment, collecting transitions.
- Adapt: Every adaptation_frequency steps, the distance between perturbed and unperturbed actions is measured. If the distance exceeds target_action_std, noise is decreased; otherwise it is increased.
- Train: Standard DDPG updates are applied to the actor and critic.
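The Adapt step above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: the function names `action_distance` and `adapt_param_std` are hypothetical, the distance is assumed to be the root-mean-square difference between the two action batches, and the multiplicative factor of 1.01 follows the value suggested by Plappert et al. and may differ from this project's default.

```python
import jax.numpy as jnp


def action_distance(actions, perturbed_actions):
    """Root-mean-square difference between unperturbed and perturbed actions."""
    return float(jnp.sqrt(jnp.mean(jnp.square(actions - perturbed_actions))))


def adapt_param_std(param_std, distance, target_action_std, coefficient=1.01):
    """Shrink the noise scale if the policies drifted too far apart, else grow it.

    `coefficient` is an assumed default, not a value taken from this project.
    """
    if distance > target_action_std:
        return param_std / coefficient
    return param_std * coefficient


# Toy batch: every perturbed action differs from the unperturbed one by 0.5.
actions = jnp.zeros((8, 2))
perturbed_actions = jnp.full((8, 2), 0.5)

distance = action_distance(actions, perturbed_actions)
new_std = adapt_param_std(0.1, distance, target_action_std=0.2)
```

In the toy batch the distance is above the target, so the returned scale is slightly smaller than the starting value of 0.1. Measuring the distance in action space keeps target_action_std interpretable in the same units as the action noise used by the baseline.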
- Plappert, M., et al. "Parameter Space Noise for Exploration." arXiv:1706.01905, 2017.
- Huang, S., et al. "CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms." JMLR, 2022.
MIT