Exploration is critical in reinforcement learning (RL): the agent must discover rewarding states and actions. Without effective exploration, agents can get stuck in local optima or never find the reward signal in sparse-reward environments.
The fundamental trade-off:
- Exploitation: Use current knowledge to maximize reward
- Exploration: Try new actions to gain information
Random Exploration:
- Epsilon-greedy: Random actions with probability ε
- Boltzmann: Sample from softmax over Q-values
- Action noise: Add Gaussian noise to continuous actions (all three are sketched below)
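A minimal NumPy sketch of these three strategies; the epsilon, temperature, and noise-scale values are illustrative, not canonical:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values: np.ndarray, epsilon: float = 0.1) -> int:
    """With probability epsilon pick a uniformly random action,
    otherwise the greedy (argmax-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values: np.ndarray, temperature: float = 1.0) -> int:
    """Sample from a softmax over Q-values; higher temperature
    flattens the distribution, giving more exploration."""
    logits = q_values / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(q_values), p=probs))

def noisy_action(mean_action: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Continuous control: perturb the policy's action with Gaussian noise."""
    return mean_action + sigma * rng.normal(size=mean_action.shape)
```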
Intrinsic Motivation:
- Curiosity-driven: Bonus for novel/surprising states
- Count-based: Explore less-visited states (sketched after this list)
- Prediction error: Bonus for states the agent can't predict well
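A minimal count-based sketch giving the bonus r_int(s) = beta / sqrt(N(s)); the beta coefficient and the rounding-based discretization in the usage line are illustrative choices, not from the source:

```python
from collections import defaultdict
import math

class CountBonus:
    """Count-based exploration: states visited less often get a
    larger intrinsic reward. States must be reduced to a hashable
    key so the count table stays finite in continuous spaces."""
    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state_key) -> float:
        self.counts[state_key] += 1
        return self.beta / math.sqrt(self.counts[state_key])

# Usage (illustrative discretization of a continuous observation):
#   counter = CountBonus()
#   total_reward = env_reward + counter.bonus(tuple(obs.round(1)))
```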
Structured Exploration:
- Go-Explore: Remember and return to promising states
- Novelty search: Ignore rewards, maximize behavioral diversity (see the sketch after this list)
- Quality diversity: Maintain archive of diverse high-performing behaviors
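A minimal novelty-score sketch, assuming each policy's behavior is summarized as a fixed-length descriptor vector; the choice of k is illustrative:

```python
import numpy as np

def novelty_score(behavior: np.ndarray, archive: np.ndarray, k: int = 5) -> float:
    """Novelty = mean distance to the k nearest behavior descriptors
    in the archive; higher means more behaviorally novel. An empty
    archive makes every behavior maximally novel."""
    if len(archive) == 0:
        return float("inf")
    dists = np.linalg.norm(archive - behavior, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())
```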
Covered Algorithms:
- ICM: Intrinsic Curiosity Module using prediction error
- RND: Random Network Distillation for exploration bonus
- Go-Explore: Archive-based exploration with robustification (NOT YET IMPLEMENTED)
| Algorithm | Type | Intrinsic Reward | Best For | Overhead |
|---|---|---|---|---|
| ICM | Prediction error | Forward model error | Continuous/discrete | Moderate |
| RND | Prediction error | Random target error | Hard exploration | Low |
| Go-Explore | Archive-based | N/A (deterministic return) | Extremely sparse rewards | High |
ICM:
- Environments with moderate sparsity
- When dynamics are learnable
- Avoid "noisy TV" problem (unpredictable distractors)
RND:
- Very sparse rewards (Montezuma's Revenge)
- Deterministic environments
- When a simple, scalable method is needed (see the sketch below)
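A minimal PyTorch sketch of RND, assuming flat observations. The layer widths and output dimension are arbitrary, and real implementations also normalize observations and intrinsic rewards:

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """RND: a trainable predictor tries to match a frozen, randomly
    initialized target network. Prediction error is the exploration
    bonus: high on states unlike those the predictor was trained on."""
    def __init__(self, obs_dim: int, out_dim: int = 64):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))
        self.target, self.predictor = mlp(), mlp()
        for p in self.target.parameters():  # target stays fixed forever
            p.requires_grad_(False)

    def bonus(self, obs: torch.Tensor) -> torch.Tensor:
        # This quantity doubles as the predictor's training loss.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```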
Go-Explore:
- Extremely hard exploration (Pitfall, Montezuma's Revenge)
- Deterministic or resettable environments
- Can afford memory overhead
- Need guaranteed exploration of promising areas (an archive sketch follows)
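A sketch of Go-Explore's phase-1 archive: map states to coarse "cells", remember the best trajectory reaching each cell, and preferentially return to rarely visited cells before exploring further. The `cell_fn` (e.g., a downsampling hash of the observation) and the 1/(1+visits) selection weighting are illustrative stand-ins for the paper's heuristics:

```python
from dataclasses import dataclass, field
import random

@dataclass
class Cell:
    """One archive entry: the best trajectory found to this cell."""
    trajectory: list = field(default_factory=list)
    score: float = float("-inf")
    visits: int = 0

class GoExploreArchive:
    def __init__(self, cell_fn):
        self.cell_fn = cell_fn  # maps a state to a coarse, hashable cell
        self.cells = {}

    def update(self, state, trajectory, score):
        """Record a visit; keep only the highest-scoring trajectory."""
        cell = self.cells.setdefault(self.cell_fn(state), Cell())
        if score > cell.score:
            cell.trajectory, cell.score = list(trajectory), score

    def select_cell(self) -> list:
        """Pick a cell to return to, weighted toward less-visited cells,
        and hand back the stored trajectory for deterministic replay."""
        keys = list(self.cells)
        weights = [1.0 / (1 + self.cells[k].visits) for k in keys]
        key = random.choices(keys, weights=weights)[0]
        self.cells[key].visits += 1
        return self.cells[key].trajectory
```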
Exploration Challenges:
- Sparse Rewards: Reward is rare, making exploration critical
- Deceptive Rewards: Local optima mislead the agent
- High-Dimensional State Space: Curse of dimensionality
- Partial Observability: The agent can't distinguish novel states from already-seen ones
- Stochastic Dynamics: Hard to predict, producing noisy intrinsic signals
References:
- ICM: Pathak et al., "Curiosity-driven Exploration by Self-Supervised Prediction", ICML 2017
- RND: Burda et al., "Exploration by Random Network Distillation", ICLR 2019
- Go-Explore: Ecoffet et al., "First return, then explore", Nature 2021