Official implementation of MINTO, introduced in "Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning" [ICLR 2026].
MINTO is a simple yet effective target-bootstrapping method for temporal-difference RL that enables faster, more stable learning and consistently improves performance across algorithms and benchmarks.
MINTO computes the target value as the MINimum estimate between the Target and Online networks. This injects fresher, more recent value estimates in a stable manner, since taking the minimum mitigates the overestimation bias that can arise from bootstrapping with the online network.
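The target computation above can be sketched as follows for a DQN-style update. This is a minimal illustration, not the repository's code: the function name and the choice of selecting the greedy action with the online network are assumptions for the sketch.

```python
import numpy as np

def minto_target(reward, discount, done, q_online_next, q_target_next):
    """Sketch of a MINTO-style bootstrap target.

    q_online_next, q_target_next: arrays of shape (batch, n_actions)
    holding next-state Q-values from the online and target networks.
    The bootstrap value is the element-wise minimum of the two
    estimates, guarding against overestimation from the online net.
    """
    # Greedy next action (chosen here with the online network; the
    # paper's exact action-selection rule may differ).
    next_actions = np.argmax(q_online_next, axis=1)
    batch = np.arange(len(next_actions))
    # MINimum between Target and Online estimates at the chosen action.
    bootstrap = np.minimum(q_target_next[batch, next_actions],
                           q_online_next[batch, next_actions])
    return reward + discount * (1.0 - done) * bootstrap
```

Replacing the bootstrap value in any TD loss with this minimum is the only change needed, which is why the method slots into value-based and actor-critic agents alike.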
MINTO integrates easily into value-based and actor-critic methods with minimal overhead. Hence, we evaluate it across diverse benchmarks, spanning online and offline RL, as well as discrete and continuous action spaces. To conduct our experiments, we utilized variants of three different repositories:
- Online RL (discrete): Based on slimDQN.
- Offline RL: Based on slimCQL.
- Online RL (continuous): Based on SimbaV2.
To reproduce the main results in the paper, see the corresponding subfolders and their installation guides.
Subfolders:
- `online_rl_discrete/` for online RL (Atari, discrete).
- `offline_rl/` for offline RL (Atari, discrete).
- `online_rl_continuous/` for continuous control (e.g., MuJoCo).
Example (online RL, discrete control):

```bash
cd online_rl_discrete
conda create -n minto python=3.10
conda activate minto
pip install --upgrade pip setuptools wheel
pip install -e .[dev,gpu]
bash run_dqn.sh min Breakout
```

If you use this codebase or find our work helpful, please consider citing our paper as follows:
@inproceedings{hendawy2025use,
title={Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning},
author={Hendawy, Ahmed and Metternich, Henrik and Vincent, Th{\'e}o and Kallel, Mahdi and Peters, Jan and D'Eramo, Carlo},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026}
}
