AhmedMagdyHendawy/MINTO


Official Implementation of MINTO 🌿, introduced in "Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning" [ICLR 2026].


TL;DR

🌿 MINTO is a simple yet effective target bootstrapping method for temporal-difference RL that enables faster, more stable learning and consistently improves performance across algorithms and benchmarks.

🌿 MINTO computes the target value as the MINimum of the Target and Online network estimates, thereby introducing fresher, more recent value estimates 🌿 in a stable manner 🛡️ while mitigating the overestimation bias that bootstrapping from the online network alone can cause.
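To make the idea concrete, here is a minimal NumPy sketch of a MINTO-style TD target. This is an illustrative reconstruction from the description above, not code from the repository; for simplicity it assumes the next-state values have already been reduced over actions (e.g., the greedy value in a DQN-style update).

```python
import numpy as np

def minto_target(reward, gamma, q_target_next, q_online_next, done):
    """One-step TD target with MINTO-style bootstrapping (illustrative).

    Instead of bootstrapping from the target network alone, take the
    element-wise minimum of the target- and online-network estimates,
    injecting fresher online values while guarding against
    overestimation.
    """
    # Element-wise minimum between the two next-state value estimates.
    q_min = np.minimum(q_target_next, q_online_next)
    # Standard one-step TD target; (1 - done) masks terminal transitions.
    return reward + gamma * (1.0 - done) * q_min
```

When the online estimate is lower than the (stale) target estimate, the online value is used, so updates track the most recent network without the usual overestimation risk.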


Code Structure

MINTO integrates easily into value-based and actor-critic methods with minimal overhead. Accordingly, we evaluate it across diverse benchmarks, spanning online and offline RL as well as discrete and continuous action spaces. Our experiments build on variants of three repositories:

  1. Online RL (discrete): Based on slimDQN.
  2. Offline RL: Based on slimCQL.
  3. Online RL (continuous): Based on SimbaV2.

To reproduce the main results in the paper, see the corresponding subfolders and their installation guides.

Subfolders:

  1. online_rl_discrete/ for online RL (Atari, discrete).
  2. offline_rl/ for offline RL (Atari, discrete).
  3. online_rl_continuous/ for continuous control (e.g., MuJoCo).

Quick Start

Example (online RL, discrete control):

cd online_rl_discrete
conda create -n minto python=3.10
conda activate minto
pip install --upgrade pip setuptools wheel
pip install -e .[dev,gpu]
bash run_dqn.sh min Breakout

Citation

If you use this codebase or find our work helpful, please consider citing our paper as follows:

@inproceedings{hendawy2025use,
  title={Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning},
  author={Hendawy, Ahmed and Metternich, Henrik and Vincent, Th{\'e}o and Kallel, Mahdi and Peters, Jan and D'Eramo, Carlo},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
