The Forward-Forward (FF) Algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap in domains where learning signals arise more naturally, such as reinforcement learning (RL). In this work, inspired by FF's goodness function over layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal difference learning. Despite its simplicity and biological grounding, our method outperforms state-of-the-art local backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, and surpasses algorithms trained with backpropagation on most tasks.
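As a rough illustration of the core idea (not the repository's implementation; all names, shapes, and the one-hot action encoding below are assumptions), the ARQ value estimate can be sketched as the root mean square of a layer's activations, with the action conditioning done by appending the action to the layer's input:

```python
import numpy as np

def arq_value(activations):
    """Root-mean-squared 'goodness' of a layer's activations,
    read out here as a scalar Q-value estimate (illustrative sketch)."""
    return float(np.sqrt(np.mean(np.square(activations))))

def layer_forward(obs, action_onehot, W):
    # Action conditioning: concatenate a one-hot action with the observation,
    # then apply a single ReLU layer.
    x = np.concatenate([obs, action_onehot])
    return np.maximum(0.0, W @ x)

rng = np.random.default_rng(0)
obs = rng.normal(size=8)        # hypothetical 8-dim observation
action = np.eye(4)[2]           # one of 4 discrete actions, one-hot encoded
W = rng.normal(size=(16, 12))   # 12 inputs = 8 obs dims + 4 action dims
q = arq_value(layer_forward(obs, action, W))
```

In a local-learning setup, each layer could regress its own RMS readout toward a temporal-difference target without backpropagating gradients through other layers; see the paper for the actual training procedure.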
Repository layout:

- `scripts/` - Training scripts
- `src/` - Source code implementation
- `dmc2gym/` - DeepMind Control Suite to Gymnasium adapter
Note: The current codebase does not fully reproduce the paper's results for MinAtar/Seaquest-v1 and MinAtar/Asterix-v1; a slightly different script was used to produce those results. We're working on unifying them.
Create and activate the conda environment:
```bash
conda create -n arq python=3.10
conda activate arq
pip install poetry
poetry install
```

For DeepMind Control Suite tasks, install xvfb:

```bash
sudo apt-get install xvfb  # Ubuntu/Debian
```

Supported MinAtar environments:

- MinAtar/Breakout-v1
- MinAtar/Freeway-v1
- MinAtar/SpaceInvaders-v1
- MinAtar/Seaquest-v1
- MinAtar/Asterix-v1
Supported DMC environments (script ID, followed by the corresponding DMC task):

- `walker` (walker walk)
- `runner` (walker run)
- `hopper` (hopper hop)
- `cheetah` (cheetah run)
- `reacher_hard` (reacher hard)
For MinAtar tasks:

```bash
poetry run python scripts/train.py <ENV_ID> --seed=<SEED>
```

For DMC tasks:

```bash
xvfb-run python scripts/train.py <ENV_ID> --seed=<SEED>
```

Examples:

```bash
# MinAtar
poetry run python scripts/train.py MinAtar/Freeway-v1 --seed=42

# DMC
xvfb-run python scripts/train.py walker --seed=42
```