InnateCoder: Learning Programmatic Options with Foundation Models
This repository contains the code and data for the paper "InnateCoder: Learning Programmatic Options with Foundation Models" by Rubens O. Moraes, Quazi Sadmine, Hendrik Baier, and Levi Lelis, accepted at IJCAI 2025.
InnateCoder is a system that uses foundation models to give reinforcement learning agents "innate skills" before they start interacting with their environment. Unlike traditional reinforcement learning approaches where agents start from scratch, InnateCoder provides programmatic options (reusable skills) that make learning more efficient.
InnateCoder has three main components:
- Learning Options: Uses a foundation model to generate programmatic policies, which are broken down into smaller "options" (reusable skills); this extraction step is sketched after the list.
- Semantic Space: Creates a meaningful space of programs where similar behaviors are grouped together.
- Local Search: Searches through both syntax and semantic spaces to find optimal policies.
InnateCoder offers several advantages:
- More sample-efficient than systems without pre-learned options
- Leverages human knowledge encoded in foundation models
- Generates options in a zero-shot manner (before environment interaction)
- Cost-effective as it only uses the foundation model as a pre-processing step
- Accessible to smaller labs and companies
InnateCoder represents policies as programs written in domain-specific languages (DSLs). Each program receives a state and returns an action for the agent to take. The system was tested in two domains:
- MicroRTS: A challenging real-time strategy game
- Karel the Robot: A benchmark for program synthesis and reinforcement learning
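As a rough illustration of the "program receives a state and returns an action" contract, here is a toy Karel-style policy written in plain Python. The state interface (front_is_clear, markers_present) is an assumption for illustration, not the repository's exact API.

```python
# Minimal sketch of a programmatic policy: state in, one action out per call.

class KarelState:
    """Toy stand-in for an environment state (assumed interface)."""
    def __init__(self, front_clear, markers):
        self._front_clear, self._markers = front_clear, markers
    def front_is_clear(self):
        return self._front_clear
    def markers_present(self):
        return self._markers

def policy(state):
    # Pick up markers when found, move while possible, otherwise turn.
    if state.markers_present():
        return "pickMarker"
    if state.front_is_clear():
        return "move"
    return "turnLeft"

print(policy(KarelState(front_clear=True, markers=False)))  # -> "move"
```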
Experiments show that InnateCoder is more sample-efficient than versions without options or with options learned from experience. The policies learned are competitive with and often outperform state-of-the-art algorithms.
The complete implementation is available at: https://github.com/rubensolv/InnateCoder
InnateCoder improves on traditional search methods by exploring the "semantic space" of programs rather than just the "syntax space." This means it focuses on program behavior rather than just structure, allowing for more effective search and better results.
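One way to picture the semantic space is to summarize each program by the actions it produces on a fixed set of sample states; the sketch below uses this simple signature idea, which is an illustrative assumption rather than the paper's exact construction.

```python
# Minimal sketch: compare programs by behavior, not syntax.

def behavior_signature(policy, sample_states):
    """Run the policy on each sample state and record the resulting actions."""
    return tuple(policy(s) for s in sample_states)

# Two syntactically different programs...
p1 = lambda s: "move" if s["front_clear"] else "turnLeft"
p2 = lambda s: "turnLeft" if not s["front_clear"] else "move"

states = [{"front_clear": True}, {"front_clear": False}]

# ...have identical signatures, so they occupy the same point in semantic space.
assert behavior_signature(p1, states) == behavior_signature(p2, states)
```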
Karel is a simple programming environment for learning the basics of programming. It provides a set of commands for controlling a robot in a grid world. We used Karel as a benchmark to evaluate the performance of the programmatic options generated by InnateCoder. Our implementation builds on the Karel repository at: https://github.com/lelis-research/prog_policies.
In the datasets/dictionaries folder, you can find at least one example of a dictionary generated with Large Language Models (LLMs) for each map used in the experiments. These dictionaries contain programmatic options that serve as the foundation for InnateCoder's approach, demonstrating how foundation models can generate useful program fragments without any prior interaction with the environment. Each dictionary includes a diverse set of programs that capture different behaviors and strategies relevant to solving tasks in the Karel domain.
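Loading such a dictionary might look like the sketch below; the file name and JSON format are assumptions for illustration, so check the files in datasets/dictionaries for the actual format.

```python
import json
import pathlib

# Hypothetical file name and structure; inspect datasets/dictionaries
# for the actual format used in the repository.
path = pathlib.Path("datasets/dictionaries/example_map_dictionary.json")
options = json.loads(path.read_text())  # e.g. a list of program strings
print(f"loaded {len(options)} programmatic options")
```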
The stochastic_hill_climbing_LLM_dict.py file implements a specialized search algorithm that combines stochastic hill climbing with a dictionary of programs generated by a Large Language Model (LLM). This approach represents a core component of InnateCoder's implementation for the Karel domain.
Key features of this implementation:
- Dictionary-Based Mutation: Instead of relying solely on random mutations, the algorithm leverages a dictionary of program fragments generated by an LLM. This allows the search to make more semantically meaningful changes to programs.
- Behavior-Based Dictionary Cleaning: The implementation includes a mechanism to clean the dictionary by removing programs that produce duplicate behaviors, ensuring diversity in the available options.
- Adaptive Search Strategy: The algorithm balances exploiting the LLM-generated dictionary (80% of mutations) with traditional random mutations (20%), providing both guided and exploratory search capabilities (a condensed sketch of this loop appears below).
- Local Maximum Escape: When stuck in a local maximum, the algorithm restarts with a completely new random program, allowing it to explore different regions of the program space.
This implementation demonstrates how InnateCoder effectively combines foundation model knowledge (through the LLM-generated dictionary) with traditional search techniques to find optimal programmatic policies more efficiently.
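A condensed, self-contained sketch of this loop is shown below. The helper functions (random_program, mutate_random, splice_option, evaluate, run) are placeholders standing in for the real logic in stochastic_hill_climbing_LLM_dict.py.

```python
# Sketch of stochastic hill climbing with an LLM-generated option dictionary.
import random

def clean_dictionary(options, sample_states, run):
    """Behavior-based cleaning: drop options whose behavior duplicates an earlier one."""
    seen, kept = set(), []
    for opt in options:
        signature = run(opt, sample_states)  # actions the option emits on sample states
        if signature not in seen:
            seen.add(signature)
            kept.append(opt)
    return kept

def hill_climb(options, evaluate, random_program, mutate_random, splice_option,
               budget=10_000, patience=100):
    best, best_score = None, float("-inf")
    current = random_program()
    score, stale = evaluate(current), 0
    for _ in range(budget):
        # 80% of mutations splice in an LLM-generated option; 20% are random.
        if random.random() < 0.8:
            candidate = splice_option(current, random.choice(options))
        else:
            candidate = mutate_random(current)
        candidate_score = evaluate(candidate)
        if candidate_score > score:
            current, score, stale = candidate, candidate_score, 0
        else:
            stale += 1
        if score > best_score:
            best, best_score = current, score
        if stale >= patience:                # local maximum: restart from scratch
            current = random_program()
            score, stale = evaluate(current), 0
    return best, best_score
```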
Requirements:
- Python 3.8 or higher
To install the required dependencies, run:
pip install -r requirements.txt
For exact version requirements, please refer to the requirements.txt file.
You can use the scripts in the scripts folder to run various experiments:
- Evaluate LLM-generated Solutions:
python scripts/run_LLM_solutions_eval.py
This script evaluates the performance of solutions directly generated by LLMs.
- Generate Solution Visualizations:
python scripts/run_LLM_for_gif.py
This script creates animated GIFs showing how a particular solution executes in the Karel environment.
- Run Search Experiments:
python scripts/run_search.py <parameters>  # or: python scripts/run_search_new.py <parameters>
These scripts perform search-based experiments using parameters defined in their respective classes. Be sure to use StochasticHillClimbingLLMDict as the search algorithm parameter to leverage the LLM-generated dictionary of programmatic options.
For all scripts, you can modify the configuration parameters directly in the files to customize the experiments according to your needs.
MicroRTS is a research platform for real-time strategy (RTS) games. It is designed to facilitate the development and evaluation of AI algorithms in a competitive gaming environment.