Wandering Light

A Python library for mastering tool calls through self-play.

Given a set of functions (tool calls), we generate input-output examples from them and then train an LLM to predict the list of functions that maps the inputs to the outputs (the induction task). The library also supports training the LLM to generate appropriately challenging input-output pairs (the proposal task).
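As a rough illustration of how input-output examples can be generated from a set of functions, here is a minimal sketch: sample a random function sequence and a random input list, then execute the sequence to obtain the output. The function table `FUNCTIONS` and the helper `generate_example` are hypothetical and not part of the library's API. In the induction task, the model would see `inputs` and `outputs` and have to predict `names`.

```python
import random
from typing import Callable

# Hypothetical toy function library; the names mirror the README's examples.
FUNCTIONS: dict[str, Callable[[list[int]], list[int]]] = {
    "inc": lambda xs: [x + 1 for x in xs],
    "double": lambda xs: [2 * x for x in xs],
    "neg": lambda xs: [-x for x in xs],
}


def generate_example(num_funcs: int, rng: random.Random) -> tuple[list[int], list[str], list[int]]:
    """Sample a random function sequence and input list, then execute it to get the output."""
    names = [rng.choice(list(FUNCTIONS)) for _ in range(num_funcs)]
    inputs = [rng.randint(-5, 5) for _ in range(3)]
    outputs = inputs
    for name in names:
        outputs = FUNCTIONS[name](outputs)
    return inputs, names, outputs


rng = random.Random(0)
inputs, names, outputs = generate_example(2, rng)
print(inputs, "->", outputs, "via", names)
```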

Features

- Scripts for SFT and RL on the induction and proposal tasks, using the TRL library.
- Synthetic data generation using LLMs.
- Weights & Biases (wandb) integration for monitoring and analyzing metrics.
- Evaluation scripts and a website to visualize evaluation metrics.
- A web interface to visualize and interact with data samples.
- Clean code: 300+ unit tests and CI using GitHub Actions.
- Small models (~0.1B parameters) can be trained locally within a few hours.

Motivation

Currently, LLMs are trained to imitate text on the internet. As a result, they do not know what they know and what they don't know, which causes hallucination. They also lack a world model (see Silver and Sutton, "Welcome to the Era of Experience"). Instead of next-token prediction, we would like a training approach that lets the model take actions and learn from their outcomes. We call this self-supervised tool-use learning (SSTL), treating function calling (tool use) as taking an action. We also want a task that is infinitely scalable, bounded only by compute. Programming by Example (PBE) provides such an environment: the model must find a sequence of functions that transforms the given inputs into the desired outputs.

We would also like to develop metacognitive capabilities in the model. Given any task, the model should be able to classify it into one of three categories:

- Confident it can solve the task
- Confident it cannot solve the task
- Unsure

This lets the model keep learning indefinitely by exploring the tasks it is unsure about. A value function, in the RL sense, should help with this.

The tasks

Assume each state is a list of values (e.g. the integers [1, 2, 3]) and each function maps one state to another, as in the graph below. The task names follow the naming convention of Absolute Zero Reasoner.

```mermaid
flowchart LR
    A["[1, 2, 3]"] -->|inc| B["[2, 3, 4]"]
    A -->|double| C["[2, 4, 6]"]
    A -->|neg| D["[-1, -2, -3]"]
    C -->|inc| E["[3, 5, 7]"]
    B -->|double| I["[4, 6, 8]"]
    A -->|int_to_str| F["['1', '2', '3']"]
    F -->|repeat| G["['11', '22', '33']"]
    D -->|neg| A
    E -->|neg| H["[-3, -5, -7]"]
```
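To make the graph concrete, its edges can be written as plain Python functions on lists. This is a hypothetical sketch: the function bodies (in particular `repeat`, assumed here to duplicate each string) are inferred from the example values, not taken from the library's own definitions. Note that order matters: `double` then `inc` differs from `inc` then `double`.

```python
def inc(xs: list[int]) -> list[int]:
    return [x + 1 for x in xs]


def double(xs: list[int]) -> list[int]:
    return [2 * x for x in xs]


def neg(xs: list[int]) -> list[int]:
    return [-x for x in xs]


def int_to_str(xs: list[int]) -> list[str]:
    return [str(x) for x in xs]


def repeat(xs: list[str]) -> list[str]:
    # Assumption: "repeat" concatenates each string with itself.
    return [s * 2 for s in xs]


def run(xs, funcs):
    """Apply the functions left to right, i.e. follow one path in the graph."""
    for f in funcs:
        xs = f(xs)
    return xs


start = [1, 2, 3]
print(run(start, [double, inc]))         # [3, 5, 7]
print(run(start, [inc, double]))         # [4, 6, 8]
print(run(start, [int_to_str, repeat]))  # ['11', '22', '33']
print(run(start, [neg, neg]))            # [1, 2, 3]
```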

Induction: we would like to train a solver model that finds the shortest path between two states (e.g. [1, 2, 3] -> [3, 5, 7]). A path is defined as a DAG of pure functions (e.g. double, plus1). For simplicity, each function currently takes a single input and returns a single output. In the future we would like to support multi-argument functions to make this library more practical.
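For a small function set, the induction task can be brute-forced with breadth-first search over function compositions; BFS returns a shortest sequence because it explores every sequence of length k before any of length k + 1. The sketch below is a simplification under our own assumptions: it searches linear sequences rather than general DAGs, states are tuples so they can be stored in a `seen` set, and `FUNCS` and `shortest_path` are hypothetical names, not the library's `BFSSolve`.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical function table operating on hashable tuples.
FUNCS: dict[str, Callable[[tuple], tuple]] = {
    "inc": lambda xs: tuple(x + 1 for x in xs),
    "double": lambda xs: tuple(2 * x for x in xs),
    "neg": lambda xs: tuple(-x for x in xs),
}


def shortest_path(start: tuple, goal: tuple, max_depth: int = 4) -> Optional[list[str]]:
    """Breadth-first search over function sequences; returns the shortest one found."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        if len(path) == max_depth:
            continue  # do not expand beyond the depth limit
        for name, f in FUNCS.items():
            nxt = f(state)
            if nxt not in seen:  # skip states already reached by a shorter path
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


print(shortest_path((1, 2, 3), (3, 5, 7)))  # ['double', 'inc']
```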

Proposal: propose a new task for the solver that is neither too easy nor too hard.
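One hypothetical way to score "neither too easy nor too hard" is to run the solver several times on a proposed task, give zero reward to tasks it always or never solves, and otherwise give more reward the lower the success rate. `propose_reward` below is an illustrative name and formula of our own, not necessarily what this library implements.

```python
def propose_reward(solver_outcomes: list[bool]) -> float:
    """Hypothetical proposer reward computed from repeated solver attempts on one task."""
    if not solver_outcomes:
        raise ValueError("need at least one solver attempt")
    success_rate = sum(solver_outcomes) / len(solver_outcomes)
    # Tasks that are always or never solved give the solver nothing to learn from.
    if success_rate in (0.0, 1.0):
        return 0.0
    # Otherwise, harder tasks (lower success rate) earn more reward.
    return 1.0 - success_rate


print(propose_reward([True] * 4))                   # 0.0 (too easy)
print(propose_reward([False] * 4))                  # 0.0 (too hard)
print(propose_reward([True, False, False, False]))  # 0.75
```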

Results

Initial Results in WandB

Installation

This project requires Python 3.12 or later.

```bash
git clone https://github.com/abhishekraok/wandering-light.git
cd wandering-light
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Testing

Run the full test suite with:

```bash
pytest
```

Code

- FunctionDef: class for immutable functions, e.g. double.
- TrajectorySpec: a planned sequence of functions that has not been executed, e.g. [double, plus1].
- Trajectory: a triplet of inputs, outputs and function sequence, e.g. [(1, 2, 3), (3, 5, 7), (double, plus1)].
- Executor: executes trajectories and evaluates their results.
- Solvers: several are included:
  - RandomSolve: tries random function sequences within a budget.
  - BFSSolve: performs breadth-first search up to a maximum depth.
  - LLM solvers: use OpenAI, Gemini, Ollama or local models to propose functions.
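The relationship between these classes can be sketched with frozen dataclasses. This is a minimal, hypothetical model of the concepts, not the library's actual class definitions: here a `FunctionDef` wraps an element-wise callable, a `TrajectorySpec` only lists functions, and `Executor.execute` applies them in order to concrete inputs to produce a `Trajectory`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FunctionDef:
    """Hypothetical stand-in: a named, immutable single-argument function."""
    name: str
    fn: Callable[[int], int]


@dataclass(frozen=True)
class TrajectorySpec:
    """A planned sequence of functions; nothing has been executed yet."""
    functions: tuple[FunctionDef, ...]


@dataclass(frozen=True)
class Trajectory:
    """Inputs, outputs and the function sequence that maps one to the other."""
    inputs: tuple[int, ...]
    outputs: tuple[int, ...]
    functions: tuple[FunctionDef, ...]


class Executor:
    """Runs a TrajectorySpec on concrete inputs."""

    def execute(self, spec: TrajectorySpec, inputs: tuple[int, ...]) -> Trajectory:
        values = inputs
        for f in spec.functions:
            # Apply each function element-wise to the current values.
            values = tuple(f.fn(v) for v in values)
        return Trajectory(inputs, values, spec.functions)


double = FunctionDef("double", lambda x: 2 * x)
plus1 = FunctionDef("plus1", lambda x: x + 1)
traj = Executor().execute(TrajectorySpec((double, plus1)), (1, 2, 3))
print(traj.outputs)                      # (3, 5, 7)
print([f.name for f in traj.functions])  # ['double', 'plus1']
```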

Project Structure

.
├── wandering_light/        # Main package
│   ├── __init__.py         # Package initialization
│   ├── executor.py         # Executes individual functions and trajectories
│   ├── function_def.py     # Defines the FunctionDef class
│   ├── common_functions.py # Built-in helper functions
│   ├── trajectory.py       # Defines TrajectorySpec and Trajectory classes
│   ├── solver.py           # Implements search-based solvers
│   ├── synthesize.py       # Utilities for synthesizing new functions
│   ├── typed_list.py       # Typed list container
│   ├── llm_utils.py        # LLM integration utilities
│   ├── constants.py        # Project constants
│   ├── evals/              # Evaluation scripts and workflows
│   ├── training/           # Training scripts (SFT, RL)
│   └── web_ui/             # Web interface
├── tests/                  # pytest test cases for core functionality
├── pyproject.toml          # Project configuration and dependencies
├── pytest.ini              # pytest configuration
└── README.md               # This file

Usage

Solving for an input-output pair

You can search for a trajectory that maps a specific input to an output using one of the built-in solvers.

```python
from wandering_light.solver import get_solver_by_name

solver = get_solver_by_name("bfs", budget=3)
trajectory = solver.solve(input_list=[1, 2, 3], output_list=[3, 5, 7])
print("Found trajectory:", trajectory)
```
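For comparison with the BFS solver used above, a budgeted random search in the spirit of RandomSolve can be sketched as follows. The names `random_solve` and `FUNCS`, and the interpretation of `budget` as the number of sampled sequences, are assumptions for illustration; solver.py defines the real semantics.

```python
import random
from typing import Callable, Optional

# Hypothetical element-wise function table.
FUNCS: dict[str, Callable[[int], int]] = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "neg": lambda x: -x,
}


def random_solve(inputs: list[int], outputs: list[int], budget: int,
                 max_len: int = 3, seed: int = 0) -> Optional[list[str]]:
    """Try up to `budget` random function sequences; return the first that fits."""
    rng = random.Random(seed)
    names = list(FUNCS)
    for _ in range(budget):
        seq = [rng.choice(names) for _ in range(rng.randint(1, max_len))]
        values = inputs
        for name in seq:
            values = [FUNCS[name](v) for v in values]
        if values == outputs:
            return seq
    return None  # budget exhausted without a match


seq = random_solve([1, 2, 3], [3, 5, 7], budget=1000)
print(seq)
```

Unlike BFS, random search gives no guarantee of returning the shortest sequence, but its cost is bounded by the budget rather than by the search depth.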

Evaluation

Create an evaluation file:

```bash
python wandering_light/evals/create_data.py --save
```

Run the evaluation using an LLM API:

```bash
python wandering_light/evals/run_evaluation.py --eval_file=evals/data/eval_data_v20250831_160239.py --solver_names=["gemini"] --num_samples 100 --budget 1
```

See solver.py for additional solver names (openai, ollama, local models, etc.).
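To make `--num_samples` and `--budget` more concrete, here is a hypothetical sketch of an evaluation loop. It assumes the budget is the number of attempts per sample and that a sample counts as solved if any attempt names only known functions and reproduces the expected output. `evaluate`, `apply_all` and the toy solver are illustrative names; run_evaluation.py may compute its metrics differently.

```python
from typing import Callable, Optional

# Hypothetical element-wise function table used to verify guesses.
FUNCS: dict[str, Callable[[int], int]] = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
}

Solver = Callable[[list[int], list[int]], Optional[list[str]]]


def apply_all(names: list[str], values: list[int]) -> list[int]:
    for name in names:
        values = [FUNCS[name](v) for v in values]
    return values


def evaluate(solver: Solver, samples: list[tuple[list[int], list[int]]], budget: int = 1) -> float:
    """Fraction of samples solved within `budget` attempts each."""
    solved = 0
    for inputs, outputs in samples:
        for _ in range(budget):
            guess = solver(inputs, outputs)
            # A guess counts only if every name is known and it reproduces the outputs.
            if guess is not None and all(n in FUNCS for n in guess) and apply_all(guess, inputs) == outputs:
                solved += 1
                break
    return solved / len(samples)


def always_double_inc(inputs: list[int], outputs: list[int]) -> list[str]:
    return ["double", "inc"]


samples = [([1, 2, 3], [3, 5, 7]), ([0, 1], [1, 3]), ([1, 2], [2, 3])]
print(evaluate(always_double_inc, samples))  # 2 of 3 samples solved
```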

Training SFT

For a quick check, without online evaluation:

```bash
python wandering_light/training/sft.py --no-eval
```

To train on the full dataset:

```bash
python wandering_light/training/sft.py --full-run
```

Store the output directory in SFT_OUTPUT_DIR, then evaluate the checkpoint to make sure it has a decent success rate (ideally between 30% and 70%), so that RL sees a mix of successes and failures to learn from. You can use the evaluation command below.

Next, you can run RL.

Training RL

```bash
python wandering_light/training/rl_grpo.py --batch-size 32 --model-name $SFT_OUTPUT_DIR --full-run --wandb-run-name $NAME
```
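One likely reason for the 30-70% success-rate target is how GRPO computes advantages: rewards for a group of completions sampled from the same prompt are normalized by the group's mean and standard deviation, so a group in which every completion succeeds (or every one fails) contributes zero advantage. The sketch below shows this normalization in plain Python; it is a simplified illustration, and TRL's GRPO trainer may differ in details such as the standard-deviation estimator and the epsilon value.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward by its group's mean and population std, GRPO-style."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when all rewards in the group are identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every completion for this prompt succeeded: no learning signal.
print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0, 0.0]
# Mixed outcomes: successes are pushed up, failures pushed down.
print(group_advantages([1.0, 0.0, 1.0, 0.0]))
```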

Evaluate local trained LLM

Assuming you have a checkpoint such as the one shown below:

```bash
python wandering_light/evals/run_evaluation.py --eval_file=wandering_light/evals/data/random_inputs_500.py --solver_names=[trained_local] --budget 1 --model-name abhishekraok/induction-basicfns-opt125m-longsft
```

Evaluation Dashboard

To see the results of all past evaluations, run:

```bash
streamlit run wandering_light/evals/dashboard.py
```

Proposer

The proposer is the data generator. First fine-tune it with SFT using the --task proposer flag, then evaluate it.

Evaluate proposer

```bash
python wandering_light/evals/evaluate_proposer.py --model abhishekraok/proposer-basicfns-opt125m-sft2k --solver-model abhishekraok/induction-basicfns-opt125m-longsft
```

which should output something like:

```text
EvalResult(parse_rate=0.96, avg_function_count=2.02, avg_function_count_ratio=1.38, solver_success_rate=0.15, num_samples=100, frac_non_zero_std=0.31)
```
