# Evolving Symbolic Regression with OpenEvolve on LLM-SRBench 🧬🔍

This example demonstrates how **OpenEvolve** can be used to perform **symbolic regression** tasks on the **LLM-SRBench benchmark** (highlighted at ICML 2025). It showcases OpenEvolve's capability to evolve Python code, transforming simple mathematical expressions into more complex and accurate models that fit given datasets.

------

## 🎯 Problem Description: Symbolic Regression on LLM-SRBench

**Symbolic Regression** is the task of discovering a mathematical expression that best fits a given dataset. Unlike traditional regression techniques that optimize parameters for a predefined model structure, symbolic regression aims to find both the **structure of the model** and its **parameters**.

This example leverages **LLM-SRBench**, a benchmark specifically designed for Large Language Model-based Symbolic Regression. The core objective is to use OpenEvolve to evolve an initial, often simple, model (e.g., a linear model) into a more sophisticated symbolic expression. This evolved expression should accurately capture the underlying relationships within various scientific datasets provided by the benchmark.
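
To make the distinction concrete, here is a tiny, self-contained illustration (hypothetical data and expressions, not part of the benchmark): a fixed-structure baseline can only tune the coefficients of a preset model, while a symbolic regressor is free to recover the structure of the hidden law itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
y = 2.0 * np.sin(x) + 0.5 * x**2          # hidden ground-truth law

# Fixed-structure baseline: only the slope and intercept of y ~ a*x + b can change.
a, b = np.polyfit(x, y, 1)
mse_linear = np.mean((a * x + b - y) ** 2)

# Symbolic regression also searches over structures, so it can in principle
# recover an expression equivalent to 2*sin(x) + 0.5*x**2 and drive the MSE to ~0.
mse_recovered = np.mean((2.0 * np.sin(x) + 0.5 * x**2 - y) ** 2)

print(f"linear-fit MSE: {mse_linear:.4f}, recovered-expression MSE: {mse_recovered:.4f}")
```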

------

## 🚀 Getting Started

Follow these steps to set up and run the symbolic regression benchmark example:

### 1. Configure API Secrets

You'll need to provide your API credentials for the language models used by OpenEvolve.

- Create a `secrets.yaml` file in the example directory.
- Add your API key and model preferences:

```yaml
# secrets.yaml
api_key: <YOUR_OPENAI_API_KEY>
api_base: "https://api.openai.com/v1" # Or your custom endpoint
primary_model: "gpt-4o"
secondary_model: "o3" # Or another preferred model for specific tasks
```

Replace `<YOUR_OPENAI_API_KEY>` with your actual OpenAI API key.

### 2. Load Benchmark Tasks & Generate Initial Programs

The `data_api.py` script sets up the benchmark environment. It prepares tasks from the LLM-SRBench dataset (defined by the classes in `./bench`); the generated task directories are placed under `./problems`.

For each benchmark task, this script will automatically generate:

- `initial_program.py`: A starting Python program, typically a simple linear model.
- `evaluator.py`: A tailored evaluation script for the task.
- `config.yaml`: An OpenEvolve configuration file specific to the task.

Run the script from your terminal:

```bash
python data_api.py
```

This will create subdirectories for each benchmark task, populated with the necessary files.

### 3. Run OpenEvolve

Use the provided shell script `scripts.sh` to execute OpenEvolve across the generated benchmark tasks. This script iterates through the task-specific configurations and applies the evolutionary process.

```bash
bash scripts.sh
```
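
The exact contents of `scripts.sh` are not reproduced here; conceptually, it launches one OpenEvolve run per generated task. Below is a rough, hypothetical Python equivalent, assuming the `openevolve-run.py` command-line interface from the main repository (`initial_program.py evaluator.py --config config.yaml --iterations N`) and the `./problems` layout created by `data_api.py`; the iteration budget is illustrative.

```python
# Hypothetical driver, roughly equivalent to scripts.sh (paths and budget assumed).
import subprocess
from pathlib import Path

for program in sorted(Path("problems").rglob("initial_program.py")):
    task_dir = program.parent
    subprocess.run(
        [
            "python", "openevolve-run.py",
            str(task_dir / "initial_program.py"),
            str(task_dir / "evaluator.py"),
            "--config", str(task_dir / "config.yaml"),
            "--iterations", "100",  # illustrative budget
        ],
        check=True,
    )
```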

### 4. Evaluate Results

After OpenEvolve has completed its runs, you can evaluate the performance on different subsets of tasks (e.g., bio, chemical, physics, material). The `eval.py` script collates the results and provides a summary.

```bash
python eval.py <subset_path>
```

For example, to evaluate results for the 'physics' subset located in `./problems/phys_osc/`, you would run:

```bash
python eval.py ./problems/phys_osc
```

This script will also save a JSON file containing detailed results for your analysis.

------

## 🌱 Algorithm Evolution: From Linear Model to Complex Expression

OpenEvolve works by iteratively modifying an initial Python program to find a better-fitting mathematical expression.

### Initial Algorithm (Example: Linear Model)

The `data_api.py` script typically generates a basic linear model as the starting point. For a given task, this `initial_program.py` might look like this:

```python
"""
Initial program: A naive linear model for symbolic regression.
This model predicts the output as a linear combination of input variables
or a constant if no input variables are present.
The function is designed for vectorized input (X matrix).

Target output variable: dv_dt (Acceleration in Non-linear Harmonic Oscillator)
Input variables (columns of x): x (Position at time t), t (Time), v (Velocity at time t)
"""
import numpy as np

# Input variable mapping for x (columns of the input matrix):
# x[:, 0]: x (Position at time t)
# x[:, 1]: t (Time)
# x[:, 2]: v (Velocity at time t)

# Parameters will be optimized by BFGS outside this function.
# Number of parameters expected by this model: 10.
# Example initialization: params = np.random.rand(10)

# EVOLVE-BLOCK-START

def func(x, params):
    """
    Calculates the model output using a linear combination of input variables
    or a constant value if no input variables. Operates on a matrix of samples.

    Args:
        x (np.ndarray): A 2D numpy array of input variable values, shape (n_samples, n_features).
            n_features is 3. If n_features is 0, x should be shape (n_samples, 0).
            The order of columns in x must correspond to: (x, t, v).
        params (np.ndarray): A 1D numpy array of parameters. Expected length: 10.

    Returns:
        np.ndarray: A 1D numpy array of predicted output values, shape (n_samples,).
    """
    result = x[:, 0] * params[0] + x[:, 1] * params[1] + x[:, 2] * params[2]
    return result

# EVOLVE-BLOCK-END

# This part remains fixed (not evolved).
# It ensures that OpenEvolve can consistently call the evolving function.
def run_search():
    return func

# Note: The actual structure of initial_program.py is determined by data_api.py.
```
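
As a quick sanity check of this interface, the snippet below calls the initial model on random data. It is a sketch that assumes you run it from inside a generated task directory (so `initial_program.py` is importable); in the real pipeline, the task's `evaluator.py` drives this call.

```python
import numpy as np
from initial_program import run_search  # assumes the task directory is the current working directory

model = run_search()           # returns the current `func`
X = np.random.rand(5, 3)       # 5 fake samples of (x, t, v)
params = np.random.rand(10)    # this initial program reserves 10 parameters
print(model(X, params).shape)  # -> (5,)
```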

### Evolved Algorithm (Discovered Symbolic Expression)

OpenEvolve will iteratively modify the Python code within the `# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END` markers in `initial_program.py`. The goal is to transform the simple initial model into a more complex and accurate symbolic expression that minimizes the Mean Squared Error (MSE) on the training data.

An evolved `func` might, for instance, discover a non-linear expression like:

```python
# Hypothetical example of what OpenEvolve might find:
def func(x, params):
    # Assuming X_train_scaled maps to x and const maps to a parameter in params
    predictions = np.sin(x[:, 0]) * x[:, 1]**2 + params[0]
    return predictions
```

*(This is a simplified, hypothetical example to illustrate the transformation.)*

------

## ⚙️ Key Configuration & Approach

- **LLM Models:**
  - **Primary Model:** `gpt-4o` (or your configured `primary_model`) is typically used for sophisticated code generation and modification.
  - **Secondary Model:** `o3` (or your configured `secondary_model`) can be used for refinements, simpler modifications, or other auxiliary tasks within the evolutionary process.
- **Evaluation Strategy:**
  - Currently, this example employs a direct evaluation strategy (not **cascade evaluation**).
- **Objective Function:**
  - The primary objective is to **minimize the Mean Squared Error (MSE)** between the model's predictions and the true values on the training data; a sketch of this fit-and-score step follows below.
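
The initial program's comments state that parameters are fitted by BFGS outside `func`, and the objective above is the training MSE. The sketch below shows what that fit-and-score step might look like; it is an assumption about how the generated `evaluator.py` behaves (names and defaults are illustrative), not its actual code.

```python
import numpy as np
from scipy.optimize import minimize

def fit_and_score(func, X_train, y_train, n_params=10, seed=0):
    """Fit `params` with BFGS and return the training MSE plus the fitted parameters."""
    rng = np.random.default_rng(seed)

    def mse(params):
        pred = func(X_train, params)
        return float(np.mean((pred - y_train) ** 2))

    result = minimize(mse, rng.random(n_params), method="BFGS")
    return result.fun, result.x
```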

------

## 📊 Results

The `eval.py` script will help you collect and analyze performance metrics. The LLM-SRBench paper provides a comprehensive comparison of various baselines. For results generated by this specific OpenEvolve example, you should run the evaluation script as described in the "Getting Started" section.

For benchmark-wide comparisons and results from other methods, please refer to the official LLM-SRBench paper.

| Task Category           | Med. NMSE (Test) | Med. R2 (Test) | Med. NMSE (OOD Test) | Med. R2 (OOD Test) |
| ----------------------- | ---------------- | -------------- | -------------------- | ------------------ |
| Chemistry (36 tasks)    | 2.3419e-06       | 1.000          | 3.1384e-02           | 0.9686             |
| Biology (24 tasks)      |                  |                |                      |                    |
| Physics (44 tasks)      | 1.8548e-05       | 1.000          | 7.9255e-04           | 0.9992             |
| Material Sc. (25 tasks) |                  |                |                      |                    |
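
The metric definitions assumed here are the standard ones (see `eval.py` for the exact formulas used): "Med." is the median over the tasks in a category, NMSE normalizes the squared error by the variance of the targets, and R2 is the coefficient of determination, so under these definitions R2 ≈ 1 - NMSE. A minimal sketch:

```python
import numpy as np

def nmse(y_true, y_pred):
    # Normalized MSE: mean squared error divided by the variance of the targets.
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def r2(y_true, y_pred):
    # Coefficient of determination; equals 1 - nmse(...) under these definitions.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```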

------

## 🤝 Contribution

This OpenEvolve example for LLM-SRBench was implemented by [**Haowei Lin**](https://linhaowei1.github.io/) from Peking University. If you encounter any issues or have questions, please feel free to reach out to Haowei via email ([email protected]) for discussion.