Commit 9c19181

Merge pull request #17 from linhaowei1/main
Add Example: Symbolic Regression Benchmark
2 parents a6c2117 + c4ccdae commit 9c19181

8 files changed, +1832 -3 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -45,3 +45,7 @@ htmlcov/
 # Misc
 .DS_Store
 .venv
+
+# For SR
+secrets.yaml
+problems
Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
# Evolving Symbolic Regression with OpenEvolve on LLM-SRBench 🧬🔍

This example shows how to use **OpenEvolve** for **symbolic regression** on the **[LLM-SRBench benchmark](https://arxiv.org/pdf/2504.10415)**. OpenEvolve evolves Python code, turning a simple mathematical expression into a more complex and more accurate model of a given dataset.

------

## 🎯 Problem Description: Symbolic Regression on LLM-SRBench

**Symbolic Regression** is the task of discovering a mathematical expression that best fits a given dataset. Unlike traditional regression techniques that optimize parameters for a predefined model structure, symbolic regression aims to find both the **structure of the model** and its **parameters**.

This example uses **LLM-SRBench**, a benchmark designed for symbolic regression with Large Language Models. The goal is to have OpenEvolve evolve a simple initial model (e.g., a linear model) into a more sophisticated symbolic expression that accurately captures the underlying relationships in the scientific datasets provided by the benchmark.

------

## 🚀 Getting Started

Follow these steps to set up and run the symbolic regression benchmark example:

### 1. Configure API Keys

The API key is read from the `OPENAI_API_KEY` environment variable by default. The primary and secondary models we used when testing on LLM-SRBench are `gpt-4o` and `o3`, respectively. See `create_config()` in `data_api.py` for details.
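
A quick pre-flight check can catch a missing key before any task is generated. The sketch below is not part of `data_api.py`: `resolve_api_key` is a hypothetical helper, and it only assumes that the key lives in `OPENAI_API_KEY`, as stated above.

```python
import os


def resolve_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running data_api.py")
    return key
```

Calling `resolve_api_key()` at the top of a driver script fails fast with a readable message instead of surfacing an authentication error halfway through a run.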

### 2. Load Benchmark Tasks & Generate Initial Programs

The `data_api.py` script sets up the benchmark. It prepares tasks from the LLM-SRBench dataset (defined by the classes in `./bench`) and writes them to `./problems`.

For each benchmark task, the script automatically generates:

- `initial_program.py`: A starting Python program, typically a simple linear model.
- `evaluator.py`: A tailored evaluation script for the task.
- `config.yaml`: An OpenEvolve configuration file specific to the task.

Run the script from your terminal:

```bash
python data_api.py
```

This creates one subdirectory per benchmark task, populated with the files listed above.
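
Before launching a long run, it can help to confirm that every task directory is complete. The following is a minimal sketch, not part of the example's scripts: `find_incomplete_tasks` is a hypothetical helper, and it assumes only that a task directory is any directory under `./problems` containing at least one of the three generated files.

```python
from pathlib import Path

EXPECTED_FILES = ("initial_program.py", "evaluator.py", "config.yaml")


def find_incomplete_tasks(problems_root):
    """Map each task directory (relative path) to the expected files it lacks."""
    root = Path(problems_root)
    # Treat every directory holding at least one expected file as a task directory.
    task_dirs = {p.parent for name in EXPECTED_FILES for p in root.rglob(name)}
    incomplete = {}
    for task_dir in sorted(task_dirs):
        missing = [name for name in EXPECTED_FILES if not (task_dir / name).is_file()]
        if missing:
            incomplete[task_dir.relative_to(root).as_posix()] = missing
    return incomplete
```

An empty dictionary from `find_incomplete_tasks("./problems")` means every detected task has all three files.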

### 3. Run OpenEvolve

Use the provided shell script `scripts.sh` to run OpenEvolve across the generated benchmark tasks. The script iterates through the task-specific configurations and applies the evolutionary process to each one.

```bash
bash scripts.sh
```

### 4. Evaluate Results

After OpenEvolve has completed its runs, you can evaluate performance on different subsets of tasks (e.g., bio, chemical, physics, material). The `eval.py` script collates the results and prints a summary.

```bash
python eval.py <subset_path>
```

For example, to evaluate results for the physics subset located in `./problems/phys_osc/`, run:

```bash
python eval.py ./problems/phys_osc
```

The script also saves a JSON file containing detailed results for further analysis.

------

## 🌱 Algorithm Evolution: From Linear Model to Complex Expression

OpenEvolve works by iteratively modifying an initial Python program to find a better-fitting mathematical expression.

### Initial Algorithm (Example: Linear Model)

The `data_api.py` script typically generates a basic linear model as the starting point. For a given task, this `initial_program.py` might look like this:

```python
"""
Initial program: A naive linear model for symbolic regression.
This model predicts the output as a linear combination of input variables
or a constant if no input variables are present.
The function is designed for vectorized input (X matrix).

Target output variable: dv_dt (Acceleration in Non-linear Harmonic Oscillator)
Input variables (columns of x): x (Position at time t), t (Time), v (Velocity at time t)
"""
import numpy as np

# Input variable mapping for x (columns of the input matrix):
#   x[:, 0]: x (Position at time t)
#   x[:, 1]: t (Time)
#   x[:, 2]: v (Velocity at time t)

# Parameters will be optimized by BFGS outside this function.
# Number of parameters expected by this model: 10.
# Example initialization: params = np.random.rand(10)

# EVOLVE-BLOCK-START

def func(x, params):
    """
    Calculates the model output using a linear combination of input variables
    or a constant value if no input variables. Operates on a matrix of samples.

    Args:
        x (np.ndarray): A 2D numpy array of input variable values, shape (n_samples, n_features).
                        n_features is 3.
                        If n_features is 0, x should be shape (n_samples, 0).
                        The order of columns in x must correspond to:
                        (x, t, v).
        params (np.ndarray): A 1D numpy array of parameters.
                             Expected length: 10.

    Returns:
        np.ndarray: A 1D numpy array of predicted output values, shape (n_samples,).
    """

    result = x[:, 0] * params[0] + x[:, 1] * params[1] + x[:, 2] * params[2]
    return result

# EVOLVE-BLOCK-END

# This part remains fixed (not evolved)
# It ensures that OpenEvolve can consistently call the evolving function.
def run_search():
    return func

# Note: The actual structure of initial_program.py is determined by data_api.py.
```
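
The comment "Parameters will be optimized by BFGS outside this function" can be made concrete with a small fitting loop. This is a sketch under assumptions, not the generated `evaluator.py`: `fit_params` is a hypothetical helper, the data is synthetic, and the single random start and default tolerances are illustrative choices. It uses SciPy's `scipy.optimize.minimize` with `method="BFGS"`.

```python
import numpy as np
from scipy.optimize import minimize


def func(x, params):
    # Same linear form as the initial program above.
    return x[:, 0] * params[0] + x[:, 1] * params[1] + x[:, 2] * params[2]


def fit_params(model, x, y, n_params=10, seed=0):
    """Fit `params` by minimizing the training MSE with BFGS from one random start."""
    start = np.random.default_rng(seed).random(n_params)

    def mse(params):
        return np.mean((model(x, params) - y) ** 2)

    result = minimize(mse, start, method="BFGS")
    return result.x, result.fun


# Synthetic data (an assumption for illustration): y is linear in the three inputs.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 2.0 * X[:, 2]

best_params, best_mse = fit_params(func, X, y)
```

Only the first three entries of `best_params` influence this linear model; the remaining entries stay at their random starting values because the loss does not depend on them. Keeping the optimizer outside `func` means the evolved code only has to propose a functional form, while the numeric constants are fitted separately.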

### Evolved Algorithm (Discovered Symbolic Expression)

**OpenEvolve** iteratively modifies Python code segments, delineated by `# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END` markers within an `initial_program.py` file. The primary objective is to evolve a simple initial model into a more complex and accurate symbolic expression that minimizes the Mean Squared Error (MSE) against the training data.
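
To see which part of a program is open to modification, the marked region can be pulled out with a regular expression. This is an illustrative sketch only and makes no claim about how OpenEvolve parses the markers internally; `extract_evolve_blocks` is a hypothetical helper.

```python
import re

# Non-greedy match so that several marked regions in one file stay separate.
EVOLVE_BLOCK = re.compile(
    r"# EVOLVE-BLOCK-START\n(.*?)# EVOLVE-BLOCK-END",
    re.DOTALL,
)


def extract_evolve_blocks(source):
    """Return the text of every region between the evolve markers."""
    return [block.strip("\n") for block in EVOLVE_BLOCK.findall(source)]


program = """import numpy as np

# EVOLVE-BLOCK-START
def func(x, params):
    return x[:, 0] * params[0]
# EVOLVE-BLOCK-END

def run_search():
    return func
"""

blocks = extract_evolve_blocks(program)
```

Here `blocks` holds only the body of `func`; the fixed `run_search` wrapper lies outside the markers and is left untouched.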

Below is a symbolic expression discovered by OpenEvolve for the physics task `PO10`:

```python
import numpy as np

def func(x, params):
    """
    Calculates the model output using a linear combination of input variables
    or a constant value if no input variables. Operates on a matrix of samples.

    Args:
        x (np.ndarray): A 2D numpy array of input variable values, shape (n_samples, n_features).
                        n_features is 2.
                        If n_features is 0, x should be shape (n_samples, 0).
                        The order of columns in x must correspond to:
                        (x, t).
        params (np.ndarray): A 1D numpy array of parameters.
                             Expected length: 10.

    Returns:
        np.ndarray: A 1D numpy array of predicted output values, shape (n_samples,).
    """
    # --------------------------------------------------------------------------
    # Allow for flexible parameter count, only padding essential parts.
    if len(params) < 10:
        required_params = params.shape[0]
        params = np.pad(params, (0, 10 - required_params))

    # Readable aliases for the two input features
    pos = x[:, 0]    # position x(t)
    t_val = x[:, 1]  # time t

    # ---------- Internal restoring forces (Duffing-like) ------------------
    # −k x −β x³ −γ x⁵ (only odd powers, respecting the usual symmetry)
    # Reduced polynomial order (up to cubic) to avoid over-fitting while
    # still capturing the essential softening/stiffening behaviour.
    restoring = -(params[0] * pos + params[1] * pos**3)

    # ---------- Externally forced, periodically driven term --------------
    # A e^{-λ t} sin(ω t) + B cos(Ω t) (General form considered)
    # Let the optimiser decide whether the envelope should grow
    # or decay by keeping the sign of params[4]. The exponent is
    # clipped to avoid numerical overflow.
    # Simple periodic forcing without exponential envelope. This is
    # sufficient for many driven oscillator benchmarks and reduces the
    # risk of numerical overflow in exp().
    trig1 = params[3] * t_val
    trig2 = params[5] * t_val
    forcing = params[2] * np.cos(trig1) + params[4] * np.sin(trig2)

    # ---------- Weak position–time coupling & constant bias ---------------
    interaction = params[8] * pos * t_val
    bias = params[9]

    return restoring + forcing + interaction + bias
```

The ground truth for this PO10 task is represented by the equation:

$F_0 \sin(t) - \omega_0^2 (\gamma t + 1)\, x(t) - \omega_0^2 x(t)^3 - \omega_0^2 x(t).$

This can be expanded and simplified to:

$F_0 \sin(t) - \omega_0^2 \gamma\, t\, x(t) - 2 \omega_0^2 x(t) - \omega_0^2 x(t)^3.$
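
The simplification follows from distributing $-\omega_0^2(\gamma t + 1)x(t)$ and merging the two linear terms in $x(t)$. A quick numeric spot check, sketched below with the standard library only, compares both forms at random points. The function names and sampling range are illustrative assumptions, not part of the benchmark.

```python
import math
import random


def original_form(x, t, F0, w0, gamma):
    """Ground-truth equation as first written."""
    return F0 * math.sin(t) - w0**2 * (gamma * t + 1) * x - w0**2 * x**3 - w0**2 * x


def expanded_form(x, t, F0, w0, gamma):
    """Expanded equation with the two linear x(t) terms merged."""
    return F0 * math.sin(t) - w0**2 * gamma * t * x - 2 * w0**2 * x - w0**2 * x**3


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    # Draw x, t, F0, w0, gamma uniformly from [-2, 2].
    args = [rng.uniform(-2.0, 2.0) for _ in range(5)]
    max_gap = max(max_gap, abs(original_form(*args) - expanded_form(*args)))
```

`max_gap` stays at floating-point rounding level, which is consistent with the two expressions being algebraically identical.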

Notably, the core functional forms present in this ground truth equation are captured by the evolved symbolic expression:

- The $\sin(t)$ component can be represented by `params[4] * np.sin(params[5] * t_val)`.
- The linear $x(t)$ term corresponds to `params[0] * pos`.
- The cubic $x(t)^3$ term is `params[1] * pos**3`.
- The interaction term $t \cdot x(t)$ is captured by `params[8] * pos * t_val`.

The evolved code also includes terms like `params[2] * np.cos(params[3] * t_val)` (a cosine forcing term) and `params[9]` (a constant bias). These may be fitted to negligible parameter values if the data does not support them, or they may capture secondary effects or noise. The presence of all primary terms shows that OpenEvolve identified the correct underlying structure of the equation for this task.
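
The term-by-term correspondence can be checked numerically: with suitable parameter values, the evolved form reproduces the expanded ground truth exactly. The sketch below uses a condensed copy of the evolved expression; the constants `F0`, `w0`, and `gamma` are arbitrary illustrative values, not the benchmark's.

```python
import numpy as np


def evolved(x, params):
    """Condensed copy of the evolved expression (padding logic omitted)."""
    pos, t_val = x[:, 0], x[:, 1]
    restoring = -(params[0] * pos + params[1] * pos**3)
    forcing = params[2] * np.cos(params[3] * t_val) + params[4] * np.sin(params[5] * t_val)
    return restoring + forcing + params[8] * pos * t_val + params[9]


def ground_truth(x, F0, w0, gamma):
    """Expanded ground-truth equation for PO10."""
    pos, t_val = x[:, 0], x[:, 1]
    return F0 * np.sin(t_val) - w0**2 * gamma * t_val * pos - 2 * w0**2 * pos - w0**2 * pos**3


F0, w0, gamma = 0.8, 1.3, 0.1  # arbitrary constants chosen for illustration

params = np.zeros(10)
params[0] = 2 * w0**2           # linear restoring term: -2 w0^2 x
params[1] = w0**2               # cubic restoring term: -w0^2 x^3
params[4], params[5] = F0, 1.0  # forcing: F0 sin(t)
params[8] = -(w0**2) * gamma    # interaction: -w0^2 gamma t x
# params[2] (cosine amplitude) and params[9] (bias) stay at zero.

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
max_gap = np.max(np.abs(evolved(X, params) - ground_truth(X, F0, w0, gamma)))
```

Note the sign convention: because `restoring` is negated inside the function, `params[0]` and `params[1]` take positive values, while `params[8]` is negative.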

*Note: Symbolic regression, despite such promising results, remains a very challenging task. This difficulty largely stems from the inherent complexities of inferring precise mathematical models from finite and potentially noisy training data, which provides only a partial observation of the true underlying system.*

------

## ⚙️ Key Configuration & Approach

- LLM Models:
  - **Primary Model:** `gpt-4o` (or your configured `primary_model`) is typically used for sophisticated code generation and modification.
  - **Secondary Model:** `o3` (or your configured `secondary_model`) can be used for refinements, simpler modifications, or other auxiliary tasks within the evolutionary process.
- Evaluation Strategy:
  - Currently, this example employs a direct evaluation strategy (not **cascade evaluation**).
- Objective Function:
  - The primary objective is to **minimize the Mean Squared Error (MSE)** between the model's predictions and the true values on the training data.
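
For reference, the MSE objective and a normalized variant can be written in a few lines. The normalization used here (MSE divided by the variance of the targets) is one common definition of NMSE and is an assumption; the exact formula used for the reported numbers should be checked in `eval.py` and the LLM-SRBench paper.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def nmse(y_true, y_pred):
    """MSE divided by the variance of the targets (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    return mse(y_true, y_pred) / float(np.var(y_true))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
example_mse = mse(y_true, y_pred)
example_nmse = nmse(y_true, y_pred)
```

Under this definition, a model that always predicts the mean of the targets scores an NMSE of 1, so values far below 1 indicate a fit much better than a constant baseline.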

------

## 📊 Results

The `eval.py` script collects and summarizes performance metrics. To obtain results for this OpenEvolve example, run the evaluation script as described in the "Getting Started" section.

For benchmark-wide comparisons and results from other methods, refer to the official LLM-SRBench paper, which provides a comprehensive comparison of baselines.

*Note: The baseline results below are approximate values read from Fig. 5 of the LLM-SRBench paper.*

**Median NMSE (Test Set)**

| **Domain**       | **Direct**  | **LLMSR**       | **LaSR**    | **SGA**     | **OpenEvolve** |
| ---------------- | ----------- | --------------- | ----------- | ----------- | -------------- |
| Chemistry        | ~6.0 × 10⁻¹ | **~1.5 × 10⁻⁶** | ~1.0 × 10⁻⁴ | ~1.0 × 10⁻² | 2.34 × 10⁻⁶    |
| Biology          | ~2.0 × 10⁻² | ~1.0 × 10⁻⁵     | ~1.0 × 10⁻⁴ | ~2.0 × 10⁻⁴ |                |
| Physics          | ~3.0 × 10⁻¹ | **~2.0 × 10⁻⁷** | ~1.0 × 10⁻³ | ~4.0 × 10⁻³ | 1.85 × 10⁻⁵    |
| Material Science | ~3.0 × 10⁻¹ | ~1.0 × 10⁻⁴     | ~7.0 × 10⁻⁴ | ~3.0 × 10⁻² |                |

**Median NMSE (OOD Test Set)**

| **Domain**       | **Direct** | **LLMSR**   | **LaSR**    | **SGA**    | **OpenEvolve**  |
| ---------------- | ---------- | ----------- | ----------- | ---------- | --------------- |
| Chemistry        | ~3.0 × 10² | ~5.0 × 10⁻² | ~1.0 × 10⁰  | ~1.5 × 10⁰ | **3.14 × 10⁻²** |
| Biology          | ~1.2 × 10² | ~4.0 × 10⁰  | ~3.0 × 10¹  | ~4.0 × 10¹ |                 |
| Physics          | ~1.0 × 10¹ | ~1.0 × 10⁻³ | ~5.0 × 10⁻² | ~1.0 × 10⁰ | **7.93 × 10⁻⁴** |
| Material Science | ~2.5 × 10¹ | ~3.0 × 10⁰  | ~8.0 × 10⁰  | ~2.5 × 10¹ |                 |

OpenEvolve results are currently available for only two subsets of LSR-Synth (Chemistry and Physics). Comprehensive results will be added later.

------

## 🤝 Contribution

This OpenEvolve example for LLM-SRBench was implemented by [**Haowei Lin**](https://linhaowei1.github.io/) from Peking University. If you encounter any issues or have questions, please feel free to reach out to Haowei via email ([email protected]) for discussion.
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
from typing import Any, Callable, Optional
from dataclasses import dataclass
import sympy


@dataclass
class Equation:
    symbols: list
    symbol_descs: list
    symbol_properties: list
    expression: str
    desc: Optional[str] = None

    # Optional alternative representations of the same expression.
    sympy_format: Optional[sympy.Expr] = None
    lambda_format: Optional[Callable] = None
    program_format: Optional[str] = None


@dataclass
class SearchResult:
    equation: Equation
    aux: Any


@dataclass
class SEDTask:
    name: str
    symbols: list
    symbol_descs: list
    symbol_properties: list
    samples: Any
    desc: Optional[str] = None


@dataclass
class Problem:
    dataset_identifier: str
    equation_idx: str
    gt_equation: Equation
    samples: Any

    def create_task(self) -> SEDTask:
        # The task exposes only the training samples, not the test splits.
        return SEDTask(
            name=self.equation_idx,
            symbols=self.gt_equation.symbols,
            symbol_descs=self.gt_equation.symbol_descs,
            symbol_properties=self.gt_equation.symbol_properties,
            samples=self.train_samples,
            desc=self.gt_equation.desc,
        )

    @property
    def train_samples(self):
        return self.samples["train"]

    @property
    def test_samples(self):
        return self.samples["test"]

    @property
    def ood_test_samples(self):
        return self.samples.get("ood_test", None)
