This note collects references for symbolic regression (SR) methods and datasets as part of the recent review "Interpretable Scientific Discovery with Symbolic Regression: A Review". That review surveys SR methods and state-of-the-art applications of SR, along with the datasets commonly used to test SR methods, and discusses their main strengths and weaknesses.
A living review for symbolic regression was first proposed in that review, in analogy with "A Living Review of Machine Learning for Particle Physics", which can be found here. The goal is to list all research works on symbolic regression, so this list is expected to keep evolving. The fact that a paper is listed in this document does not endorse or validate its content; that is for the community (and for peer review) to decide.
Symbolic regression is an emerging branch of machine learning that aims to learn the analytical form of the model underlying the data by searching the space of mathematical functions. Interest in symbolic regression is growing in the AI community because it promotes interpretability, which is a critical factor for safe AI applications.
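To make "searching the space of mathematical functions" concrete, here is a minimal sketch that enumerates a tiny, hand-picked set of candidate expressions and keeps the one with the lowest mean squared error. The building blocks (`UNARY`, `BINARY`) and the helper names are assumptions chosen for illustration; they are not taken from any of the methods listed in this document, which search far larger spaces with far smarter strategies.

```python
import itertools
import math

# Hypothetical building blocks; real SR systems use much richer grammars.
UNARY = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}


def candidates():
    """Yield (name, callable) pairs: each u(x), then each u(x) op v(x)."""
    for name, f in UNARY.items():
        yield name, f
    pairs = itertools.combinations_with_replacement(UNARY.items(), 2)
    for (n1, f1), (n2, f2) in pairs:
        for op, g in BINARY.items():
            # Default arguments freeze the current f1, f2, g in the closure.
            yield f"({n1}) {op} ({n2})", (
                lambda x, f1=f1, f2=f2, g=g: g(f1(x), f2(x))
            )


def mse(f, xs, ys):
    """Mean squared error of f on the data points (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def brute_force_sr(xs, ys):
    """Return the (name, callable) candidate with the lowest error."""
    return min(candidates(), key=lambda c: mse(c[1], xs, ys))


# Data generated from a known expression that lies inside the search space.
xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 2 + math.sin(x) for x in xs]

best_name, best_f = brute_force_sr(xs, ys)
print(best_name, mse(best_f, xs, ys))
```

Because the generating expression is one of the enumerated candidates, the search recovers it with zero error. Exhaustive enumeration like this grows combinatorially with expression size, which is one reason dedicated SR methods exist.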
The symbolic regression problem can be approached and solved in different ways, depending on how the target mathematical expression f(x) is defined. References (for SR) are categorized as simply and usefully as possible, with a summary given in the table below.
| Category | Methods | Learned model |
|---|---|---|
| Regression-based | Linear SR | System of linear equations |
| Regression-based | Non-linear SR | Deep neural network |
| Expression tree-based | Genetic programming (GP) | Tree structure |
| Expression tree-based | Reinforcement learning (RL) | Policy |
| Expression tree-based | Transformer neural network (TNN) | Sequence |
| Physics-inspired | AI Feynman | Brute-force search and neural network |
| Mathematics-inspired | Metamodel | Meijer functions |
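As an illustration of the "Linear SR" row, where the learned model reduces to a system of linear equations, the sketch below writes f(x) as a weighted sum of fixed basis functions and obtains the weights from the least-squares normal equations. The basis library `BASIS`, the helper names, and the pruning threshold are assumptions made for this example; they do not reproduce any specific method from the list.

```python
import math

# Hypothetical library of basis functions; the choice is an assumption.
BASIS = [
    ("1", lambda x: 1.0),
    ("x", lambda x: x),
    ("x**2", lambda x: x * x),
    ("sin(x)", math.sin),
]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit_linear_sr(xs, ys):
    """Least-squares weights from the normal equations (Phi^T Phi) w = Phi^T y."""
    Phi = [[f(x) for _, f in BASIS] for x in xs]
    k = len(BASIS)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(k)]
    return solve(A, b)


# Data from a known expression that is linear in the chosen basis.
xs = [i / 10 for i in range(-30, 31)]
ys = [1.5 * x - 0.5 * x * x for x in xs]

weights = fit_linear_sr(xs, ys)
# Drop terms with negligible weight to keep the expression readable.
expression = " + ".join(
    f"{w:.3f}*{name}" for (name, _), w in zip(BASIS, weights) if abs(w) > 1e-6
)
print(expression)
```

The fit is only as expressive as the basis library: if the true model is not a linear combination of the chosen functions, the result is an approximation, which is the motivation for the non-linear and tree-based families in the table.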
- Regression-based SR
  - Linear approach
  - Non-linear approaches
    - AI Feynman: a Physics-Inspired Method for Symbolic Regression [DOI] code
    - Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery
    - Symbolic regression for scientific discovery: an application to wind speed forecasting
    - Relational inductive biases, deep learning, and graph networks
    - Extrapolation and learning equations (EQL)
    - Learning Equations for Extrapolation and Control (EQL_division)
- Expression tree-based approaches
  - Genetic programming
    - Eureqa
    - PySR: High-Performance Symbolic Regression in Python and Julia
    - Genetic programming as a means for programming computers by natural selection [DOI]
    - Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming [DOI]
    - Improving Symbolic Regression with Interval Arithmetic and Linear Scaling [DOI]
    - Accuracy in Symbolic Regression [DOI]
    - Semantically-based crossover in genetic programming: application to real-valued symbolic regression [DOI]
  - Reinforcement learning
  - Transformer neural network
- Physics-inspired
- Mathematics-inspired
- Computational approach
- Physics
  - Discovering Symbolic Models from Deep Learning with Inductive Biases, code (GNN + SR)
  - Data-driven discovery of coordinates and governing equations, code (SINDy + AE)
  - Rediscovering orbital mechanics with machine learning
  - Back to the Formula -- LHC Edition
  - SYMBA: Symbolic Computation of Squared Amplitudes in High Energy Physics with Machine Learning, code

Benchmark
Data sets can be categorized into two main groups:
Synthetic data, for which the analytical form of the underlying model is known and is used to generate the data points.
Example:
Real-world data, for which the underlying model is unknown.
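The first group can be sketched in a few lines: pick a known expression, sample inputs, and evaluate the expression to obtain targets. The function `make_synthetic_dataset`, its parameters, and the polynomial `ground_truth` are hypothetical and chosen only for illustration; they are not taken from any of the benchmark suites listed below.

```python
import random


def make_synthetic_dataset(f, n_points, low, high, noise_std=0.0, seed=0):
    """Sample inputs uniformly in [low, high] and evaluate the known model f.

    Optional Gaussian noise with standard deviation noise_std is added
    to the targets; the seed makes the dataset reproducible.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_points)]
    ys = [f(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def ground_truth(x):
    """Illustrative known model (an assumption, not a named benchmark)."""
    return x ** 3 + x ** 2 + x


# Noise-free dataset: an SR method is then judged on whether it
# recovers ground_truth from (xs, ys) alone.
xs, ys = make_synthetic_dataset(ground_truth, n_points=20, low=-1.0, high=1.0)
print(len(xs), len(ys))
```

With real-world data, by contrast, there is no `ground_truth` to compare against, so a recovered expression can only be assessed through its fit, its complexity, and domain knowledge.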
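| Category | Reference | # equations | Year |
|---|---|---|---|
| Physics-related | Strogatz repository | 10 | 2011 |
| Physics-related | Feynman Database | 120 | 2019 |
| Mathematics-related | Koza | 3 | 1994 |
| Mathematics-related | Keijzer | 15 | 2003 |
| Mathematics-related | Vladislavleva | 8 | 2009 |
| Mathematics-related | Nguyen | 12 | 2011 |
| Mathematics-related | Korns | 15 | 2011 |
| Mathematics-related | R | 3 | 2013 |
| Mathematics-related | Jin | 6 | 2019 |
| Mathematics-related | Livermore | 22 | 2021 |
| Real-world problems | link | | |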