nmakke/SR-LivingReview

A Living Review of Symbolic Regression Methods and Applications

This note collects references for symbolic regression (SR) methods and datasets, as a companion to the recent review entitled "Interpretable Scientific Discovery with Symbolic Regression: A Review". That review covers SR methods and state-of-the-art applications of SR, along with the datasets commonly used to test SR methods, and discusses their main strengths and weaknesses.

A living review for symbolic regression was first proposed in the review mentioned above, by analogy with "A Living Review of Machine Learning for Particle Physics", which can be found here. The goal is to list all research works on symbolic regression, so this list is expected to keep evolving. The fact that a paper is listed in this document does not endorse or validate its content; that is for the community (and for peer review) to decide.

Symbolic regression is an emerging branch of machine learning that aims to learn the analytical form of the model underlying the data by searching the space of mathematical functions. Interest in symbolic regression is growing in the AI community because it promotes interpretability, which is a critical factor for the safe application of AI.

Methods

The symbolic regression problem can be approached and solved in different ways, depending on how the target mathematical expression f(x) is defined. References for SR are categorized as simply and usefully as possible, with a summary given in the table below.

| Category | Methods | Learned model |
| --- | --- | --- |
| Regression-based | Linear SR | System of linear equations |
| | Non-linear SR | Deep neural network |
| Expression tree-based | Genetic programming (GP) | Tree structure |
| | Reinforcement learning (RL) | Policy |
| | Transformer neural network (TNN) | Sequence |
| Physics-inspired | AIFeynman | Brute force search and neural network |
| Mathematics-inspired | Metamodel | Meijer functions |
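
To make the "Linear SR" row more concrete, here is a minimal sketch, assuming that linear SR writes f(x) as a linear combination of fixed candidate functions and finds the coefficients by least squares. The library of candidate functions, the helper name `fit_linear_sr`, and the toy target are all hypothetical choices made for illustration; they are not taken from any specific method listed in this review.

```python
import numpy as np

# Hypothetical library of candidate basis functions (an assumption for illustration).
LIBRARY = {
    "1": lambda x: np.ones_like(x),
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "cos(x)": np.cos,
}


def fit_linear_sr(x, y, library=LIBRARY):
    """Least-squares fit of y ~ sum_j c_j * g_j(x) over the library functions g_j."""
    # Each column of the design matrix is one candidate function evaluated on x.
    design = np.column_stack([g(x) for g in library.values()])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(library.keys(), coeffs))


# Toy target: f(x) = 2x^2 + cos(x), sampled without noise.
x = np.linspace(-3.0, 3.0, 200)
y = 2.0 * x**2 + np.cos(x)

model = fit_linear_sr(x, y)
# Keep only the terms whose coefficient is not numerically zero.
terms = [f"{c:.3f}*{name}" for name, c in model.items() if abs(c) > 1e-6]
print(" + ".join(terms))
```

The sketch only works when the true expression lies in the span of the chosen library; the other categories in the table search over expression structures instead of fixing them in advance.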

Applications

Datasets

Data sets can be categorized into two main groups:

Synthetic data, for which the analytical form of the underlying model is known and is used to generate the data points.
Example: $f(x) = 2x^2 + \cos(x)$, $x \in [0,1] \rightarrow \mathcal{D}=(x_i,f(x_i))_{i=1}^{n}$

Real-world data, for which the underlying model is unknown.
$\mathcal{D}=(x_i,y_i)_{i=1}^{n}$
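
The synthetic case in the list above can be sketched in a few lines of Python. This is only an illustration: the function name `make_synthetic_dataset`, the uniform sampling of x, the number of points, and the random seed are assumptions, not part of any benchmark listed here.

```python
import numpy as np


def f(x):
    """Known ground-truth expression from the example: f(x) = 2x^2 + cos(x)."""
    return 2.0 * x**2 + np.cos(x)


def make_synthetic_dataset(n=100, low=0.0, high=1.0, seed=0):
    """Sample n points x_i uniformly in [low, high) and return the arrays (x, f(x))."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n)
    return x, f(x)


x, y = make_synthetic_dataset(n=5)
for xi, yi in zip(x, y):
    print(f"x = {xi:.4f}, f(x) = {yi:.4f}")
```

For real-world data there is no `f` to call: the pairs $(x_i, y_i)$ come from measurements, and the SR method has to propose the expression.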

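| Category | Reference | # equations | Year |
| --- | --- | --- | --- |
| Physics-related | Strogatz repository | 10 | 2011 |
| | Feynman Database | 120 | 2019 |
| Mathematics-related | Koza | 3 | 1994 |
| | Keijzer | 15 | 2003 |
| | Vladislavleva | 8 | 2009 |
| | Nguyen | 12 | 2011 |
| | Korns | 15 | 2011 |
| | R | 3 | 2013 |
| | Jin | 6 | 2019 |
| | Livermore | 22 | 2021 |
| Real-world problems | link | | |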
