A C++ implementation of Multivariate Adaptive Regression Splines (MARS). The algorithm performs a semi-brute-force search for interactions and non-linearities, giving predictive performance close to that of a neural network but with much faster model evaluation.
Some references:
- There is a nice write-up here describing the method.
- There is also a commercial package here.
- The documentation for the R "earth" package is here.
- Stephen Milborrow maintains an excellent resource here.
- Additionally, there is a module for scikit-learn here.
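As a reminder of what MARS fits: the model is a weighted sum of hinge functions `max(0, x - t)` (and products of them, for interactions). A minimal NumPy illustration, with arbitrary toy knots and weights:

```python
import numpy as np

def hinge(x, t):
    """MARS hinge basis: max(0, x - t), zero below the knot t."""
    return np.maximum(0.0, x - t)

x = np.linspace(-2, 2, 5)
# A toy piecewise-linear MARS-style model: intercept plus two mirrored hinges at t=0
f = 1.0 + 2.0 * hinge(x, 0.0) - 0.5 * hinge(-x, 0.0)
```

The forward pass of the algorithm searches over candidate knots `t` and variable pairs; the example above only shows the shape of the resulting basis.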
We use OpenMP to achieve a decent speed-up per core used. There is some memory
overhead for each thread launched, which might constrain the total number of
cores that can be used. You can control the number of threads via the
OMP_NUM_THREADS environment variable or via the threads argument.
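For example, to cap the thread count from Python, the environment variable can be set before the extension module is first imported (a minimal sketch; OpenMP reads the variable when the runtime initializes):

```python
import os

# Cap OpenMP at 8 threads; this must happen before the first import of the
# compiled extension, since OpenMP reads the variable at startup.
os.environ["OMP_NUM_THREADS"] = "8"
```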
The following timings were obtained on an AMD EPYC 9654 96-core processor with 192 logical CPUs. Note that multi-threaded scaling is nearly ideal up to about 30 cores.
These instructions have been verified to work on the following platforms:
- Ubuntu 18.04 and 20.04
- Raspbian 10
- macOS 10.13 (WIP)
Eigen - The code has been tested with version 3.3.4.

```sh
sudo apt install -y libeigen3-dev
```

... on macOS:

```sh
brew install pkg-config eigen
```

GoogleTest - Unfortunately, the library is no longer available pre-compiled on Ubuntu:

```sh
sudo apt install -y libgtest-dev cmake
cd /usr/src/gtest
sudo cmake CMakeLists.txt && sudo make
sudo cp *.a /usr/lib
```

pybind11 - Install using your Python package manager of choice:

```sh
pip3 install pybind11
```

... or ...

```sh
conda install -y pybind11
```

You can either use the Makefile:

```sh
cd mars
make
make test  # optional - build and run the unit tests
```

Or the setup.py script provided:

```sh
cd mars
pip install .
```

Here we train a linear model with a categorical interaction.
```python
import numpy as np

X = np.random.randn(10000, 2)
X[:, 1] = np.random.binomial(1, .5, size=len(X))
y = 2*X[:, 0] + 3*X[:, 1] + X[:, 0]*X[:, 1] + np.random.randn(len(X))

# Convert to column-major (Fortran-order) float
X = np.array(X, order='F', dtype='f')
y = np.array(y, dtype='f')

# Fit the earth model
import mars
model = mars.fit(X, y, max_epochs=8, tail_span=0, linear_only=True)
B = mars.expand(X, model)  # expand the basis
beta = np.linalg.lstsq(B, y, rcond=None)[0]
y_hat = B @ beta

# Pretty-print the model
mars.pprint(model, beta)
```

Depending on the random seed, the result should look similar to this:

```
-0.003
+1.972 * X[0]
+3.001 * X[1]
+1.048 * X[0] * X[1]
```
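As a sanity check on those coefficients, the same recovery can be reproduced with plain NumPy by regressing on the true generating basis directly (a self-contained sketch that does not require the mars module):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2))
X[:, 1] = rng.binomial(1, .5, size=len(X))
y = 2*X[:, 0] + 3*X[:, 1] + X[:, 0]*X[:, 1] + rng.standard_normal(len(X))

# The basis MARS should discover: intercept, both main effects, the interaction
B = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0]*X[:, 1]])
beta = np.linalg.lstsq(B, y, rcond=None)[0]
print(np.round(beta, 2))  # approximately [0., 2., 3., 1.]
```

The fitted model's coefficients should agree with this oracle fit up to noise, which is a useful check that the interaction term was actually found.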
