Commit ad620a0

9.10.0
1 parent 41016af commit ad620a0

11 files changed (+93 lines, -11 lines)

API_REFERENCE_FOR_REGRESSION.md

Lines changed: 13 additions & 0 deletions
@@ -239,6 +239,19 @@
 A numpy matrix with predictor values.


+## Method: calculate_local_contribution_from_selected_terms(X:npt.ArrayLike, predictor_indexes:List[int])
+
+***Returns a numpy vector containing the contribution to the linear predictor from a user-specified combination of interacting predictors for each observation in X. This makes it easier to interpret interactions (or main effects if just one predictor is specified), for example by plotting predictor values against the term contribution.***
+
+### Parameters
+
+#### X
+A numpy matrix with predictor values.
+
+#### predictor_indexes
+A list of integers specifying the indexes of predictors in X to use. For example, [1, 3] means the second and fourth predictors in X.
+
+
 ## Method: calculate_terms(X:npt.ArrayLike)

 ***Returns a numpy matrix containing values of model terms calculated on X.***
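To make the documented method concrete, here is a minimal usage sketch in Python. It assumes the aplr package is installed and that an APLRRegressor is fitted with default settings; the synthetic data, seed, and variable names are illustrative and not part of this commit.

```python
import numpy as np
from aplr import APLRRegressor

# Illustrative synthetic data: an interaction between the second and third predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)

model = APLRRegressor()
model.fit(X, y)

# One value per observation: the contribution to the linear predictor from
# terms that use exactly the second and third predictors (indexes 1 and 2).
contribution = model.calculate_local_contribution_from_selected_terms(
    X=X, predictor_indexes=[1, 2]
)
```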

README.md

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 Automatic Piecewise Linear Regression.

 # About
-Build predictive and interpretable parametric regression or classification machine learning models in Python based on the Automatic Piecewise Linear Regression (APLR) methodology developed by Mathias von Ottenbreit. APLR is often able to compete with tree-based methods on predictiveness, but unlike tree-based methods APLR is interpretable. See the ***documentation*** folder for more information. Links to published article: [https://link.springer.com/article/10.1007/s00180-024-01475-4](https://link.springer.com/article/10.1007/s00180-024-01475-4) and [https://rdcu.be/dz7bF](https://rdcu.be/dz7bF).
+Build predictive and interpretable parametric regression or classification machine learning models in Python based on the Automatic Piecewise Linear Regression (APLR) methodology developed by Mathias von Ottenbreit. APLR is often able to compete with tree-based methods on predictiveness, but unlike tree-based methods APLR is interpretable. Please see the [documentation](https://github.com/ottenbreit-data-science/aplr/tree/main/documentation) for more information. Links to the published article: [https://link.springer.com/article/10.1007/s00180-024-01475-4](https://link.springer.com/article/10.1007/s00180-024-01475-4) and [https://rdcu.be/dz7bF](https://rdcu.be/dz7bF). More functionality has been added to APLR since the article was published.

 # How to install
 ***pip install aplr***
@@ -11,10 +11,10 @@
 Currently available for Windows and most Linux distributions.

 # How to use
-Please see the two example Python scripts in the ***examples*** folder. They cover common use cases, but not all of the functionality in this package.
+Please see the two example Python scripts [here](https://github.com/ottenbreit-data-science/aplr/tree/main/examples). They cover common use cases, but not all of the functionality in this package.

 # Sponsorship
 Please consider sponsoring Ottenbreit Data Science by clicking on the Sponsor button. Sufficient funding will enable maintenance of APLR and further development.

 # API reference
-Please see ***API_REFERENCE_FOR_REGRESSION.md*** and ***API_REFERENCE_FOR_CLASSIFICATION.md***.
+Please see the [API reference for regression](https://github.com/ottenbreit-data-science/aplr/blob/main/API_REFERENCE_FOR_REGRESSION.md) and the [API reference for classification](https://github.com/ottenbreit-data-science/aplr/blob/main/API_REFERENCE_FOR_CLASSIFICATION.md).

aplr/aplr.py

Lines changed: 7 additions & 0 deletions
@@ -237,6 +237,13 @@ def calculate_local_feature_contribution(self, X: npt.ArrayLike) -> npt.ArrayLike:
     def calculate_local_term_contribution(self, X: npt.ArrayLike) -> npt.ArrayLike:
         return self.APLRRegressor.calculate_local_term_contribution(X)

+    def calculate_local_contribution_from_selected_terms(
+        self, X: npt.ArrayLike, predictor_indexes: List[int]
+    ) -> npt.ArrayLike:
+        return self.APLRRegressor.calculate_local_contribution_from_selected_terms(
+            X, predictor_indexes
+        )
+
     def calculate_terms(self, X: npt.ArrayLike) -> npt.ArrayLike:
         return self.APLRRegressor.calculate_terms(X)

cpp/APLRRegressor.h

Lines changed: 29 additions & 5 deletions
@@ -262,6 +262,7 @@ class APLRRegressor
     VectorXd calculate_term_importance(const MatrixXd &X, const VectorXd &sample_weight = VectorXd(0));
     MatrixXd calculate_local_feature_contribution(const MatrixXd &X);
     MatrixXd calculate_local_term_contribution(const MatrixXd &X);
+    VectorXd calculate_local_contribution_from_selected_terms(const MatrixXd &X, const std::vector<size_t> &predictor_indexes);
     MatrixXd calculate_terms(const MatrixXd &X);
     std::vector<std::string> get_term_names();
     VectorXd get_term_coefficients();
@@ -2277,6 +2278,29 @@ MatrixXd APLRRegressor::calculate_local_term_contribution(const MatrixXd &X)
     return output;
 }

+VectorXd APLRRegressor::calculate_local_contribution_from_selected_terms(const MatrixXd &X, const std::vector<size_t> &predictor_indexes)
+{
+    validate_that_model_can_be_used(X);
+
+    VectorXd contribution_from_selected_terms{VectorXd::Constant(X.rows(), 0.0)};
+
+    std::vector<size_t> term_indexes_used;
+    term_indexes_used.reserve(terms.size());
+    for (size_t i = 0; i < terms.size(); ++i)
+    {
+        if (terms[i].term_uses_just_these_predictors(predictor_indexes))
+            term_indexes_used.push_back(i);
+    }
+    term_indexes_used.shrink_to_fit();
+
+    for (auto &term_index_used : term_indexes_used)
+    {
+        contribution_from_selected_terms += terms[term_index_used].calculate_contribution_to_linear_predictor(X);
+    }
+
+    return contribution_from_selected_terms;
+}
+
 MatrixXd APLRRegressor::calculate_terms(const MatrixXd &X)
 {
     validate_that_model_can_be_used(X);
@@ -2464,11 +2488,6 @@ std::map<double, double> APLRRegressor::get_coefficient_shape_function(size_t predictor_index)
     return coefficient_shape_function;
 }

-double APLRRegressor::get_cv_error()
-{
-    return cv_error;
-}
-
 std::vector<size_t> APLRRegressor::compute_relevant_term_indexes(size_t predictor_index)
 {
     std::vector<size_t> relevant_term_indexes;
@@ -2493,4 +2512,9 @@ std::vector<size_t> APLRRegressor::compute_relevant_term_indexes(size_t predictor_index)
     }
     relevant_term_indexes.shrink_to_fit();
     return relevant_term_indexes;
+}
+
+double APLRRegressor::get_cv_error()
+{
+    return cv_error;
 }
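Conceptually, the new method filters the fitted terms down to those whose predictors match the requested set exactly and sums their contributions for each row of X. The same reduction can be expressed with NumPy, assuming one already has a per-term contribution matrix (such as the output of calculate_local_term_contribution) and knows which predictors each term uses; all data and names below are illustrative, not part of the commit.

```python
import numpy as np

# Illustrative per-term contributions to the linear predictor:
# one row per observation, one column per term.
per_term_contribution = np.array([[0.2, -0.1, 0.4],
                                  [0.0, 0.3, -0.2]])
# Illustrative predictor sets used by each term (same column order as above).
predictors_used_by_term = [{1}, {1, 2}, {0}]

requested_predictors = {1, 2}
# Keep only terms that use exactly the requested predictors, then sum per observation.
selected_columns = [
    j for j, used in enumerate(predictors_used_by_term) if used == requested_predictors
]
contribution_from_selected_terms = per_term_contribution[:, selected_columns].sum(axis=1)
print(contribution_from_selected_terms)  # one value per observation
```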

cpp/functions.h

Lines changed: 6 additions & 0 deletions
@@ -67,6 +67,12 @@ std::set<int> get_unique_integers(const VectorXi &int_vector)
     return unique_integers;
 }

+std::set<size_t> get_unique_integers(const std::vector<size_t> &size_t_vector)
+{
+    std::set<size_t> unique_integers{size_t_vector.begin(), size_t_vector.end()};
+    return unique_integers;
+}
+
 double set_error_to_infinity_if_invalid(double error)
 {
     bool error_is_invalid{!std::isfinite(error)};

cpp/pythonbinding.cpp

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ PYBIND11_MODULE(aplr_cpp, m)
         .def("calculate_term_importance", &APLRRegressor::calculate_term_importance, py::arg("X"), py::arg("sample_weight") = VectorXd(0))
         .def("calculate_local_feature_contribution", &APLRRegressor::calculate_local_feature_contribution, py::arg("X"))
         .def("calculate_local_term_contribution", &APLRRegressor::calculate_local_term_contribution, py::arg("X"))
+        .def("calculate_local_contribution_from_selected_terms", &APLRRegressor::calculate_local_contribution_from_selected_terms, py::arg("X"), py::arg("predictor_indexes"))
         .def("calculate_terms", &APLRRegressor::calculate_terms, py::arg("X"))
         .def("get_term_names", &APLRRegressor::get_term_names)
         .def("get_term_coefficients", &APLRRegressor::get_term_coefficients)

cpp/term.h

Lines changed: 15 additions & 0 deletions
@@ -70,6 +70,7 @@ class Term
     bool coefficient_adheres_to_monotonic_constraint();
     InteractionConstraintsTest test_interaction_constraints(const std::vector<size_t> &legal_interaction_combination);
     std::vector<size_t> get_unique_base_terms_used_in_this_term();
+    bool term_uses_just_these_predictors(const std::vector<size_t> &predictor_indexes);

 public:
     std::string name;
@@ -773,6 +774,20 @@ double Term::get_estimated_term_importance()
     return estimated_term_importance;
 }

+bool Term::term_uses_just_these_predictors(const std::vector<size_t> &predictor_indexes)
+{
+    std::vector<size_t> predictor_indexes_used_by_this_term;
+    predictor_indexes_used_by_this_term.push_back(base_term);
+    for (auto &given_term : given_terms)
+    {
+        predictor_indexes_used_by_this_term.push_back(given_term.base_term);
+    }
+    std::set<size_t> unique_predictor_indexes_used_by_this_term{get_unique_integers(predictor_indexes_used_by_this_term)};
+    std::set<size_t> unique_predictor_indexes{get_unique_integers(predictor_indexes)};
+    bool only_predictor_indexes_are_used{unique_predictor_indexes_used_by_this_term == unique_predictor_indexes};
+    return only_predictor_indexes_are_used;
+}
+
 std::vector<size_t> create_term_indexes(std::vector<Term> &terms)
 {
     std::vector<size_t> term_indexes;
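Note that the check above is an exact set comparison: predictor order and duplicates are irrelevant, and a term that uses only a strict subset (for example a main effect on one of two requested predictors) or a superset of the requested predictors does not qualify. A short Python illustration of the same rule; the function here is just a stand-in for the C++ member above, not part of the package.

```python
def uses_just_these_predictors(term_predictor_indexes, requested_indexes):
    # Order and duplicates are ignored; only the sets of indexes are compared.
    return set(term_predictor_indexes) == set(requested_indexes)

print(uses_just_these_predictors([2, 1], [1, 2]))     # True: same predictors, different order
print(uses_just_these_predictors([1], [1, 2]))        # False: main effect on a strict subset
print(uses_just_these_predictors([0, 1, 2], [1, 2]))  # False: the term also uses predictor 0
```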

cpp/tests.cpp

Lines changed: 5 additions & 0 deletions
@@ -1597,6 +1597,7 @@ class Tests

        VectorXd predictions{model.predict(X_test)};
        MatrixXd li{model.calculate_local_feature_contribution(X_test)};
+       VectorXd li_for_particular_terms{model.calculate_local_contribution_from_selected_terms(X_train, {5, 1})};

        // Saving results
        save_as_csv_file("data/output.csv", predictions);
@@ -1607,8 +1608,12 @@
        std::map<double, double> coefficient_shape_function = model.get_coefficient_shape_function(1);
        bool coefficient_shape_function_has_correct_length{coefficient_shape_function.size() == 27};
        bool coefficient_shape_function_value_test{is_approximately_equal(coefficient_shape_function.begin()->second, 0.04175, 0.00001)};
+       bool li_for_particular_terms_has_correct_size{li_for_particular_terms.rows() == X_train.rows()};
+       bool li_for_particular_terms_mean_is_correct{is_approximately_equal(li_for_particular_terms.mean(), 0.30321952178814915)};
        tests.push_back(coefficient_shape_function_has_correct_length);
        tests.push_back(coefficient_shape_function_value_test);
+       tests.push_back(li_for_particular_terms_has_correct_size);
+       tests.push_back(li_for_particular_terms_mean_is_correct);
    }

    void test_aplr_classifier_multi_class_other_params()
Binary file not shown.

examples/train_aplr_regression.py

Lines changed: 8 additions & 1 deletion
@@ -102,7 +102,7 @@
     }
 )

-# Coefficient shape for the third predictor. Will be empty if the third predictor is not used as a main effect in the model
+# Coefficient shape for the third predictor. Will be empty if the third predictor is not used as a main effect in the model.
 coefficient_shape = best_model.get_coefficient_shape_function(predictor_index=2)
 coefficient_shape = pd.DataFrame(
     {
@@ -111,6 +111,13 @@
     }
 )

+# Local (observation-specific) contribution to the linear predictor from selected interacting predictors.
+# In this example, this concerns two-way interaction terms in the model where the second and the third predictors in X interact.
+# The local contribution will be zero for all observations if there are no such terms in the model.
+# The local contribution can help interpret interactions (or main effects if only one predictor index is specified).
+# In this example, the local contribution can be plotted against the predictor values for a visual interpretation.
+contribution_from_selected_terms = best_model.calculate_local_contribution_from_selected_terms(X=data_train[predictors], predictor_indexes=[1, 2])
+

 # PREDICTING AND TESTING ON THE TEST SET
 data_test[predicted] = best_model.predict(data_test[predictors].values)
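As the added comments suggest, the contribution can be visualized against the interacting predictors. A minimal plotting sketch, assuming matplotlib is installed and that data_train, predictors, and contribution_from_selected_terms are defined as in the example script above; the plot itself is not part of the commit.

```python
import matplotlib.pyplot as plt

# Plot the contribution against the second predictor, colored by the third predictor,
# to visualize how the interaction behaves across the training data.
plt.scatter(
    data_train[predictors[1]],
    contribution_from_selected_terms,
    c=data_train[predictors[2]],
    s=10,
)
plt.colorbar(label=predictors[2])
plt.xlabel(predictors[1])
plt.ylabel("Contribution to the linear predictor")
plt.title("Local contribution from terms using predictors 1 and 2")
plt.show()
```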
