
Commit b064133
added warning
1 parent d5170a0 commit b064133

7 files changed: +52 -17 lines changed

API_REFERENCE_FOR_REGRESSION.md (1 addition, 1 deletion)

```diff
@@ -334,7 +334,7 @@ A numpy matrix with predictor values.

 ## Method: get_unique_term_affiliation_shape(unique_term_affiliation: str)

-***Returns a matrix containing one column for each predictor used in the unique term affiliation, in addition to one column for the contribution to the linear predictor. For main effects or two-way interactions this can be visualized in for example line plots and surface plots respectively. See this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py).***
+***Returns a matrix containing one column for each predictor used in the unique term affiliation, in addition to one column for the contribution to the linear predictor. For main effects or two-way interactions this can be visualized in for example line plots and surface plots respectively. See this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py). Please note that the get_unique_term_affiliation_shape method is currently very memory intensive when handling interactions and may crash without warning on larger models. Consider using either of the calculate_local_feature_contribution or calculate_local_contribution_from_selected_terms methods to interpret interactions on larger models.***

 ### Parameters
```
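
As an illustration of the documented behaviour, here is a minimal sketch (not part of the commit) that fits a small APLRRegressor on synthetic data and queries the shape of each term affiliation; the toy data and the max_interaction_level argument are assumptions, and the per-observation alternative mentioned in the warning is shown at the end.

```python
import numpy as np
from aplr import APLRRegressor

# Hypothetical toy data: y depends on two predictors and their interaction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=500)

model = APLRRegressor(max_interaction_level=1)
model.fit(X, y)

for affiliation in model.get_unique_term_affiliations():
    # One column per predictor in the affiliation plus one column with the
    # contribution to the linear predictor.
    shape = model.get_unique_term_affiliation_shape(affiliation)
    print(affiliation, shape.shape)

# Lighter-weight alternative for interpreting interactions on larger models:
local_contributions = model.calculate_local_feature_contribution(X)
```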

cpp/APLRRegressor.h (6 additions, 0 deletions)

```diff
@@ -2630,6 +2630,12 @@ MatrixXd APLRRegressor::get_unique_term_affiliation_shape(const std::string &uni
     std::vector<size_t> relevant_term_indexes{compute_relevant_term_indexes(unique_term_affiliation)};
     size_t unique_term_affiliation_index{unique_term_affiliation_map[unique_term_affiliation]};
     size_t num_predictors_used_in_the_affiliation{base_predictors_in_each_unique_term_affiliation[unique_term_affiliation_index].size()};
+    if (num_predictors_used_in_the_affiliation > 1)
+    {
+        std::string warning{"Please note that the get_unique_term_affiliation_shape method is currently very memory intensive when handling interactions and may crash without warning on larger models. Consider using either of the calculate_local_feature_contribution or calculate_local_contribution_from_selected_terms methods to interpret interactions on larger models."};
+        std::cout << warning << std::endl;
+    }
+
     std::vector<std::vector<double>> split_points_in_each_predictor(num_predictors_used_in_the_affiliation);
     for (size_t i = 0; i < num_predictors_used_in_the_affiliation; ++i)
     {
```

cpp/functions.h (9 additions, 9 deletions)

```diff
@@ -544,32 +544,32 @@ double calculate_standard_deviation(const VectorXd &vector, const VectorXd &samp

 MatrixXd generate_combinations_and_one_additional_column(const std::vector<std::vector<double>> &vectors)
 {
-    int numVectors = vectors.size();
-    std::vector<int> sizes(numVectors);
-    int numRows = 1;
+    int num_vectors = vectors.size();
+    std::vector<int> sizes(num_vectors);
+    int num_rows = 1;

     // Calculate the number of rows in the result matrix
-    for (int i = 0; i < numVectors; ++i)
+    for (int i = 0; i < num_vectors; ++i)
     {
         sizes[i] = vectors[i].size();
-        numRows *= sizes[i];
+        num_rows *= sizes[i];
     }

     // Initialize the result matrix with an additional unused column
-    MatrixXd result(numRows, numVectors + 1);
+    MatrixXd result(num_rows, num_vectors + 1);

     // Generate all combinations
-    for (int row = 0; row < numRows; ++row)
+    for (int row = 0; row < num_rows; ++row)
     {
         int index = row;
-        for (int col = 0; col < numVectors; ++col)
+        for (int col = 0; col < num_vectors; ++col)
         {
             int vecSize = sizes[col];
             result(row, col) = vectors[col][index % vecSize];
             index /= vecSize;
         }
         // Set the additional unused column to zero (or any other value)
-        result(row, numVectors) = 0;
+        result(row, num_vectors) = 0;
     }

     return result;
```
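
For readers following the C++ above, the following small Python sketch (not from the commit) mirrors the mixed-radix enumeration performed by generate_combinations_and_one_additional_column: every combination of split points becomes one row, which is why the row count, and hence memory use, grows multiplicatively when an affiliation involves several predictors.

```python
import numpy as np

def combinations_with_extra_column(vectors):
    # One row per combination of values from the input vectors,
    # plus a trailing column of zeros, mirroring the C++ helper.
    sizes = [len(v) for v in vectors]
    num_rows = int(np.prod(sizes))
    result = np.zeros((num_rows, len(vectors) + 1))
    for row in range(num_rows):
        index = row
        for col, values in enumerate(vectors):
            result[row, col] = values[index % sizes[col]]
            index //= sizes[col]
    return result

# Two predictors with 2 and 3 split points give 2 * 3 = 6 combinations.
print(combinations_with_extra_column([[1.0, 2.0], [10.0, 20.0, 30.0]]))
```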
Binary file not shown.

documentation/model_interpretation_for_regression.md (2 additions, 2 deletions)

```diff
@@ -7,10 +7,10 @@ Use the ***get_feature_importance*** method as shown in this [example](https://g
 Use the ***calculate_feature_importance*** method or the ***calculate_local_feature_contribution*** method, for example on test data or new data. Usage of these methods is demonstrated in this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py).

 ## Main effects
-Use the ***get_unique_term_affiliation_shape*** method or the ***get_main_effect_shape*** method to interpret main effects as shown in this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py). For each main effect, you may plot the output in a line plot.
+Use the ***get_main_effect_shape*** method or the ***get_unique_term_affiliation_shape*** method to interpret main effects as shown in this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py). For each main effect, you may plot the output in a line plot.

 ## Interactions
-For best interpretability of interactions, do not use a higher ***max_interaction_level*** than 1. Use the ***get_unique_term_affiliation_shape*** method to interpret interactions as shown in this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py). For each two-way interaction of interest you may plot the output in a 3D surface plot.
+For best interpretability of interactions, do not use a higher ***max_interaction_level*** than 1. Use the ***get_unique_term_affiliation_shape*** method if your computer has enough memory (the method is currently very memory intensive when handling interaction terms and may crash without warning on larger models) or either of the ***calculate_local_feature_contribution*** or ***calculate_local_contribution_from_selected_terms*** methods to interpret interactions as shown in this [example](https://github.com/ottenbreit-data-science/aplr/blob/main/examples/train_aplr_regression.py). For each two-way interaction of interest you may plot the output in a 3D surface plot.

 ## Interpretation of model terms and their regression coefficients
 The above interpretations of main effects and interactions are sufficient to interpret an APLR model. However, it is possible to also inspect the underlying terms for those who wish to do so. For an example on how to interpret the terms in an APLR model, please see ***Section 5.1.3*** in the published article about APLR. You can find this article on [https://link.springer.com/article/10.1007/s00180-024-01475-4](https://link.springer.com/article/10.1007/s00180-024-01475-4) and [https://rdcu.be/dz7bF](https://rdcu.be/dz7bF).
```
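
One way to produce the 3D surface plot mentioned above (not part of the commit) is sketched below; it assumes a fitted APLRRegressor named best_model, a two-way interaction affiliation name stored in affiliation, and that the contribution is the last column of the returned matrix.

```python
import matplotlib.pyplot as plt

# Assumed: best_model is fitted and `affiliation` names a two-way interaction
# returned by best_model.get_unique_term_affiliations().
shape = best_model.get_unique_term_affiliation_shape(affiliation)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# The first two columns are assumed to hold the predictors, the last the contribution.
ax.plot_trisurf(shape[:, 0], shape[:, 1], shape[:, -1])
ax.set_xlabel("predictor 1")
ax.set_ylabel("predictor 2")
ax.set_zlabel("contribution to linear predictor")
plt.show()
```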

examples/train_aplr_regression.py (33 additions, 4 deletions)

```diff
@@ -102,10 +102,11 @@
     by="importance", ascending=False
 )

-# Shapes for all term affiliations in the model. For each term affiliation, contains relevant predictor values and the corresponding
-# contributions to the linear predictor.
-# This is probably the most useful method to use for understanding how the model works.
-# Plots are created for main effects and two-way interactions.
+# Shapes for all term affiliations in the model. For each term affiliation, contains predictor values and the corresponding
+# contributions to the linear predictor. Plots are created for main effects and two-way interactions.
+# This is probably the most useful method to use for understanding how the model works but it is currently very memory intensive when
+# handling interactions and may crash without warning on larger models. Consider using either of the calculate_local_feature_contribution
+# or calculate_local_contribution_from_selected_terms methods to interpret interactions on larger models.
 shapes: Dict[str, pd.DataFrame] = {}
 predictors_in_each_affiliation = (
     best_model.get_base_predictors_in_each_unique_term_affiliation()
@@ -161,6 +162,34 @@
     best_model.calculate_local_feature_contribution(data_train[predictors]),
     columns=best_model.get_unique_term_affiliations(),
 )
+# Combining predictor values with local feature contribution for the second feature in best_model.get_unique_term_affiliations().
+# This can be visualized if it is a main effect or a two-way interaction.
+unique_term_affiliation_index = 1
+predictors_in_the_second_feature = [
+    predictors[predictor_index]
+    for predictor_index in best_model.get_base_predictors_in_each_unique_term_affiliation()[
+        unique_term_affiliation_index
+    ]
+]
+data_to_visualize = pd.DataFrame(
+    np.concatenate(
+        (
+            data_train[predictors_in_the_second_feature].values,
+            local_feature_contribution[
+                [
+                    best_model.get_unique_term_affiliations()[
+                        unique_term_affiliation_index
+                    ]
+                ]
+            ],
+        ),
+        axis=1,
+    ),
+    columns=predictors_in_the_second_feature
+    + [
+        f"contribution from {best_model.get_unique_term_affiliations()[unique_term_affiliation_index]}"
+    ],
+)

 # Local (observation specific) contribution to the linear predictor from selected interacting predictors.
 # In this example this concerns two-way interaction terms in the model where the fourth and the seventh predictors in X interact.
```
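
A possible follow-up (not in the commit) is to plot the data_to_visualize frame built above; the sketch assumes its column layout from the diff, with the predictor columns first and the contribution column last.

```python
import matplotlib.pyplot as plt

contribution_column = data_to_visualize.columns[-1]
if data_to_visualize.shape[1] == 2:
    # Main effect: sort by the predictor and draw a line plot.
    data_to_visualize.sort_values(data_to_visualize.columns[0]).plot(
        x=data_to_visualize.columns[0], y=contribution_column
    )
else:
    # Two-way interaction: scatter the two predictors, coloured by the contribution.
    plt.scatter(
        data_to_visualize.iloc[:, 0],
        data_to_visualize.iloc[:, 1],
        c=data_to_visualize[contribution_column],
    )
    plt.colorbar(label=contribution_column)
plt.show()
```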

setup.py (1 addition, 1 deletion)

```diff
@@ -27,7 +27,7 @@

 setuptools.setup(
     name="aplr",
-    version="10.4.0",
+    version="10.4.1",
     description="Automatic Piecewise Linear Regression",
     ext_modules=[sfc_module],
     author="Mathias von Ottenbreit",
```
