## Method: predict_proba()

***This method returns the predicted class probabilities of the best tuned model as a numpy matrix. It is similar to the predict_class_probabilities method, but the name predict_proba is compatible with scikit-learn.***
### Parameters
#### X
A numpy matrix with predictor values.
#### kwargs
Optional parameters sent to the predict_class_probabilities method in the best tuned model.
## Method: get_best_estimator()
***Returns the best tuned model. This is an APLRRegressor or APLRClassifier object.***
## Method: get_cv_results()
***Returns the cv results from the tuning as a list of dictionaries, List[Dict[str, float]].***
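To illustrate how these methods fit together, here is a minimal sketch. It assumes that ***tuner*** is an already-fitted tuner object from this package for a classification task; the construction and fitting of the tuner are not shown in this excerpt, and the variable names are illustrative.

```python
from typing import Dict, List
import numpy as np

def summarize_tuning(tuner, X_new: np.ndarray) -> None:
    # Predicted class probabilities of the best tuned model (scikit-learn compatible name).
    probabilities: np.ndarray = tuner.predict_proba(X_new)

    # The best tuned model: an APLRRegressor or APLRClassifier object.
    best_model = tuner.get_best_estimator()

    # Cross-validation results from the tuning: List[Dict[str, float]].
    cv_results: List[Dict[str, float]] = tuner.get_cv_results()

    print(probabilities.shape)
    print(type(best_model).__name__)
    print(cv_results[:3])
```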
#### m

The maximum number of boosting steps. If the validation error does not flatten out at the end of the ***m***th boosting step, then try increasing it (or alternatively increase the learning rate).
#### v (default = 0.5)

The learning rate. Must be greater than zero and not more than one. The higher it is, the faster the algorithm learns and the lower ***m*** is required, which reduces computational costs, potentially at the expense of predictiveness. Empirical evidence suggests that ***v <= 0.5*** gives good results for APLR.
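As a rough sketch of the trade-off between ***v*** and ***m*** described above (the constructor arguments m, v and random_state are taken from this documentation; the data is synthetic and the exact values are illustrative only):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.5 * np.maximum(X[:, 1], 0.0) + rng.normal(scale=0.1, size=500)

# A small learning rate typically needs more boosting steps to converge ...
slow = APLRRegressor(m=3000, v=0.1, random_state=0)

# ... while a larger learning rate can use fewer boosting steps,
# reducing computational cost, potentially at the expense of predictiveness.
fast = APLRRegressor(m=500, v=0.5, random_state=0)

slow.fit(X, y)
fast.fit(X, y)
```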
#### random_state (default = 0)
Used to randomly split training observations into cv_folds if ***cv_observations*** is not specified when fitting.
#### max_interaction_level

Specifies the maximum allowed depth of interaction terms. ***0*** means that interaction terms are not allowed.
#### max_interactions (default = 100000)
The maximum number of interactions allowed in each underlying model. A lower value may be used to reduce computational time or to increase interpretability.
#### min_observations_in_split (default = 4)
The minimum effective number of observations that a term in the model must rely on. This hyperparameter should be tuned. Larger values are more appropriate for larger datasets and result in more robust models (lower variance), potentially at the expense of increased bias.
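Because this hyperparameter should be tuned, here is a simple sketch of comparing a few candidate values on held-out data (synthetic data; the candidate values are arbitrary, and a dedicated tuner from this package could be used instead):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.2, size=1000)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Fit one model per candidate value and record the held-out MSE for each.
held_out_mse = {}
for candidate in (4, 20, 100):
    model = APLRRegressor(min_observations_in_split=candidate, random_state=0)
    model.fit(X_train, y_train)
    held_out_mse[candidate] = float(np.mean((model.predict(X_test) - y_test) ** 2))

best_value = min(held_out_mse, key=held_out_mse.get)
```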
#### ineligible_boosting_steps_added

Controls how many boosting steps a term that becomes ineligible has to remain ineligible. The default value works well according to empirical results. This hyperparameter is intended for reducing computational costs.
#### max_eligible_terms (default = 7)
Limits 1) the number of terms already in the model that can be considered as interaction partners in a boosting step and 2) how many terms remain eligible in the next boosting step. The default value works well according to empirical results. This hyperparameter is intended for reducing computational costs.
#### monotonic_constraints

An optional list of integers specifying monotonic constraints on model terms.
#### interaction_constraints

An optional list containing lists of integers. Specifies interaction constraints on model terms. For example, interaction_constraints = [[0,1], [1,2,3]] means that 1) the first and second predictors may interact with each other, and 2) the second, third and fourth predictors may interact with each other. There are no interaction constraints on predictors not mentioned in interaction_constraints.
#### predictor_learning_rates
An optional list of floats specifying learning rates for each predictor. If provided, this supersedes ***v***. For example, if there are two predictors in ***X***, then predictor_learning_rates = [0.1,0.2] means that all terms using the first predictor in ***X*** as a main effect will have a learning rate of 0.1, and all terms using the second predictor in ***X*** as a main effect will have a learning rate of 0.2.
#### predictor_penalties_for_non_linearity
An optional list of floats specifying penalties for non-linearity for each predictor. If provided, this supersedes ***penalty_for_non_linearity***. For example, if there are two predictors in ***X***, then predictor_penalties_for_non_linearity = [0.1,0.2] means that all terms using the first predictor in ***X*** as a main effect will have a penalty for non-linearity of 0.1, and all terms using the second predictor in ***X*** as a main effect will have a penalty for non-linearity of 0.2.
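A sketch of passing these options when fitting, assuming (as the surrounding text suggests) that monotonic_constraints, interaction_constraints, predictor_learning_rates and predictor_penalties_for_non_linearity are optional keyword arguments of the fit method; the data, the values and the constraint encoding (1 = increasing, 0 = unconstrained, -1 = decreasing) are assumptions for illustration only:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = X[:, 0] + np.maximum(X[:, 1], 0.0) - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = APLRRegressor(random_state=0)
model.fit(
    X,
    y,
    monotonic_constraints=[1, 0, -1],          # assumed encoding: increasing, unconstrained, decreasing
    interaction_constraints=[[0, 1]],          # only the first and second predictors may interact
    predictor_learning_rates=[0.1, 0.2, 0.1],  # per-predictor learning rates (supersede v)
    predictor_penalties_for_non_linearity=[0.0, 0.1, 0.0],
)
```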
#### other_data

An optional numpy matrix with other data. This is used in custom loss, negative gradient and validation error functions.
Automatic Piecewise Linear Regression.
# About
Build predictive and interpretable parametric regression or classification machine learning models in Python based on the Automatic Piecewise Linear Regression (APLR) methodology developed by Mathias von Ottenbreit. APLR is often able to compete with tree-based methods on predictiveness, but unlike tree-based methods it is interpretable. Furthermore, APLR produces smoother predictions than tree-based methods. Please see the [documentation](https://github.com/ottenbreit-data-science/aplr/tree/main/documentation) for more information. Links to the published article: [https://link.springer.com/article/10.1007/s00180-024-01475-4](https://link.springer.com/article/10.1007/s00180-024-01475-4) and [https://rdcu.be/dz7bF](https://rdcu.be/dz7bF). More functionality has been added to APLR since the article was published.
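A minimal usage sketch (assuming the package is installed from PyPI as `aplr`; the data here is synthetic and only illustrates the basic fit/predict workflow):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0] + np.maximum(X[:, 1], 0.0) + rng.normal(scale=0.1, size=300)

model = APLRRegressor()  # default hyperparameters
model.fit(X, y)
predictions = model.predict(X)
```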