Commit 48087e8

pruning

1 parent: 0e5f0f1

12 files changed (+377, -177 lines)


API_REFERENCE_FOR_CLASSIFICATION.md

Lines changed: 6 additions & 3 deletions

@@ -1,6 +1,6 @@
 # APLRClassifier
 
-## class aplr.APLRClassifier(m:int=9000, v:float=0.1, random_state:int=0, n_jobs:int=0, validation_ratio:float=0.2, bins:int=300, verbosity:int=0, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5)
+## class aplr.APLRClassifier(m:int=9000, v:float=0.1, random_state:int=0, n_jobs:int=0, validation_ratio:float=0.2, bins:int=300, verbosity:int=0, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, boosting_steps_before_pruning_is_done:int=500)
 
 ### Constructor parameters
 
@@ -40,8 +40,11 @@ Controls how many boosting steps a term that becomes ineligible has to remain in
 #### max_eligible_terms (default = 5)
 Limits 1) the number of terms already in the model that can be considered as interaction partners in a boosting step and 2) how many terms remain eligible in the next boosting step. The default value works well according to empirical results. This hyperparameter is intended for reducing computational costs.
 
+#### boosting_steps_before_pruning_is_done (default = 500)
+Specifies how many boosting steps to wait before pruning the model. With the default value, the model is pruned in boosting steps 500, 1000, and so on. When pruning, terms are removed as long as this reduces the training error. Pruning can be computationally costly, especially if the model contains many terms. To switch off pruning, set ***boosting_steps_before_pruning_is_done*** to a value higher than ***m***.
 
-## Method: fit(X:npt.ArrayLike, y:List[str], sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[], prioritized_predictors_indexes:List[int]=[], monotonic_constraints:List[int]=[], interaction_constraints:List[int]=[])
+
+## Method: fit(X:npt.ArrayLike, y:List[str], sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[], prioritized_predictors_indexes:List[int]=[], monotonic_constraints:List[int]=[], interaction_constraints:List[List[int]]=[])
 
 ***This method fits the model to data.***
 
@@ -69,7 +72,7 @@ An optional list of integers specifying the indexes of predictors (columns) in *
 An optional list of integers specifying monotonic constraints on model terms. For example, if there are three predictors in ***X***, then monotonic_constraints = [1,0,-1] means that 1) the first predictor in ***X*** cannot be used in interaction terms as a secondary effect and all terms using the first predictor in ***X*** as a main effect must have positive regression coefficients, 2) there are no monotonic constraints on terms using the second predictor in ***X***, and 3) the third predictor in ***X*** cannot be used in interaction terms as a secondary effect and all terms using the third predictor in ***X*** as a main effect must have negative regression coefficients.
 
 #### interaction_constraints
-An optional list of integers specifying interaction constraints on model terms. For example, if there are three predictors in ***X***, then interaction_constraints = [1,0,2] means that 1) the first predictor in ***X*** cannot be used in interaction terms as a secondary effect, 2) there are no interaction constraints on terms using the second predictor in ***X***, and 3) the third predictor in ***X*** cannot be used in any interaction terms.
+An optional list of lists of integers specifying interaction constraints on model terms. For example, interaction_constraints = [[0,1], [1,2,3]] means that 1) the first and second predictors may interact with each other, and 2) the second, third and fourth predictors may interact with each other. Predictors not mentioned in interaction_constraints have no interaction constraints.
 
 
 ## Method: predict_class_probabilities(X:npt.ArrayLike, cap_predictions_to_minmax_in_training:bool=False)
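The semantics of the new list-of-lists ***interaction_constraints*** format can be illustrated with a small sketch. The helper `allowed_partners` below is hypothetical and not part of the aplr API; it only restates the rule documented above: a predictor may interact with the other members of every group it appears in, and predictors not mentioned in any group are unconstrained.

```python
from typing import List, Optional, Set


def allowed_partners(
    predictor: int, interaction_constraints: List[List[int]]
) -> Optional[Set[int]]:
    """Illustrative only: which predictors may interact with `predictor`
    under interaction_constraints. Returns None when the predictor is not
    mentioned in any group (i.e., it is unconstrained)."""
    groups = [g for g in interaction_constraints if predictor in g]
    if not groups:
        return None  # not mentioned anywhere -> no interaction constraints
    partners = {p for g in groups for p in g}
    partners.discard(predictor)
    return partners


# The example from the documentation above: [[0, 1], [1, 2, 3]]
print(allowed_partners(1, [[0, 1], [1, 2, 3]]))  # {0, 2, 3}
print(allowed_partners(0, [[0, 1], [1, 2, 3]]))  # {1}
print(allowed_partners(4, [[0, 1], [1, 2, 3]]))  # None (unconstrained)
```

Note that predictor 1 appears in both groups, so its partner set is the union of both groups minus itself.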

API_REFERENCE_FOR_REGRESSION.md

Lines changed: 6 additions & 3 deletions

@@ -1,6 +1,6 @@
 # APLRRegressor
 
-## class aplr.APLRRegressor(m:int=1000, v:float=0.1, random_state:int=0, loss_function:str="mse", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, dispersion_parameter:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5, calculate_custom_validation_error_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_loss_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_negative_gradient_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_transform_linear_predictor_to_predictions_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_differentiate_predictions_wrt_linear_predictor_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None)
+## class aplr.APLRRegressor(m:int=1000, v:float=0.1, random_state:int=0, loss_function:str="mse", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, dispersion_parameter:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5, calculate_custom_validation_error_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_loss_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_negative_gradient_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_transform_linear_predictor_to_predictions_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_differentiate_predictions_wrt_linear_predictor_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, boosting_steps_before_pruning_is_done:int=500)
 
 ### Constructor parameters
 
@@ -102,7 +102,10 @@ def calculate_custom_differentiate_predictions_wrt_linear_predictor(linear_predi
     return differentiated_predictions
 ```
 
-## Method: fit(X:npt.ArrayLike, y:npt.ArrayLike, sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[], prioritized_predictors_indexes:List[int]=[], monotonic_constraints:List[int]=[], group:npt.ArrayLike = np.empty(0), interaction_constraints:List[int]=[])
+#### boosting_steps_before_pruning_is_done (default = 500)
+Specifies how many boosting steps to wait before pruning the model. With the default value, the model is pruned in boosting steps 500, 1000, and so on. When pruning, terms are removed as long as this reduces the training error. Pruning can be computationally costly, especially if the model contains many terms. To switch off pruning, set ***boosting_steps_before_pruning_is_done*** to a value higher than ***m***.
+
+## Method: fit(X:npt.ArrayLike, y:npt.ArrayLike, sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[], prioritized_predictors_indexes:List[int]=[], monotonic_constraints:List[int]=[], group:npt.ArrayLike = np.empty(0), interaction_constraints:List[List[int]]=[])
 
 ***This method fits the model to data.***
 
@@ -133,7 +136,7 @@ An optional list of integers specifying monotonic constraints on model terms. Fo
 A numpy vector of integers that is used when ***loss_function*** is "group_mse". For example, ***group*** may represent year (could be useful in a time series model).
 
 #### interaction_constraints
-An optional list of integers specifying interaction constraints on model terms. For example, if there are three predictors in ***X***, then interaction_constraints = [1,0,2] means that 1) the first predictor in ***X*** cannot be used in interaction terms as a secondary effect, 2) there are no interaction constraints on terms using the second predictor in ***X***, and 3) the third predictor in ***X*** cannot be used in any interaction terms.
+An optional list of lists of integers specifying interaction constraints on model terms. For example, interaction_constraints = [[0,1], [1,2,3]] means that 1) the first and second predictors may interact with each other, and 2) the second, third and fourth predictors may interact with each other. Predictors not mentioned in interaction_constraints have no interaction constraints.
 
 
 ## Method: predict(X:npt.ArrayLike, cap_predictions_to_minmax_in_training:bool=True)
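The pruning schedule described for ***boosting_steps_before_pruning_is_done*** can be sketched as follows. The helper `pruning_steps` is hypothetical (not part of aplr); it just enumerates the boosting steps at which pruning would run, under the assumption stated in the documentation above that pruning happens at every multiple of the parameter, which also shows why any value higher than ***m*** switches pruning off.

```python
from typing import List


def pruning_steps(m: int, boosting_steps_before_pruning_is_done: int = 500) -> List[int]:
    """Boosting steps (out of m total) at which pruning would run:
    every multiple of boosting_steps_before_pruning_is_done up to m."""
    interval = boosting_steps_before_pruning_is_done
    return list(range(interval, m + 1, interval))


print(pruning_steps(1000))             # [500, 1000]
print(len(pruning_steps(9000)))        # 18 pruning passes at the default m=9000
print(pruning_steps(1000, 1500))       # [] -> pruning is effectively switched off
```

With the APLRClassifier default of m=9000 this gives 18 pruning passes, while setting the parameter above m yields none.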

aplr/aplr.py

Lines changed: 18 additions & 2 deletions

@@ -43,6 +43,7 @@ def __init__(
         calculate_custom_differentiate_predictions_wrt_linear_predictor_function: Optional[
             Callable[[npt.ArrayLike], npt.ArrayLike]
         ] = None,
+        boosting_steps_before_pruning_is_done: int = 500,
     ):
         self.m = m
         self.v = v
@@ -74,6 +75,9 @@ def __init__(
         self.calculate_custom_differentiate_predictions_wrt_linear_predictor_function = (
             calculate_custom_differentiate_predictions_wrt_linear_predictor_function
         )
+        self.boosting_steps_before_pruning_is_done = (
+            boosting_steps_before_pruning_is_done
+        )
 
         # Creating aplr_cpp and setting parameters
         self.APLRRegressor = aplr_cpp.APLRRegressor()
@@ -115,6 +119,9 @@ def __set_params_cpp(self):
         self.APLRRegressor.calculate_custom_differentiate_predictions_wrt_linear_predictor_function = (
             self.calculate_custom_differentiate_predictions_wrt_linear_predictor_function
         )
+        self.APLRRegressor.boosting_steps_before_pruning_is_done = (
+            self.boosting_steps_before_pruning_is_done
+        )
 
     def fit(
         self,
@@ -126,7 +133,7 @@ def fit(
         prioritized_predictors_indexes: List[int] = [],
         monotonic_constraints: List[int] = [],
         group: npt.ArrayLike = np.empty(0),
-        interaction_constraints: List[int] = [],
+        interaction_constraints: List[List[int]] = [],
     ):
         self.__set_params_cpp()
         self.APLRRegressor.fit(
@@ -219,6 +226,7 @@ def get_params(self, deep=True):
             "calculate_custom_negative_gradient_function": self.calculate_custom_negative_gradient_function,
             "calculate_custom_transform_linear_predictor_to_predictions_function": self.calculate_custom_transform_linear_predictor_to_predictions_function,
             "calculate_custom_differentiate_predictions_wrt_linear_predictor_function": self.calculate_custom_differentiate_predictions_wrt_linear_predictor_function,
+            "boosting_steps_before_pruning_is_done": self.boosting_steps_before_pruning_is_done,
         }
 
     # For sklearn
@@ -244,6 +252,7 @@ def __init__(
         min_observations_in_split: int = 20,
         ineligible_boosting_steps_added: int = 10,
         max_eligible_terms: int = 5,
+        boosting_steps_before_pruning_is_done: int = 500,
     ):
         self.m = m
         self.v = v
@@ -257,6 +266,9 @@ def __init__(
         self.min_observations_in_split = min_observations_in_split
         self.ineligible_boosting_steps_added = ineligible_boosting_steps_added
         self.max_eligible_terms = max_eligible_terms
+        self.boosting_steps_before_pruning_is_done = (
+            boosting_steps_before_pruning_is_done
+        )
 
         # Creating aplr_cpp and setting parameters
         self.APLRClassifier = aplr_cpp.APLRClassifier()
@@ -278,6 +290,9 @@ def __set_params_cpp(self):
             self.ineligible_boosting_steps_added
         )
         self.APLRClassifier.max_eligible_terms = self.max_eligible_terms
+        self.APLRClassifier.boosting_steps_before_pruning_is_done = (
+            self.boosting_steps_before_pruning_is_done
+        )
 
     def fit(
         self,
@@ -288,7 +303,7 @@ def fit(
         validation_set_indexes: List[int] = [],
         prioritized_predictors_indexes: List[int] = [],
         monotonic_constraints: List[int] = [],
-        interaction_constraints: List[int] = [],
+        interaction_constraints: List[List[int]] = [],
     ):
         self.__set_params_cpp()
         self.APLRClassifier.fit(
@@ -350,6 +365,7 @@ def get_params(self, deep=True):
            "min_observations_in_split": self.min_observations_in_split,
            "ineligible_boosting_steps_added": self.ineligible_boosting_steps_added,
            "max_eligible_terms": self.max_eligible_terms,
+            "boosting_steps_before_pruning_is_done": self.boosting_steps_before_pruning_is_done,
         }
 
     # For sklearn
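The diff registers the new parameter in get_params in both classes. The reason can be sketched with a toy estimator (TinyEstimator is hypothetical, not aplr code): scikit-learn's clone() rebuilds an estimator from get_params(), so a constructor parameter missing from that dict would silently reset to its default on clone or during hyperparameter search.

```python
class TinyEstimator:
    """Minimal sketch of the sklearn estimator contract used in aplr.py."""

    def __init__(self, m: int = 1000, boosting_steps_before_pruning_is_done: int = 500):
        self.m = m
        self.boosting_steps_before_pruning_is_done = boosting_steps_before_pruning_is_done

    def get_params(self, deep: bool = True) -> dict:
        # Every constructor parameter must appear here for clone() to work.
        return {
            "m": self.m,
            "boosting_steps_before_pruning_is_done": self.boosting_steps_before_pruning_is_done,
        }

    def set_params(self, **params) -> "TinyEstimator":
        for key, value in params.items():
            setattr(self, key, value)
        return self


original = TinyEstimator(m=2000, boosting_steps_before_pruning_is_done=1000)
copy = TinyEstimator(**original.get_params())  # what sklearn's clone() effectively does
print(copy.boosting_steps_before_pruning_is_done)  # 1000, not the default 500
```

Had "boosting_steps_before_pruning_is_done" been omitted from get_params, `copy` would have come back with the default of 500 instead of 1000.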
