Skip to content

Commit 8a6fd66

Browse files
added new loss_function and validation_tuning_metric
1 parent e05d632 commit 8a6fd66

File tree

7 files changed

+370
-39
lines changed

7 files changed

+370
-39
lines changed

API_REFERENCE_FOR_REGRESSION.md

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# APLRRegressor
22

3-
## class aplr.APLRRegressor(m:int=1000, v:float=0.1, random_state:int=0, loss_function:str="mse", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, dispersion_parameter:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5, calculate_custom_validation_error_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_loss_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_negative_gradient_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_transform_linear_predictor_to_predictions_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_differentiate_predictions_wrt_linear_predictor_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, boosting_steps_before_pruning_is_done: int = 0, boosting_steps_before_interactions_are_allowed: int = 0, monotonic_constraints_ignore_interactions: bool = False)
3+
## class aplr.APLRRegressor(m:int=1000, v:float=0.1, random_state:int=0, loss_function:str="mse", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, dispersion_parameter:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5, calculate_custom_validation_error_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_loss_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], float]]=None, calculate_custom_negative_gradient_function:Optional[Callable[[npt.ArrayLike, npt.ArrayLike, npt.ArrayLike, npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_transform_linear_predictor_to_predictions_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, calculate_custom_differentiate_predictions_wrt_linear_predictor_function:Optional[Callable[[npt.ArrayLike], npt.ArrayLike]]=None, boosting_steps_before_pruning_is_done: int = 0, boosting_steps_before_interactions_are_allowed: int = 0, monotonic_constraints_ignore_interactions: bool = False, group_mse_cycle_bins: int = 10)
44

55
### Constructor parameters
66

@@ -14,7 +14,7 @@ The learning rate. Must be greater than zero and not more than one. The higher t
1414
Used to randomly split training observations into training and validation if ***validation_set_indexes*** is not specified when fitting.
1515

1616
#### loss_function (default = "mse")
17-
Determines the loss function used. Allowed values are "mse", "binomial", "poisson", "gamma", "tweedie", "group_mse", "mae", "quantile", "negative_binomial", "cauchy", "weibull" and "custom_function". This is used together with ***link_function***. When ***loss_function*** is "group_mse" then the "group" argument in the ***fit*** method must be provided. In the latter case APLR will try to minimize group MSE when training the model. The ***loss_function*** "quantile" is used together with the ***quantile*** constructor parameter. When ***loss_function*** is "custom_function" then the constructor parameters ***calculate_custom_loss_function*** and ***calculate_custom_negative_gradient_function***, both described below, must be provided.
17+
Determines the loss function used. Allowed values are "mse", "binomial", "poisson", "gamma", "tweedie", "group_mse", "group_mse_cycle","mae", "quantile", "negative_binomial", "cauchy", "weibull" and "custom_function". This is used together with ***link_function***. When ***loss_function*** is "group_mse" then the "group" argument in the ***fit*** method must be provided. In the latter case APLR will try to minimize group MSE when training the model. When using "group_mse_cycle", ***group_mse_cycle_bins*** controls the maximum amount of groups. For a description of "group_mse_cycle" see ***group_mse_cycle_bins***. The ***loss_function*** "quantile" is used together with the ***quantile*** constructor parameter. When ***loss_function*** is "custom_function" then the constructor parameters ***calculate_custom_loss_function*** and ***calculate_custom_negative_gradient_function***, both described below, must be provided.
1818

1919
#### link_function (default = "identity")
2020
Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit", "log" and "custom_function". For an ordinary regression model use ***loss_function*** "mse" and ***link_function*** "identity". For logistic regression use ***loss_function*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma", "tweedie", "negative_binomial" or "weibull" ***loss_function***, depending on the data. The ***loss_function*** "poisson", "gamma", "tweedie", "negative_binomial" or "weibull" should only be used with the "log" ***link_function***. Inappropriate combinations of ***loss_function*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge. When ***link_function*** is "custom_function" then the constructor parameters ***calculate_custom_transform_linear_predictor_to_predictions_function*** and ***calculate_custom_differentiate_predictions_wrt_linear_predictor_function***, both described below, must be provided.
@@ -50,7 +50,7 @@ Limits 1) the number of terms already in the model that can be considered as int
5050
Specifies the variance power when ***loss_function*** is "tweedie". Specifies a dispersion parameter when ***loss_function*** is "negative_binomial", "cauchy" or "weibull".
5151

5252
#### validation_tuning_metric (default = "default")
53-
Specifies which metric to use for validating the model and tuning ***m***. Available options are "default" (using the same methodology as when calculating the training error), "mse", "mae", "negative_gini", "rankability", "group_mse" and "custom_function". The default is often a choice that fits well with respect to the ***loss_function*** chosen. However, if you want to use ***loss_function*** or ***dispersion_parameter*** as tuning parameters then the default is not suitable. "rankability" uses a methodology similar to the one described in https://towardsdatascience.com/how-to-calculate-roc-auc-score-for-regression-models-c0be4fdf76bb except that the metric is inverted and can be weighted by sample weights. "group_mse" requires that the "group" argument in the ***fit*** method is provided. For "custom_function" see ***calculate_custom_validation_error_function*** below.
53+
Specifies which metric to use for validating the model and tuning ***m***. Available options are "default" (using the same methodology as when calculating the training error), "mse", "mae", "negative_gini", "rankability", "group_mse", "group_mse_by_prediction" and "custom_function". The default is often a choice that fits well with respect to the ***loss_function*** chosen. However, if you want to use ***loss_function*** or ***dispersion_parameter*** as tuning parameters then the default is not suitable. "rankability" uses a methodology similar to the one described in https://towardsdatascience.com/how-to-calculate-roc-auc-score-for-regression-models-c0be4fdf76bb except that the metric is inverted and can be weighted by sample weights. "group_mse" requires that the "group" argument in the ***fit*** method is provided. "group_mse_by_prediction" groups predictions by up to ***group_mse_cycle_bins*** groups and calculates groupwise mse. For "custom_function" see ***calculate_custom_validation_error_function*** below.
5454

5555
#### quantile (default = 0.5)
5656
Specifies the quantile to use when ***loss_function*** is "quantile".
@@ -111,6 +111,10 @@ Specifies how many boosting steps to wait before searching for interactions. If
111111
#### monotonic_constraints_ignore_interactions (default = False)
112112
See ***monotonic_constraints*** in the ***fit*** method.
113113

114+
#### group_mse_cycle_bins (default = 10)
115+
When ***loss_function*** equals ***group_mse_cycle*** then ***group_mse_cycle_bins*** specifies the maximum amount of groups. The loss function ***group_mse_cycle*** groups by the first predictor in ***X*** in the first boosting step, then by the second predictor in ***X*** in the second boosting step, etc. So in each boosting step the predictor to group by is changed. If ***validation_tuning_metric*** is "default" then "group_mse_by_prediction" will be used as ***validation_tuning_metric***.
116+
117+
114118
## Method: fit(X:npt.ArrayLike, y:npt.ArrayLike, sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[], prioritized_predictors_indexes:List[int]=[], monotonic_constraints:List[int]=[], group:npt.ArrayLike = np.empty(0), interaction_constraints:List[List[int]]=[], other_data: npt.ArrayLike = np.empty([0, 0]))
115119

116120
***This method fits the model to data.***

aplr/aplr.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@ def __init__(
6363
boosting_steps_before_pruning_is_done: int = 0,
6464
boosting_steps_before_interactions_are_allowed: int = 0,
6565
monotonic_constraints_ignore_interactions: bool = False,
66+
group_mse_cycle_bins: int = 10,
6667
):
6768
self.m = m
6869
self.v = v
@@ -103,6 +104,7 @@ def __init__(
103104
self.monotonic_constraints_ignore_interactions = (
104105
monotonic_constraints_ignore_interactions
105106
)
107+
self.group_mse_cycle_bins = group_mse_cycle_bins
106108

107109
# Creating aplr_cpp and setting parameters
108110
self.APLRRegressor = aplr_cpp.APLRRegressor()
@@ -153,6 +155,7 @@ def __set_params_cpp(self):
153155
self.APLRRegressor.monotonic_constraints_ignore_interactions = (
154156
self.monotonic_constraints_ignore_interactions
155157
)
158+
self.APLRRegressor.group_mse_cycle_bins = self.group_mse_cycle_bins
156159

157160
def fit(
158161
self,
@@ -267,6 +270,7 @@ def get_params(self, deep=True):
267270
"boosting_steps_before_pruning_is_done": self.boosting_steps_before_pruning_is_done,
268271
"boosting_steps_before_interactions_are_allowed": self.boosting_steps_before_interactions_are_allowed,
269272
"monotonic_constraints_ignore_interactions": self.monotonic_constraints_ignore_interactions,
273+
"group_mse_cycle_bins": self.group_mse_cycle_bins,
270274
}
271275

272276
# For sklearn

0 commit comments

Comments
 (0)