Commit 3dc124c

added new functionality and renamed parameters
1 parent 4cb2140 commit 3dc124c

30 files changed, 370 insertions(+), 138 deletions(-)

API_REFERENCE.md

Lines changed: 9 additions & 9 deletions
@@ -1,6 +1,6 @@
 # APLRRegressor
 
-## class aplr.APLRRegressor(m:int=1000, v:float=0.1, random_state:int=0, family:str="gaussian", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, intercept:float=np.nan, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, tweedie_power:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5)
+## class aplr.APLRRegressor(m:int=1000, v:float=0.1, random_state:int=0, loss_function:str="mse", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, intercept:float=np.nan, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, dispersion_parameter:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5)
 
 ### Constructor parameters
 
@@ -13,11 +13,11 @@ The learning rate. Must be greater than zero and not more than one. The higher t
 #### random_state (default = 0)
 Used to randomly split training observations into training and validation if ***validation_set_indexes*** is not specified when fitting.
 
-#### family (default = "gaussian")
-Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma", "tweedie", "group_gaussian", "mae" and "quantile". This is used together with ***link_function***. When ***family*** is "group_gaussian" then the "group" argument in the ***fit*** method must be provided. In the latter case APLR will try to minimize group MSE when training the model. The ***family*** "quantile" is used together with the ***quantile*** constructor parameter.
+#### loss_function (default = "mse")
+Determines the loss function used. Allowed values are "mse", "binomial", "poisson", "gamma", "tweedie", "group_mse", "mae", "quantile", "negative_binomial" and "cauchy". This is used together with ***link_function***. When ***loss_function*** is "group_mse" then the "group" argument in the ***fit*** method must be provided. In the latter case APLR will try to minimize group MSE when training the model. The ***loss_function*** "quantile" is used together with the ***quantile*** constructor parameter.
 
 #### link_function (default = "identity")
-Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit" and "log". For an ordinary regression model use ***family*** "gaussian" and ***link_function*** "identity". For logistic regression use ***family*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma" or "tweedie" ***family***, depending on the data. The ***family*** "poisson", "gamma" or "tweedie" should only be used with the "log" ***link_function***. Inappropriate combinations of ***family*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge.
+Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit" and "log". For an ordinary regression model use ***loss_function*** "mse" and ***link_function*** "identity". For logistic regression use ***loss_function*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma", "tweedie" or "negative_binomial" ***loss_function***, depending on the data. The ***loss_function*** "poisson", "gamma", "tweedie" or "negative_binomial" should only be used with the "log" ***link_function***. Inappropriate combinations of ***loss_function*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge.
 
 #### n_jobs (default = 0)
 Multi-threading parameter. If ***0*** then uses all available cores for multi-threading. Any other positive integer specifies the number of cores to use (***1*** means single-threading).
@@ -49,14 +49,14 @@ Limits 1) the number of terms already in the model that can be considered as int
 #### verbosity (default = 0)
 ***0*** does not print progress reports during fitting. ***1*** prints a summary after running the ***fit*** method. ***2*** prints a summary after each boosting step.
 
-#### tweedie_power (default = 1.5)
-Specifies the variance power for the "tweedie" ***family***.
+#### dispersion_parameter (default = 1.5)
+Specifies the variance power when ***loss_function*** is "tweedie". Specifies a dispersion parameter when ***loss_function*** is "negative_binomial" or "cauchy".
 
 #### validation_tuning_metric (default = "default")
-Specifies which metric to use for validating the model and tuning ***m***. Available options are "default" (using the same methodology as when calculating the training error), "mse", "mae", "negative_gini" and "rankability". The default is often a choice that fits well with respect to the ***family*** chosen. However, if you want to use ***family*** or ***tweedie_power*** as tuning parameters then the default is not suitable. "rankability" uses a methodology similar to the one described in https://towardsdatascience.com/how-to-calculate-roc-auc-score-for-regression-models-c0be4fdf76bb except that the metric is inverted and can be weighted by sample weights.
+Specifies which metric to use for validating the model and tuning ***m***. Available options are "default" (using the same methodology as when calculating the training error), "mse", "mae", "negative_gini", "rankability" and "group_mse". The default is often a choice that fits well with respect to the ***loss_function*** chosen. However, if you want to use ***loss_function*** or ***dispersion_parameter*** as tuning parameters then the default is not suitable. "rankability" uses a methodology similar to the one described in https://towardsdatascience.com/how-to-calculate-roc-auc-score-for-regression-models-c0be4fdf76bb except that the metric is inverted and can be weighted by sample weights. "group_mse" requires that the "group" argument in the ***fit*** method is provided.
 
 #### quantile (default = 0.5)
-Specifies the quantile to use when ***family*** is "quantile".
+Specifies the quantile to use when ***loss_function*** is "quantile".
 
 
 ## Method: fit(X:npt.ArrayLike, y:npt.ArrayLike, sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[], prioritized_predictors_indexes:List[int]=[], monotonic_constraints:List[int]=[], group:npt.ArrayLike = np.empty(0), interaction_constraints:List[int]=[])
@@ -87,7 +87,7 @@ An optional list of integers specifying the indexes of predictors (columns) in *
 An optional list of integers specifying monotonic constraints on model terms. For example, if there are three predictors in ***X***, then monotonic_constraints = [1,0,-1] means that 1) the first predictor in ***X*** cannot be used in interaction terms as a secondary effect and all terms using the first predictor in ***X*** as a main effect must have positive regression coefficients, 2) there are no monotonic constraints on terms using the second predictor in ***X***, and 3) the third predictor in ***X*** cannot be used in interaction terms as a secondary effect and all terms using the third predictor in ***X*** as a main effect must have negative regression coefficients.
 
 #### group
-A numpy vector of integers that is used when ***family*** is "group_gaussian". For example, ***group*** may represent year (could be useful in a time series model).
+A numpy vector of integers that is used when ***loss_function*** is "group_mse". For example, ***group*** may represent year (could be useful in a time series model).
 
 #### interaction_constraints
 An optional list of integers specifying interaction constraints on model terms. For example, if there are three predictors in ***X***, then interaction_constraints = [1,0,2] means that 1) the first predictor in ***X*** cannot be used in interaction terms as a secondary effect, 2) there are no interaction constraints on terms using the second predictor in ***X***, and 3) the third predictor in ***X*** cannot be used in any interaction terms.
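
The parameter rules stated in the updated API reference can be sketched as a small pre-flight check. This is only an illustration of the documented constraints, not code from the commit: the function name `check_aplr_settings` and the `group_provided` flag are hypothetical, and the library itself may only warn about some of these combinations rather than reject them.

```python
from typing import List

# Allowed values as listed in the API reference at this commit.
ALLOWED_LOSS_FUNCTIONS = {
    "mse", "binomial", "poisson", "gamma", "tweedie",
    "group_mse", "mae", "quantile", "negative_binomial", "cauchy",
}
ALLOWED_LINK_FUNCTIONS = {"identity", "logit", "log"}
ALLOWED_TUNING_METRICS = {
    "default", "mse", "mae", "negative_gini", "rankability", "group_mse",
}
# Loss functions the reference says should only be used with the "log" link.
LOG_LINK_ONLY = {"poisson", "gamma", "tweedie", "negative_binomial"}


def check_aplr_settings(
    loss_function: str = "mse",
    link_function: str = "identity",
    validation_tuning_metric: str = "default",
    group_provided: bool = False,  # hypothetical flag: will fit() receive `group`?
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if loss_function not in ALLOWED_LOSS_FUNCTIONS:
        problems.append(f"unknown loss_function: {loss_function!r}")
    if link_function not in ALLOWED_LINK_FUNCTIONS:
        problems.append(f"unknown link_function: {link_function!r}")
    if validation_tuning_metric not in ALLOWED_TUNING_METRICS:
        problems.append(
            f"unknown validation_tuning_metric: {validation_tuning_metric!r}"
        )
    if loss_function in LOG_LINK_ONLY and link_function != "log":
        problems.append(f'loss_function {loss_function!r} should use the "log" link')
    if loss_function == "group_mse" and not group_provided:
        problems.append('loss_function "group_mse" requires the group argument in fit')
    if validation_tuning_metric == "group_mse" and not group_provided:
        problems.append(
            'validation_tuning_metric "group_mse" requires the group argument in fit'
        )
    return problems
```

Calling `check_aplr_settings("poisson", "identity")` returns one problem, because the reference pairs "poisson" with the "log" link. The old value "gaussian" is reported as unknown because it no longer appears in the list of allowed values.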

aplr/aplr.py

Lines changed: 7 additions & 7 deletions
@@ -5,11 +5,11 @@
 
 
 class APLRRegressor():
-    def __init__(self, m:int=1000, v:float=0.1, random_state:int=0, family:str="gaussian", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, intercept:float=np.nan, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, tweedie_power:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5):
+    def __init__(self, m:int=1000, v:float=0.1, random_state:int=0, loss_function:str="mse", link_function:str="identity", n_jobs:int=0, validation_ratio:float=0.2, intercept:float=np.nan, bins:int=300, max_interaction_level:int=1, max_interactions:int=100000, min_observations_in_split:int=20, ineligible_boosting_steps_added:int=10, max_eligible_terms:int=5, verbosity:int=0, dispersion_parameter:float=1.5, validation_tuning_metric:str="default", quantile:float=0.5):
         self.m=m
         self.v=v
         self.random_state=random_state
-        self.family=family
+        self.loss_function=loss_function
         self.link_function=link_function
         self.n_jobs=n_jobs
         self.validation_ratio=validation_ratio
@@ -21,7 +21,7 @@ def __init__(self, m:int=1000, v:float=0.1, random_state:int=0, family:str="gaus
         self.ineligible_boosting_steps_added=ineligible_boosting_steps_added
         self.max_eligible_terms=max_eligible_terms
         self.verbosity=verbosity
-        self.tweedie_power=tweedie_power
+        self.dispersion_parameter=dispersion_parameter
         self.validation_tuning_metric=validation_tuning_metric
         self.quantile=quantile
 
@@ -34,7 +34,7 @@ def __set_params_cpp(self):
         self.APLRRegressor.m=self.m
         self.APLRRegressor.v=self.v
         self.APLRRegressor.random_state=self.random_state
-        self.APLRRegressor.family=self.family
+        self.APLRRegressor.loss_function=self.loss_function
         self.APLRRegressor.link_function=self.link_function
         self.APLRRegressor.n_jobs=self.n_jobs
         self.APLRRegressor.validation_ratio=self.validation_ratio
@@ -46,7 +46,7 @@ def __set_params_cpp(self):
         self.APLRRegressor.ineligible_boosting_steps_added=self.ineligible_boosting_steps_added
         self.APLRRegressor.max_eligible_terms=self.max_eligible_terms
         self.APLRRegressor.verbosity=self.verbosity
-        self.APLRRegressor.tweedie_power=self.tweedie_power
+        self.APLRRegressor.dispersion_parameter=self.dispersion_parameter
         self.APLRRegressor.validation_tuning_metric=self.validation_tuning_metric
         self.APLRRegressor.quantile=self.quantile
 
@@ -105,7 +105,7 @@ def get_params(self, deep=True):
             "m": self.m,
             "v": self.v,
             "random_state":self.random_state,
-            "family":self.family,
+            "loss_function":self.loss_function,
             "link_function":self.link_function,
             "n_jobs":self.n_jobs,
             "validation_ratio":self.validation_ratio,
@@ -117,7 +117,7 @@ def get_params(self, deep=True):
             "min_observations_in_split":self.min_observations_in_split,
             "ineligible_boosting_steps_added":self.ineligible_boosting_steps_added,
             "max_eligible_terms":self.max_eligible_terms,
-            "tweedie_power":self.tweedie_power,
+            "dispersion_parameter":self.dispersion_parameter,
             "validation_tuning_metric":self.validation_tuning_metric,
             "quantile":self.quantile
         }
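
Because this commit renames constructor keywords, code written against the parent commit that passes `family=` or `tweedie_power=` would presumably fail with an unexpected-keyword error. A minimal migration sketch follows. The helper `migrate_aplr_kwargs` is hypothetical and not part of the commit. The value mappings "gaussian" to "mse" and "group_gaussian" to "group_mse" are inferred from the changed defaults and allowed-value lists in the diff above.

```python
from typing import Any, Dict

# Keyword renames visible in the __init__ signature change.
RENAMED_PARAMETERS = {
    "family": "loss_function",
    "tweedie_power": "dispersion_parameter",
}
# Value renames inferred from the API_REFERENCE.md diff (an assumption).
RENAMED_LOSS_VALUES = {
    "gaussian": "mse",
    "group_gaussian": "group_mse",
}


def migrate_aplr_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-rename constructor keywords to the post-rename names."""
    new_kwargs: Dict[str, Any] = {}
    for name, value in old_kwargs.items():
        new_name = RENAMED_PARAMETERS.get(name, name)
        if new_name in new_kwargs:
            # Both the old and the new spelling were supplied.
            raise ValueError(f"parameter given twice after renaming: {new_name!r}")
        new_kwargs[new_name] = value
    loss = new_kwargs.get("loss_function")
    if isinstance(loss, str):
        # Map renamed loss values; leave every other value untouched.
        new_kwargs["loss_function"] = RENAMED_LOSS_VALUES.get(loss, loss)
    return new_kwargs
```

The returned dictionary uses the keyword names from the new `__init__` signature, so it could be unpacked into the constructor as `APLRRegressor(**new_kwargs)`, or compared against `get_params()` output from a model built at this commit.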
