Commit 544d694

Updated API reference and code example
1 parent 0cda913 commit 544d694

2 files changed: 5 additions and 5 deletions

API_REFERENCE.md

Lines changed: 4 additions & 4 deletions
@@ -14,10 +14,10 @@ The learning rate. Must be greater than zero and not more than one. The higher t
 Used to randomly split training observations into training and validation if ***validation_set_indexes*** is not specified when fitting.
 
 #### family (default = "gaussian")
-Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma" and "tweedie". This is used together with ***link_function***. ***family*** is not intended to be a tuning parameter because it defines how the loss function is calculated. However, if you wish to tune it then the method ***get_validation_group_mse()*** provides a useful tuning metric.
+Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma" and "tweedie". This is used together with ***link_function***.
 
 #### link_function (default = "identity")
-Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit" and "log". For an ordinary regression model use ***family*** "gaussian" and ***link_function*** "identity". For logistic regression use ***family*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma" or "tweedie" ***family***, depending on the data. The ***family*** "poisson", "gamma" or "tweedie" should only be used with the "log" ***link_function***. Inappropriate combinations of ***family*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge. ***link_function*** is not intended to be a tuning parameter because it defines the model structure. However, if you wish to tune it then the method ***get_validation_group_mse()*** provides a useful tuning metric.
+Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit" and "log". For an ordinary regression model use ***family*** "gaussian" and ***link_function*** "identity". For logistic regression use ***family*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma" or "tweedie" ***family***, depending on the data. The ***family*** "poisson", "gamma" or "tweedie" should only be used with the "log" ***link_function***. Inappropriate combinations of ***family*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge.
 
 #### n_jobs (default = 0)
 Multi-threading parameter. If ***0*** then uses all available cores for multi-threading. Any other positive integer specifies the number of cores to use (***1*** means single-threading).
@@ -50,10 +50,10 @@ Limits 1) the number of terms already in the model that can be considered as int
 ***0*** does not print progress reports during fitting. ***1*** prints a summary after running the ***fit*** method. ***2*** prints a summary after each boosting step.
 
 #### tweedie_power (default = 1.5)
-Species the variance power for the "tweedie" ***family***. It can be useful to tune this hyperparameter. The method ***get_validation_group_mse()*** provides a tuning metric that ***tweedie_power*** can be tuned on.
+Species the variance power for the "tweedie" ***family***. It can be useful to tune this hyperparameter. The method ***get_validation_group_mse()*** provides an experimental tuning metric for this.
 
 #### group_size_for_validation_group_mse (default = 100)
-APLR calculates mean squared error on grouped data in the validation set. This can be useful for comparing models that have different ***family*** or ***tweedie_power*** parameters. The maximum number of observations in each group is specified by ***group_size_for_validation_group_mse***. Some of the observations with the lowest or highest response values will belong to groups with less than ***group_size_for_validation_group_mse*** observations. The minimum number of observations in a group is ***group_size_for_validation_group_mse/2***. If ***group_size_for_validation_group_mse*** is equal to or higher than the number of observations in the validation set, then there will only be one group (in this case the grouped validation MSE is not so useful). ***group_size_for_validation_group_mse*** should be large enough so that the Central Limit Theorem holds (at least 60, but 100 is a safer choice). Also, the number of observations in the validation set should be substantially higher than ***group_size_for_validation_group_mse*** for group validation MSE to be useful.
+APLR calculates an experimental tuning metric, mean squared error on grouped data in the validation set. This may be useful for tuning ***tweedie_power***. The maximum number of observations in a group is specified by ***group_size_for_validation_group_mse***. The minimum number of observations in a group is approximately half of that. If ***group_size_for_validation_group_mse*** is equal to or higher than the number of observations in the validation set, then there will only be one group (in this case the grouped validation MSE is less useful). ***group_size_for_validation_group_mse*** should be large enough so that the Central Limit Theorem holds (at least 60, but 100 is a safer choice). Also, the number of observations in the validation set should be substantially higher than ***group_size_for_validation_group_mse***.
 
 
 ## Method: fit(X:npt.ArrayLike, y:npt.ArrayLike, sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[])

examples/train_aplr_validation.py

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@
 model = APLRRegressor(random_state=random_state,verbosity=3,m=1000,v=0.1,family=family,link_function=link_function,**params)
 model.fit(data_train[predictors].values,data_train[response].values,X_names=predictors)
 validation_error_for_this_model=np.min(model.get_validation_error_steps())
-#validation_error_for_this_model=model.get_validation_group_mse() #Use this if you wish to tune tweedie_power, family or link_function.
+#validation_error_for_this_model=model.get_validation_group_mse() #You may try this experimental metric to tune tweedie_power
 validation_results=pd.DataFrame(model.get_params(),index=[0])
 validation_results_for_this_model["validation_error"]=validation_error_for_this_model
 validation_results=pd.concat([validation_results,validation_results_for_this_model])
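
Note on the grouped validation MSE described in the API_REFERENCE.md change: the sketch below is one possible reading of "mean squared error on grouped data", not APLR's actual implementation. It assumes that observations are ordered by observed response, split into near-equal groups of at most `group_size` observations, and that the squared difference between the mean response and the mean prediction of each group is then averaged. The function name `grouped_mse` and its arguments are hypothetical.

```python
from math import ceil
from statistics import fmean


def grouped_mse(y_true, y_pred, group_size=100):
    """Illustrative grouped MSE (an assumption, not APLR's own code)."""
    # Assumption: groups are formed after sorting by the observed response.
    pairs = sorted(zip(y_true, y_pred), key=lambda pair: pair[0])
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one observation is required")

    # Use the smallest number of groups that keeps every group at or
    # below group_size; a single group results when group_size >= n.
    n_groups = max(1, ceil(n / group_size))
    base_size, remainder = divmod(n, n_groups)

    squared_errors = []
    start = 0
    for group_index in range(n_groups):
        # The first `remainder` groups receive one extra observation.
        size = base_size + (1 if group_index < remainder else 0)
        group = pairs[start:start + size]
        start += size
        mean_true = fmean(true for true, _ in group)
        mean_pred = fmean(pred for _, pred in group)
        squared_errors.append((mean_true - mean_pred) ** 2)

    return fmean(squared_errors)
```

With a single group the result only compares the overall mean response with the overall mean prediction, which is consistent with the reference text calling that case less useful.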
