
Commit 3460c31

documentation
1 parent b8068ae commit 3460c31


2 files changed: 3 additions and 3 deletions


API_REFERENCE.md

Lines changed: 2 additions & 2 deletions
@@ -50,10 +50,10 @@ Limits 1) the number of terms already in the model that can be considered as int
 ***0*** does not print progress reports during fitting. ***1*** prints a summary after running the ***fit*** method. ***2*** prints a summary after each boosting step.

 #### tweedie_power (default = 1.5)
-Specifies the variance power for the "tweedie" ***family***. It can be useful to tune this hyperparameter. The method ***get_validation_group_mse()*** provides an experimental tuning metric for this.
+Specifies the variance power for the "tweedie" ***family***.

 #### group_size_for_validation_group_mse (default = 100)
-APLR calculates an experimental tuning metric, mean squared error on grouped data in the validation set. This may be useful for tuning ***tweedie_power***. The maximum number of observations in a group is specified by ***group_size_for_validation_group_mse***. The minimum number of observations in a group is approximately half of that. If ***group_size_for_validation_group_mse*** is equal to or higher than the number of observations in the validation set, then there will only be one group (in this case the grouped validation MSE is less useful). ***group_size_for_validation_group_mse*** should be large enough so that the Central Limit Theorem holds (at least 60, but 100 is a safer choice). Also, the number of observations in the validation set should be substantially higher than ***group_size_for_validation_group_mse***.
+APLR calculates a tuning metric: the mean squared error for groups of observations in the validation set. This metric is provided by the method ***get_validation_group_mse()***. The metric may be useful for tuning ***tweedie_power*** and, to some extent, ***family*** or ***link_function***. The reasoning is that while mean squared error (MSE) could be inappropriate for evaluating, for example, Tweedie-distributed responses, MSE is often appropriate for evaluating normally distributed data. The sum response of a group of observations is approximately normally distributed according to the Central Limit Theorem (CLT) if there are enough observations in the group, even if the response for an individual observation has a different probability distribution. Ideally, ***group_size_for_validation_group_mse*** should be large enough for the CLT to hold (at least 30, but the default of 100 is a safer choice). Also, the number of observations in the validation set should be substantially higher than ***group_size_for_validation_group_mse***.


 ## Method: fit(X:npt.ArrayLike, y:npt.ArrayLike, sample_weight:npt.ArrayLike = np.empty(0), X_names:List[str]=[], validation_set_indexes:List[int]=[])
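
The grouped-MSE idea described for ***group_size_for_validation_group_mse*** can be illustrated with a minimal NumPy sketch. This is not APLR's implementation: the helper name `grouped_mse`, the choice to group observations in their given order, and the choice to compare group means rather than group sums are all assumptions made for illustration.

```python
import math

import numpy as np


def grouped_mse(y_true, y_pred, group_size=100):
    """Hypothetical helper (not part of APLR): MSE computed on group means.

    Assumption: observations are grouped in the order given, into nearly
    equal chunks that each hold at most `group_size` observations.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape or y_true.ndim != 1:
        raise ValueError("y_true and y_pred must be 1-D arrays of equal length")
    if y_true.size == 0:
        raise ValueError("at least one observation is required")
    if group_size < 1:
        raise ValueError("group_size must be a positive integer")

    # Enough groups that no group exceeds group_size observations.
    n_groups = math.ceil(y_true.size / group_size)
    true_groups = np.array_split(y_true, n_groups)
    pred_groups = np.array_split(y_pred, n_groups)

    # Per-group means are closer to normally distributed than individual
    # responses (CLT), which is what makes a squared-error metric reasonable.
    squared_errors = [
        (t.mean() - p.mean()) ** 2 for t, p in zip(true_groups, pred_groups)
    ]
    return float(np.mean(squared_errors))
```

If `group_size` is at least the number of observations, the sketch falls back to a single group, which makes the metric much less informative; this mirrors the advice above that the validation set should be substantially larger than the group size.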

examples/train_aplr_validation.py

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@
 model = APLRRegressor(random_state=random_state,verbosity=3,m=1000,v=0.1,family=family,link_function=link_function,**params)
 model.fit(data_train[predictors].values,data_train[response].values,X_names=predictors)
 validation_error_for_this_model=np.min(model.get_validation_error_steps())
-#validation_error_for_this_model=model.get_validation_group_mse() #Experimental metric for tuning tweedie_power
+#validation_error_for_this_model=model.get_validation_group_mse() #Metric that may be useful for tuning tweedie_power, family or link_function.
 validation_results_for_this_model=pd.DataFrame(model.get_params(),index=[0])
 validation_results_for_this_model["validation_error"]=validation_error_for_this_model
 validation_results=pd.concat([validation_results,validation_results_for_this_model])
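
The commented-out line in this hunk hints at how the group MSE could drive hyperparameter selection. The sketch below shows one possible selection loop. The function `pick_by_group_mse` and its `make_model` factory argument are hypothetical names introduced here; the only model methods it relies on are `fit` and `get_validation_group_mse()`, both of which appear in the diff above.

```python
import numpy as np


def pick_by_group_mse(make_model, candidates, X, y):
    """Hypothetical helper: return (best_candidate, best_score).

    `make_model` is an assumed factory that builds an unfitted model for one
    candidate value. The model must expose fit(X, y) and
    get_validation_group_mse().
    """
    best_value, best_score = None, np.inf
    for value in candidates:
        model = make_model(value)
        model.fit(X, y)
        score = model.get_validation_group_mse()
        # Lower group MSE is better; NaN scores never win this comparison.
        if score < best_score:
            best_value, best_score = value, score
    if best_value is None:
        raise ValueError("no candidate produced a finite group MSE")
    return best_value, best_score
```

With APLR, the factory might look like `lambda p: APLRRegressor(family="tweedie", tweedie_power=p)`, assuming both are accepted as constructor keywords as the API reference headings suggest.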
