@@ -14,7 +14,7 @@ The learning rate. Must be greater than zero and not more than one. The higher t
Used to randomly split training observations into training and validation if ***validation_set_indexes*** is not specified when fitting.
#### family (default = "gaussian")
-Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma" and "tweedie". This is used together with ***link_function***. Please note that this is not a tuning parameter because it defines how the loss function is calculated.
+Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma" and "tweedie". This is used together with ***link_function***. Please note that this is not a tuning parameter because it defines how the loss function is calculated. However, it can be tuned with ***get_validation_group_mse()*** as the tuning metric.
#### link_function (default = "identity")
Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit" and "log". For an ordinary regression model use ***family*** "gaussian" and ***link_function*** "identity". For logistic regression use ***family*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma" or "tweedie" ***family***, depending on the data. The ***family*** "poisson", "gamma" or "tweedie" should only be used with the "log" ***link_function***. Inappropriate combinations of ***family*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge.
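The combination rules described above can be sketched as a small check. This is only an illustration of the text, not APLR's own validation code: the helper name `check_family_link` and the wording of its message are hypothetical, and the allowed values are copied from the parameter descriptions.

```python
# Hypothetical helper (not part of APLR): flags family/link_function
# combinations that the documentation above describes as inappropriate.
ALLOWED_FAMILIES = {"gaussian", "binomial", "poisson", "gamma", "tweedie"}
ALLOWED_LINK_FUNCTIONS = {"identity", "logit", "log"}
# Families that the text says should only be used with the "log" link.
LOG_ONLY_FAMILIES = {"poisson", "gamma", "tweedie"}


def check_family_link(family, link_function):
    """Return a warning string for a discouraged combination, else None."""
    if family not in ALLOWED_FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    if link_function not in ALLOWED_LINK_FUNCTIONS:
        raise ValueError(f"unknown link_function: {link_function!r}")
    if family in LOG_ONLY_FAMILIES and link_function != "log":
        return (
            f'family "{family}" should only be used with the "log" '
            f'link_function, got "{link_function}"'
        )
    return None


if __name__ == "__main__":
    print(check_family_link("gaussian", "identity"))  # ordinary regression
    print(check_family_link("binomial", "logit"))     # logistic regression
    print(check_family_link("poisson", "identity"))   # discouraged
```

The helper only encodes the one hard rule stated in the text (the "poisson", "gamma" and "tweedie" families go with the "log" link); other combinations are left to the model's own warnings.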
@@ -50,7 +50,10 @@ Limits 1) the number of terms already in the model that can be considered as int
***0*** does not print progress reports during fitting. ***1*** prints a summary after running the ***fit*** method. ***2*** prints a summary after each boosting step.
#### tweedie_power (default = 1.5)
-Specifies the variance power for the "tweedie" ***family*** and ***link_function***. Please note that this is not a tuning parameter because it defines how the loss function is calculated.
+Specifies the variance power for the "tweedie" ***family*** and ***link_function***. Please note that this is not a tuning parameter because it defines how the loss function is calculated. However, it can be tuned with ***get_validation_group_mse()*** as the tuning metric.
APLR calculates the mean squared error on grouped data in the validation set. This can be useful for comparing models that have different ***family*** or ***tweedie_power*** parameters. The maximum number of observations in each group is specified by ***group_size_for_validation_group_mse***. Some of the observations with the lowest or highest response values will belong to groups with fewer than ***group_size_for_validation_group_mse*** observations. The minimum number of observations in a group is ***group_size_for_validation_group_mse/2***. If ***group_size_for_validation_group_mse*** is equal to or higher than the number of observations in the validation set, there will be only one group, and the grouped validation MSE is then of little use. ***group_size_for_validation_group_mse*** should be large enough for the Central Limit Theorem to hold (at least 60, but 100 is a safer choice). Also, the number of observations in the validation set should be substantially higher than ***group_size_for_validation_group_mse*** for the grouped validation MSE to be useful.
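To make the idea of a grouped validation MSE concrete, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not APLR's implementation: the function name `grouped_validation_mse` is hypothetical, observations are simply sorted by response and cut into consecutive chunks, and the special handling of the lowest and highest groups (the ***group_size_for_validation_group_mse/2*** minimum) is not reproduced.

```python
from statistics import fmean


def grouped_validation_mse(y_true, y_pred, group_size=100):
    """Illustrative grouped MSE: compare group means, not single observations.

    Assumption: groups are consecutive chunks of the observations sorted by
    response value. APLR's exact grouping rule may differ.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Sort observation indexes by the observed response.
    order = sorted(range(len(y_true)), key=lambda i: y_true[i])

    squared_errors = []
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean_true = fmean(y_true[i] for i in group)
        mean_pred = fmean(y_pred[i] for i in group)
        squared_errors.append((mean_true - mean_pred) ** 2)

    # Average the squared differences between group means.
    return fmean(squared_errors)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 2.0, 2.0, 4.0]
    print(grouped_validation_mse(observed, predicted, group_size=2))
```

Because each group contributes the squared difference between its mean response and its mean prediction, the metric does not depend on the loss function used for fitting. This is why it can serve to compare models fitted with different ***family*** or ***tweedie_power*** values. With a single group covering the whole validation set, it only checks the overall mean.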