You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: API_REFERENCE.md
+5-2Lines changed: 5 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -14,7 +14,7 @@ The learning rate. Must be greater than zero and not more than one. The higher t
14
14
Used to randomly split training observations into training and validation if ***validation_set_indexes*** is not specified when fitting.
15
15
16
16
#### family (default = "gaussian")
17
-
Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma"and "tweedie". This is used together with ***link_function***.
17
+
Determines the loss function used. Allowed values are "gaussian", "binomial", "poisson", "gamma", "tweedie" and "group_gaussian". This is used together with ***link_function***. When ***family*** is "group_gaussian" then the "group" argument in the ***fit*** method must be provided. In the latter case APLR will try to minimize group MSE when training the model.
18
18
19
19
#### link_function (default = "identity")
20
20
Determines how the linear predictor is transformed to predictions. Allowed values are "identity", "logit" and "log". For an ordinary regression model use ***family*** "gaussian" and ***link_function*** "identity". For logistic regression use ***family*** "binomial" and ***link_function*** "logit". For a multiplicative model use the "log" ***link_function***. The "log" ***link_function*** often works best with a "poisson", "gamma" or "tweedie" ***family***, depending on the data. The ***family*** "poisson", "gamma" or "tweedie" should only be used with the "log" ***link_function***. Inappropriate combinations of ***family*** and ***link_function*** may result in a warning message when fitting the model and/or a poor model fit. Please note that values other than "identity" typically require a significantly higher ***m*** (or ***v***) in order to converge.
@@ -55,7 +55,7 @@ Specifies the variance power for the "tweedie" ***family***.
Specifies which metric to use for validating the model and tuning ***m***. Available options are "default" (using the same methodology as when calculating the training error), "mse", "mae", "negative_gini" and "rankability". The default is often a choice that fits well with respect to the ***family*** chosen. However, if you want to use ***family*** or ***tweedie_power*** as tuning parameters then the default is not suitable. "rankability" uses a methodology similar to the one described in https://towardsdatascience.com/how-to-calculate-roc-auc-score-for-regression-models-c0be4fdf76bb except that the metric is inverted and can be weighted by sample weights.
@@ -82,6 +82,9 @@ An optional list of integers specifying the indexes of predictors (columns) in *
82
82
#### monotonic_constraints
83
83
An optional list of integers specifying monotonic constraints on model terms. For example, if there are three predictors in ***X***, then monotonic_constraints = [1,0,-1] means that 1) the first predictor in ***X*** cannot be used in interaction terms and all terms using the first predictor in ***X*** as a main effect must have positive regression coefficients, 2) there are no monotonic constraints on terms using the second predictor in ***X***, and 3) the third predictor in ***X*** cannot be used in interaction terms and all terms using the third predictor in ***X*** as a main effect must have negative regression coefficients.
84
84
85
+
#### group
86
+
A numpy vector of integers that is used when ***family*** is "group_gaussian". For example, ***group*** may represent year (could be useful in a time series model).
0 commit comments