The regularization level is controlled by the hyperparameter :math:`\lambda \in \mathbb{R}^+`, which is defined and initialized in the constructor of the class.
The method ``get_spec`` strongly types the attributes of the penalty object, thus allowing Numba to JIT-compile the class.
It should return an iterable of tuples, the first element being the name of the attribute, the second its Numba type (e.g. ``float64``, ``bool_``).
Additionally, a penalty should implement ``params_to_dict``, a helper method that returns all the parameters of a penalty in a dictionary.
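As a minimal sketch of this contract (the class ``MyPenalty`` is a hypothetical skeleton, not skglm's actual implementation; the ``ImportError`` fallback is only there so the snippet runs without Numba installed):

```python
try:
    from numba import float64  # Numba type used in the spec
except ImportError:  # fallback so this sketch runs without Numba
    float64 = 'float64'


class MyPenalty:
    """Hypothetical penalty skeleton with a single hyperparameter alpha."""

    def __init__(self, alpha):
        self.alpha = alpha

    def get_spec(self):
        # Iterable of (attribute name, Numba type) tuples
        return (('alpha', float64),)

    def params_to_dict(self):
        # All hyperparameters of the penalty, gathered in a dictionary
        return dict(alpha=self.alpha)
```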
To optimize an objective with a given penalty, skglm needs at least the proximal operator of the penalty applied to the :math:`j`-th coordinate.
For the ``L1`` penalty, it is the well-known soft-thresholding operator:
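As an illustrative NumPy sketch (the function name is ours, not skglm's API), soft-thresholding with threshold :math:`\tau` is :math:`\mathrm{ST}(x, \tau) = \mathrm{sign}(x)\max(|x| - \tau, 0)`:

```python
import numpy as np


def soft_threshold(x, tau):
    # ST(x, tau) = sign(x) * max(|x| - tau, 0): shrink x toward 0 by tau,
    # and set it exactly to 0 when |x| <= tau
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```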
This note gives insights and guidance for the handling of an intercept coefficient within the `skglm` solvers.
Let the design matrix be $X \in \mathbb{R}^{n \times p}$ where $n$ is the number of samples and $p$ the number of features.
We denote $\beta \in \mathbb{R}^p$ the coefficients of the Generalized Linear Model and $\beta_0$ its intercept.

In many packages such as `liblinear`, the intercept is handled by adding an extra column of ones in the design matrix. This is costly in memory, and may lead to different solutions if all coefficients are penalized, as the intercept $\beta_0$ is usually not.
`skglm` follows a different route and solves directly:
\forall (x, x_0) \in \mathbb{R}^p \times \mathbb{R}, \ \forall h \in \mathbb{R}, \quad
|\nabla_{x_0} f(x, x_0 + h) - \nabla_{x_0} f(x, x_0)| \leq L_0 |h| \ .
$$
This update rule should be implemented in the `intercept_update_step` method of the datafit class.
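As a sketch of what such an update can look like (assuming a quadratic datafit $f(\beta, \beta_0) = \lVert y - X\beta - \beta_0 \rVert^2 / (2n)$, for which the gradient with respect to $\beta_0$ is the mean residual and $L_0 = 1$; the function name mirrors the method but this is not skglm's implementation):

```python
import numpy as np


def intercept_update_step(y, Xw, w0, L0=1.0):
    # Gradient of ||y - Xw - w0||^2 / (2 n) with respect to the intercept w0
    grad0 = np.mean(Xw + w0 - y)
    # Step to subtract from the intercept: w0 <- w0 - grad0 / L0
    return grad0 / L0
```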
The convergence criterion computed for the gradient is then only the absolute value of the gradient with respect to $\beta_0$ since the intercept optimality condition, for a solution $\beta^\star$, $\beta_0^\star$, is: