
Commit ef88450

tomaszkacprzak, mathurinm, and Badr-MOUFAD authored

ENH add GroupLasso estimator with sparse X support (#228)

Co-authored-by: mathurinm <[email protected]>
Co-authored-by: Badr-MOUFAD <[email protected]>
1 parent e4650ec commit ef88450

File tree

12 files changed: +465 −69 lines

doc/api.rst

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Estimators
     GeneralizedLinearEstimator
     CoxEstimator
     ElasticNet
+    GroupLasso
     Lasso
     LinearSVC
     SparseLogisticRegression

doc/changes/0.4.rst

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,8 @@
 
 Version 0.4 (in progress)
 -------------------------
+- Add :ref:`GroupLasso Estimator <skglm.GroupLasso>` (PR: :gh:`228`)
+- Add support and tutorial for positive coefficients to :ref:`Group Lasso Penalty <skglm.penalties.WeightedGroupL2>` (PR: :gh:`221`)
 
 
 Version 0.3.1 (2023/12/21)
@@ -11,4 +13,3 @@ Version 0.3.1 (2023/12/21)
 - Add :ref:`LogSumPenalty <skglm.penalties.LogSumPenalty>` (PR: :gh:`#127`)
 - Remove abstract methods in ``BaseDatafit`` and ``BasePenalty`` to make solver/penalty/datafit compatibility check easier (PR :gh:`#205`)
 - Add fixed-point distance to build working sets in :ref:`ProxNewton <skglm.solvers.ProxNewton>` solver (:gh:`138`)
-- Add support and tutorial for positive coefficients to :ref:`Group Lasso Penalty <skglm.penalties.block_separable.WeightedGroupL2>` (PR: :gh:`221`)

doc/tutorials/prox_nn_group_lasso.rst

Lines changed: 44 additions & 16 deletions
@@ -136,14 +136,15 @@ and thus, combined with Equations :eq:`prox_projection_nn_Sc` and :eq:`prox_proj
 
 
 .. _subdiff_positive_group_lasso:
+
 Subdifferential of the positive Group Lasso penalty
 ===================================================
 
 For the ``subdiff_diff`` working set strategy, we compute the distance :math:`D(v)` for some :math:`v` to the subdifferential of the :math:`h` penalty at a point :math:`w`.
-Since the penalty is group-separable, we consider a block of variables, in :math:`\mathbb{R}^g`.
+Since the penalty is group-separable, we reduce to the case where :math:`w` is a block of variables in :math:`\mathbb{R}^g`.
 
-Case :math:`w` has a strictly negative coordinate
--------------------------------------------------
+Case :math:`w \notin \mathbb{R}_+^g`
+------------------------------------
 
 If any component of :math:`w` is strictly negative, the subdifferential is empty, and the distance is :math:`+ \infty`.
 
@@ -152,17 +153,6 @@ If any component of :math:`w` is strictly negative, the subdifferential is empty
 
     D(v) = + \infty, \quad \forall v \in \mathbb{R}^g
 .
 
-
-Case :math:`w` is strictly positive
------------------------------------
-
-At a non zero point with strictly positive entries, the penalty is differentiable hence its subgradient is the singleton :math:`w / {|| w ||}`.
-
-.. math::
-
-    D(v) = || v - w / {|| w ||} ||, \quad \forall v \in \mathbb{R}^g
-.
-
 Case :math:`w = 0`
 ------------------
 
@@ -189,10 +179,48 @@ Minimizing over :math:`n` then over :math:`u`, thanks to [`1 <https://math.stack
 
     D(v) = \max(0, ||v^+|| - \lambda)
 ,
 
-Where :math:`v^+` is :math:`v` restricted to its positive coordinates.
+where :math:`v^+` is :math:`v` restricted to its positive coordinates.
+Intuitively, if :math:`v_i < 0`, we can cancel it exactly in the objective function by taking :math:`n_i = - v_i` and :math:`u_i = 0`; on the other hand, if :math:`v_i > 0`, taking a nonzero :math:`n_i` only increases the quantity that :math:`u_i` must bring close to 0.
+
+For a rigorous derivation, introduce the Lagrangian of the squared objective
+
+.. math::
+
+    \mathcal{L}(u, n, \nu, \mu) =
+        \frac{1}{2} || u + n - v ||^2 + \nu \left( \frac{1}{2} || u ||^2 - \frac{\lambda^2}{2} \right) + \langle \mu, n \rangle
+,
+
+and write down the optimality conditions with respect to :math:`u` and :math:`n`.
+Treat the case :math:`\nu = 0` separately; in the other case, show that :math:`u` must be positive and that :math:`v = (1 + \nu) u + n`, which, together with :math:`u = \mu / \nu` and complementary slackness, yields the conclusion.
+
+Case :math:`|| w || \ne 0`
+--------------------------
+
+The subdifferential in that case is :math:`\lambda w / {|| w ||} + C_1 \times \ldots \times C_g`, where :math:`C_j = \{0\}` if :math:`w_j > 0` and :math:`C_j = \mathbb{R}_-` otherwise (:math:`w_j = 0`).
+
+Letting :math:`p` denote the projection of :math:`v` onto this set, one has
+
+.. math::
+
+    p_j = \lambda \frac{w_j}{||w||} \text{ if } w_j > 0
+
+and
+
+.. math::
+
+    p_j = \min(v_j, 0) \text{ otherwise}.
+
+The distance to the subdifferential is then
+
+.. math::
+
+    D(v) = || v - p || = \sqrt{\sum_{j, w_j > 0} \Big(v_j - \lambda \frac{w_j}{||w||}\Big)^2 + \sum_{j, w_j = 0} \max(0, v_j)^2}
+
+since :math:`v_j - \min(v_j, 0) = v_j + \max(-v_j, 0) = \max(0, v_j)`.
+
 
 
 References
 ==========
 
-[1] `<https://math.stackexchange.com/a/2887332/167258>`_
+[1] `<https://math.stackexchange.com/a/2887332/167258>`_
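
To make the case analysis above concrete, here is a small self-contained NumPy sketch (not part of the commit; the helper name dist_subdiff_positive_group, the toy vectors, and the value of lam are made up for illustration). It evaluates D(v) for the three cases of the tutorial, with lam standing for lambda:

import numpy as np
from numpy.linalg import norm

def dist_subdiff_positive_group(v, w, lam):
    # distance of v to the subdifferential of w -> lam * ||w|| restricted to w >= 0
    if np.any(w < 0):
        # w has a strictly negative coordinate: empty subdifferential
        return np.inf
    if not np.any(w):
        # w = 0: D(v) = max(0, ||v^+|| - lam), with v^+ the positive part of v
        return max(0.0, norm(np.maximum(v, 0.0)) - lam)
    # ||w|| != 0: project v onto lam * w / ||w|| + C_1 x ... x C_g
    p = np.where(w > 0, lam * w / norm(w), np.minimum(v, 0.0))
    return norm(v - p)

v = np.array([0.5, -0.2, 0.1])
print(dist_subdiff_positive_group(v, np.array([1.0, 0.0, 2.0]), lam=0.7))
print(dist_subdiff_positive_group(v, np.zeros(3), lam=0.7))
print(dist_subdiff_positive_group(v, np.array([1.0, -1.0, 0.0]), lam=0.7))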

skglm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
 
 from skglm.estimators import (  # noqa F401
     Lasso, WeightedLasso, ElasticNet, MCPRegression, MultiTaskLasso, LinearSVC,
-    SparseLogisticRegression, GeneralizedLinearEstimator, CoxEstimator
+    SparseLogisticRegression, GeneralizedLinearEstimator, CoxEstimator, GroupLasso,
 )

skglm/datafits/group.py

Lines changed: 33 additions & 0 deletions
@@ -4,6 +4,7 @@
 
 from skglm.datafits.base import BaseDatafit
 from skglm.datafits.single_task import Logistic
+from skglm.utils.sparse_ops import spectral_norm, sparse_columns_slice
 
 
 class QuadraticGroup(BaseDatafit):
@@ -50,6 +51,20 @@ def get_lipschitz(self, X, y):
 
         return lipschitz
 
+    def get_lipschitz_sparse(self, X_data, X_indptr, X_indices, y):
+        grp_ptr, grp_indices = self.grp_ptr, self.grp_indices
+        n_groups = len(grp_ptr) - 1
+
+        lipschitz = np.zeros(n_groups, dtype=X_data.dtype)
+        for g in range(n_groups):
+            grp_g_indices = grp_indices[grp_ptr[g]: grp_ptr[g+1]]
+            X_data_g, X_indptr_g, X_indices_g = sparse_columns_slice(
+                grp_g_indices, X_data, X_indptr, X_indices)
+            lipschitz[g] = spectral_norm(
+                X_data_g, X_indptr_g, X_indices_g, len(y)) ** 2 / len(y)
+
+        return lipschitz
+
     def value(self, y, w, Xw):
         return norm(y - Xw) ** 2 / (2 * len(y))
 
@@ -63,6 +78,24 @@ def gradient_g(self, X, y, w, Xw, g):
 
         return grad_g
 
+    def gradient_g_sparse(self, X_data, X_indptr, X_indices, y, w, Xw, g):
+        grp_ptr, grp_indices = self.grp_ptr, self.grp_indices
+        grp_g_indices = grp_indices[grp_ptr[g]: grp_ptr[g+1]]
+
+        grad_g = np.zeros(len(grp_g_indices))
+        for idx, j in enumerate(grp_g_indices):
+            grad_g[idx] = self.gradient_scalar_sparse(
+                X_data, X_indptr, X_indices, y, w, Xw, j)
+
+        return grad_g
+
+    def gradient_scalar_sparse(self, X_data, X_indptr, X_indices, y, w, Xw, j):
+        grad_j = 0.
+        for i in range(X_indptr[j], X_indptr[j+1]):
+            grad_j += X_data[i] * (Xw[X_indices[i]] - y[X_indices[i]])
+
+        return grad_j / len(y)
+
     def gradient_scalar(self, X, y, w, Xw, j):
         return X[:, j] @ (Xw - y) / len(y)
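
As a sanity check on the new sparse Lipschitz computation (illustrative only, not part of the commit; the toy matrix and group indices are made up), the per-group constant is the squared spectral norm of the group's columns divided by the number of samples, which can be reproduced with a dense slice:

import numpy as np
import scipy.sparse as sp

n_samples, n_features = 50, 6
X = sp.random(n_samples, n_features, density=0.3, format="csc", random_state=0)

grp_g_indices = np.array([2, 3, 4])   # columns belonging to one group
X_g = X[:, grp_g_indices].toarray()   # dense view of the sliced group columns

# squared spectral norm of X_[g] divided by n_samples, as in get_lipschitz_sparse
lipschitz_g = np.linalg.norm(X_g, ord=2) ** 2 / n_samples
print(lipschitz_g)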

skglm/estimators.py

Lines changed: 134 additions & 3 deletions
@@ -19,10 +19,12 @@
 from sklearn.multiclass import OneVsRestClassifier, check_classification_targets
 
 from skglm.utils.jit_compilation import compiled_clone
-from skglm.solvers import AndersonCD, MultiTaskBCD
-from skglm.datafits import Cox, Quadratic, Logistic, QuadraticSVC, QuadraticMultiTask
-from skglm.penalties import (L1, WeightedL1, L1_plus_L2, L2,
+from skglm.solvers import AndersonCD, MultiTaskBCD, GroupBCD
+from skglm.datafits import (Cox, Quadratic, Logistic, QuadraticSVC,
+                            QuadraticMultiTask, QuadraticGroup)
+from skglm.penalties import (L1, WeightedL1, L1_plus_L2, L2, WeightedGroupL2,
                              MCPenalty, WeightedMCPenalty, IndicatorBox, L2_1)
+from skglm.utils.data import grp_converter
 
 
 def _glm_fit(X, y, model, datafit, penalty, solver):
@@ -1537,3 +1539,132 @@ def path(self, X, Y, alphas, coef_init=None, return_n_iter=False, **params):
             ws_strategy=self.ws_strategy, fit_intercept=self.fit_intercept,
             warm_start=self.warm_start, verbose=self.verbose)
         return solver.path(X, Y, datafit, penalty, alphas, coef_init, return_n_iter)
+
+
+class GroupLasso(LinearModel, RegressorMixin):
+    r"""GroupLasso estimator based on Celer solver and primal extrapolation.
+
+    The optimization objective for GroupLasso is:
+
+    .. math::
+        1 / (2 xx n_"samples") ||y - \sum_g X_{[g]} w_{[g]}||_2 ^ 2 + alpha \sum_g
+        weights_g ||w_{[g]}||_2
+
+    with :math:`w_{[g]}` (respectively :math:`X_{[g]}`) being the coefficients
+    (respectively the columns) of the g-th group.
+
+    Parameters
+    ----------
+    groups : int | list of ints | list of lists of ints
+        Partition of features used in the penalty on ``w``.
+        If an int is passed, groups are contiguous blocks of features, of size
+        ``groups``.
+        If a list of ints is passed, groups are assumed to be contiguous,
+        group number ``g`` being of size ``groups[g]``.
+        If a list of lists of ints is passed, ``groups[g]`` contains the
+        feature indices of the group number ``g``.
+
+    alpha : float, optional
+        Penalty strength.
+
+    weights : array, shape (n_groups,), optional (default=None)
+        Positive weights used in the group penalty part of the objective.
+        If ``None``, weights equal to 1 are used.
+
+    max_iter : int, optional (default=50)
+        The maximum number of iterations (subproblem definitions).
+
+    max_epochs : int, optional (default=50_000)
+        Maximum number of CD epochs on each subproblem.
+
+    p0 : int, optional (default=10)
+        First working set size.
+
+    verbose : bool or int, optional (default=0)
+        Amount of verbosity.
+
+    tol : float, optional (default=1e-4)
+        Stopping criterion for the optimization.
+
+    positive : bool, optional (default=False)
+        When set to ``True``, forces the coefficient vector to be positive.
+
+    fit_intercept : bool, optional (default=True)
+        Whether or not to fit an intercept.
+
+    warm_start : bool, optional (default=False)
+        When set to ``True``, reuse the solution of the previous call to fit as
+        initialization; otherwise, just erase the previous solution.
+
+    ws_strategy : str, optional (default="subdiff")
+        The score used to build the working set. Can be ``fixpoint`` or ``subdiff``.
+
+    Attributes
+    ----------
+    coef_ : array, shape (n_features,)
+        Parameter vector (:math:`w` in the cost function formula).
+
+    intercept_ : float
+        Constant term in the decision function.
+
+    n_iter_ : int
+        Number of subproblems solved to reach the specified tolerance.
+
+    Notes
+    -----
+    Supports weights equal to ``0``, i.e. unpenalized features.
+    """
+
+    def __init__(self, groups, alpha=1., weights=None, max_iter=50, max_epochs=50_000,
+                 p0=10, verbose=0, tol=1e-4, positive=False, fit_intercept=True,
+                 warm_start=False, ws_strategy="subdiff"):
+        super().__init__()
+        self.alpha = alpha
+        self.groups = groups
+        self.weights = weights
+        self.tol = tol
+        self.max_iter = max_iter
+        self.max_epochs = max_epochs
+        self.p0 = p0
+        self.ws_strategy = ws_strategy
+        self.positive = positive
+        self.fit_intercept = fit_intercept
+        self.warm_start = warm_start
+        self.verbose = verbose
+
+    def fit(self, X, y):
+        """Fit the model according to the given training data.
+
+        Parameters
+        ----------
+        X : array-like, shape (n_samples, n_features)
+            Training data, where ``n_samples`` is the number of samples and
+            ``n_features`` is the number of features.
+
+        y : array-like, shape (n_samples,)
+            Target vector relative to ``X``.
+
+        Returns
+        -------
+        self : Instance of GroupLasso
+            Fitted estimator.
+        """
+        grp_indices, grp_ptr = grp_converter(self.groups, X.shape[1])
+        group_sizes = np.diff(grp_ptr)
+
+        n_features = np.sum(group_sizes)
+        if X.shape[1] != n_features:
+            raise ValueError(
+                "The total number of group members must equal the number of "
+                f"features. Got {n_features}, expected {X.shape[1]}.")
+
+        weights = np.ones(len(group_sizes)) if self.weights is None else self.weights
+        group_penalty = WeightedGroupL2(alpha=self.alpha, grp_ptr=grp_ptr,
+                                        grp_indices=grp_indices, weights=weights,
+                                        positive=self.positive)
+        quad_group = QuadraticGroup(grp_ptr=grp_ptr, grp_indices=grp_indices)
+        solver = GroupBCD(
+            self.max_iter, self.max_epochs, self.p0, tol=self.tol,
+            fit_intercept=self.fit_intercept, warm_start=self.warm_start,
+            verbose=self.verbose)
+
+        return _glm_fit(X, y, self, quad_group, group_penalty, solver)
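
Below is a minimal usage sketch of the new estimator (not part of the diff). The data, density, group size, and alpha value are made up; sparse input is used since this PR adds sparse X support:

import numpy as np
import scipy.sparse as sp
from skglm import GroupLasso

X = sp.random(100, 30, density=0.2, format="csc", random_state=0)
y = np.random.default_rng(0).standard_normal(100)

# 10 contiguous groups of 3 features each (an int `groups` means contiguous blocks)
clf = GroupLasso(groups=3, alpha=0.1)
clf.fit(X, y)
print(clf.coef_.shape)  # (30,)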

skglm/penalties/block_separable.py

Lines changed: 24 additions & 12 deletions
@@ -335,19 +335,31 @@ def subdiff_distance(self, w, grad_ws, ws):
             w_g = w[grp_g_indices]
             norm_w_g = norm(w_g)
 
-            if self.positive and np.any(w_g < 0):
-                scores[idx] = np.inf
-            elif self.positive and norm_w_g == 0:
-                # distance of -norm(neg_grad_g) to weights[g] * [-alpha, alpha]
-                neg_grad_g = grad_g[grad_g < 0.]
-                scores[idx] = max(0, norm(neg_grad_g) - self.alpha * weights[g])
-            elif (not self.positive) and norm_w_g == 0:
-                # distance of -norm(grad_g) to weights[g] * [-alpha, alpha]
-                scores[idx] = max(0, norm(grad_g) - alpha * weights[g])
+            if self.positive:
+                if norm_w_g == 0:
+                    # distance of -neg_grad_g to weights[g] * [-alpha, alpha]
+                    neg_grad_g = grad_g[grad_g < 0.]
+                    scores[idx] = max(0,
+                                      norm(neg_grad_g) - self.alpha * weights[g])
+                elif np.any(w_g < 0):
+                    scores[idx] = np.inf
+                else:
+                    res = np.zeros_like(grad_g)
+                    for j in range(len(w_g)):
+                        thresh = alpha * weights[g] * w_g[j] / norm_w_g
+                        if w_g[j] > 0:
+                            res[j] = -grad_g[j] - thresh
+                        else:
+                            # thresh is 0, we simplify the expression
+                            res[j] = max(-grad_g[j], 0)
+                    scores[idx] = norm(res)
             else:
-                # distance of -grad_g to the subdiff (here a singleton)
-                subdiff = alpha * weights[g] * w_g / norm_w_g
-                scores[idx] = norm(grad_g + subdiff)
+                if norm_w_g == 0:
+                    scores[idx] = max(0, norm(grad_g) - alpha * weights[g])
+                else:
+                    # distance of -grad_g to the subdiff (here a singleton)
+                    subdiff = alpha * weights[g] * w_g / norm_w_g
+                    scores[idx] = norm(grad_g + subdiff)
 
         return scores