
Commit ef88450

tomaszkacprzak, mathurinm, and Badr-MOUFAD authored

ENH add GroupLasso estimator with sparse X support (#228)

Co-authored-by: mathurinm <[email protected]>
Co-authored-by: Badr-MOUFAD <[email protected]>
1 parent e4650ec commit ef88450

File tree

12 files changed: +465 −69 lines

doc/api.rst

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Estimators
     GeneralizedLinearEstimator
     CoxEstimator
     ElasticNet
+    GroupLasso
     Lasso
     LinearSVC
     SparseLogisticRegression

doc/changes/0.4.rst

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,8 @@
 
 Version 0.4 (in progress)
 -------------------------
+- Add :ref:`GroupLasso Estimator <skglm.GroupLasso>` (PR: :gh:`228`)
+- Add support and tutorial for positive coefficients to :ref:`Group Lasso Penalty <skglm.penalties.WeightedGroupL2>` (PR: :gh:`221`)
 
 
 Version 0.3.1 (2023/12/21)
@@ -11,4 +13,3 @@ Version 0.3.1 (2023/12/21)
 - Add :ref:`LogSumPenalty <skglm.penalties.LogSumPenalty>` (PR: :gh:`#127`)
 - Remove abstract methods in ``BaseDatafit`` and ``BasePenalty`` to make solver/penalty/datafit compatibility check easier (PR :gh:`#205`)
 - Add fixed-point distance to build working sets in :ref:`ProxNewton <skglm.solvers.ProxNewton>` solver (:gh:`138`)
-- Add support and tutorial for positive coefficients to :ref:`Group Lasso Penalty <skglm.penalties.block_separable.WeightedGroupL2>` (PR: :gh:`221`)

doc/tutorials/prox_nn_group_lasso.rst

Lines changed: 44 additions & 16 deletions
@@ -136,14 +136,15 @@ and thus, combined with Equations :eq:`prox_projection_nn_Sc` and :eq:`prox_proj
 
 
 .. _subdiff_positive_group_lasso:
+
 Subdifferential of the positive Group Lasso penalty
 ===================================================
 
 For the ``subdiff_diff`` working set strategy, we compute the distance :math:`D(v)` for some :math:`v` to the subdifferential of the :math:`h` penalty at a point :math:`w`.
-Since the penalty is group-separable, we consider a block of variables, in :math:`\mathbb{R}^g`.
+Since the penalty is group-separable, we reduce to the case where :math:`w` is a block of variables in :math:`\mathbb{R}^g`.
 
-Case :math:`w` has a strictly negative coordinate
--------------------------------------------------
+Case :math:`w \notin \mathbb{R}_+^g`
+------------------------------------
 
 If any component of :math:`w` is strictly negative, the subdifferential is empty, and the distance is :math:`+ \infty`.
 
@@ -152,17 +153,6 @@ If any component of :math:`w` is strictly negative, the subdifferential is empty
 
     D(v) = + \infty, \quad \forall v \in \mathbb{R}^g
 .
 
-
-Case :math:`w` is strictly positive
------------------------------------
-
-At a non zero point with strictly positive entries, the penalty is differentiable hence its subgradient is the singleton :math:`w / {|| w ||}`.
-
-.. math::
-
-    D(v) = || v - w / {|| w ||} ||, \quad \forall v \in \mathbb{R}^g
-.
-
 Case :math:`w = 0`
 ------------------
 
@@ -189,10 +179,48 @@ Minimizing over :math:`n` then over :math:`u`, thanks to [`1 <https://math.stack
 
     D(v) = \max(0, ||v^+|| - \lambda)
 ,
 
-Where :math:`v^+` is :math:`v` restricted to its positive coordinates.
+where :math:`v^+` is :math:`v` restricted to its positive coordinates.
+Intuitively, if :math:`v_i < 0`, we can cancel it exactly in the objective function by taking :math:`n_i = - v_i` and :math:`u_i = 0`; on the other hand, if :math:`v_i > 0`, taking a nonzero :math:`n_i` only increases the quantity that :math:`u_i` must bring close to 0.
+
+For a rigorous derivation, introduce the Lagrangian of the squared objective
+
+.. math::
+
+    \mathcal{L}(u, n, \nu, \mu) =
+        \frac{1}{2} || u + n - v ||^2 + \nu \left( \frac{1}{2} || u ||^2 - \frac{\lambda^2}{2} \right) + \langle \mu, n \rangle
+,
+
+and write down the optimality conditions with respect to :math:`u` and :math:`n`.
+Treat the case :math:`\nu = 0` separately; in the other case, show that :math:`u` must be positive and that :math:`v = (1 + \nu) u + n`, which, together with :math:`u = \mu / \nu` and complementary slackness, yields the conclusion.
+
+Case :math:`|| w || \ne 0`
+--------------------------
+
+The subdifferential in that case is :math:`\lambda w / {|| w ||} + C_1 \times \ldots \times C_g`, where :math:`C_j = \{0\}` if :math:`w_j > 0` and :math:`C_j = \mathbb{R}_-` otherwise (:math:`w_j = 0`).
+
+Letting :math:`p` denote the projection of :math:`v` onto this set, one has
+
+.. math::
+
+    p_j = \lambda \frac{w_j}{||w||} \text{ if } w_j > 0
+
+and
+
+.. math::
+
+    p_j = \min(v_j, 0) \text{ otherwise}.
+
+The distance to the subdifferential is then
+
+.. math::
+
+    D(v) = || v - p || = \sqrt{\sum_{j, w_j > 0} \Big(v_j - \lambda \frac{w_j}{||w||}\Big)^2 + \sum_{j, w_j = 0} \max(0, v_j)^2}
+
+since :math:`v_j - \min(v_j, 0) = v_j + \max(-v_j, 0) = \max(0, v_j)`.
+
 
 
 References
 ==========
 
-[1] `<https://math.stackexchange.com/a/2887332/167258>`_
+[1] `<https://math.stackexchange.com/a/2887332/167258>`_
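
To make the case analysis above concrete, here is a small self-contained NumPy sketch (not part of the commit; the helper name dist_subdiff_positive_group, the toy vectors, and the value of lam are made up for illustration). It evaluates D(v) for the three cases of the tutorial, with lam standing for lambda:

import numpy as np
from numpy.linalg import norm

def dist_subdiff_positive_group(v, w, lam):
    # distance of v to the subdifferential of w -> lam * ||w|| restricted to w >= 0
    if np.any(w < 0):
        # w has a strictly negative coordinate: empty subdifferential
        return np.inf
    if not np.any(w):
        # w = 0: D(v) = max(0, ||v^+|| - lam), with v^+ the positive part of v
        return max(0.0, norm(np.maximum(v, 0.0)) - lam)
    # ||w|| != 0: project v onto lam * w / ||w|| + C_1 x ... x C_g
    p = np.where(w > 0, lam * w / norm(w), np.minimum(v, 0.0))
    return norm(v - p)

v = np.array([0.5, -0.2, 0.1])
print(dist_subdiff_positive_group(v, np.array([1.0, 0.0, 2.0]), lam=0.7))
print(dist_subdiff_positive_group(v, np.zeros(3), lam=0.7))
print(dist_subdiff_positive_group(v, np.array([1.0, -1.0, 0.0]), lam=0.7))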

skglm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
 
 from skglm.estimators import (  # noqa F401
     Lasso, WeightedLasso, ElasticNet, MCPRegression, MultiTaskLasso, LinearSVC,
-    SparseLogisticRegression, GeneralizedLinearEstimator, CoxEstimator
+    SparseLogisticRegression, GeneralizedLinearEstimator, CoxEstimator, GroupLasso,
 )

skglm/datafits/group.py

Lines changed: 33 additions & 0 deletions
@@ -4,6 +4,7 @@
 
 from skglm.datafits.base import BaseDatafit
 from skglm.datafits.single_task import Logistic
+from skglm.utils.sparse_ops import spectral_norm, sparse_columns_slice
 
 
 class QuadraticGroup(BaseDatafit):
@@ -50,6 +51,20 @@ def get_lipschitz(self, X, y):
 
         return lipschitz
 
+    def get_lipschitz_sparse(self, X_data, X_indptr, X_indices, y):
+        grp_ptr, grp_indices = self.grp_ptr, self.grp_indices
+        n_groups = len(grp_ptr) - 1
+
+        lipschitz = np.zeros(n_groups, dtype=X_data.dtype)
+        for g in range(n_groups):
+            grp_g_indices = grp_indices[grp_ptr[g]: grp_ptr[g+1]]
+            X_data_g, X_indptr_g, X_indices_g = sparse_columns_slice(
+                grp_g_indices, X_data, X_indptr, X_indices)
+            lipschitz[g] = spectral_norm(
+                X_data_g, X_indptr_g, X_indices_g, len(y)) ** 2 / len(y)
+
+        return lipschitz
+
     def value(self, y, w, Xw):
         return norm(y - Xw) ** 2 / (2 * len(y))
 
@@ -63,6 +78,24 @@ def gradient_g(self, X, y, w, Xw, g):
 
         return grad_g
 
+    def gradient_g_sparse(self, X_data, X_indptr, X_indices, y, w, Xw, g):
+        grp_ptr, grp_indices = self.grp_ptr, self.grp_indices
+        grp_g_indices = grp_indices[grp_ptr[g]: grp_ptr[g+1]]
+
+        grad_g = np.zeros(len(grp_g_indices))
+        for idx, j in enumerate(grp_g_indices):
+            grad_g[idx] = self.gradient_scalar_sparse(
+                X_data, X_indptr, X_indices, y, w, Xw, j)
+
+        return grad_g
+
+    def gradient_scalar_sparse(self, X_data, X_indptr, X_indices, y, w, Xw, j):
+        grad_j = 0.
+        for i in range(X_indptr[j], X_indptr[j+1]):
+            grad_j += X_data[i] * (Xw[X_indices[i]] - y[X_indices[i]])
+
+        return grad_j / len(y)
+
     def gradient_scalar(self, X, y, w, Xw, j):
         return X[:, j] @ (Xw - y) / len(y)
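
As a sanity check on the new sparse Lipschitz computation (illustrative only, not part of the commit; the toy matrix and group indices are made up), the per-group constant is the squared spectral norm of the group's columns divided by the number of samples, which can be reproduced with a dense slice:

import numpy as np
import scipy.sparse as sp

n_samples, n_features = 50, 6
X = sp.random(n_samples, n_features, density=0.3, format="csc", random_state=0)

grp_g_indices = np.array([2, 3, 4])   # columns belonging to one group
X_g = X[:, grp_g_indices].toarray()   # dense view of the sliced group columns

# squared spectral norm of X_[g] divided by n_samples, as in get_lipschitz_sparse
lipschitz_g = np.linalg.norm(X_g, ord=2) ** 2 / n_samples
print(lipschitz_g)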

skglm/estimators.py

Lines changed: 134 additions & 3 deletions
@@ -19,10 +19,12 @@
 from sklearn.multiclass import OneVsRestClassifier, check_classification_targets
 
 from skglm.utils.jit_compilation import compiled_clone
-from skglm.solvers import AndersonCD, MultiTaskBCD
-from skglm.datafits import Cox, Quadratic, Logistic, QuadraticSVC, QuadraticMultiTask
-from skglm.penalties import (L1, WeightedL1, L1_plus_L2, L2,
+from skglm.solvers import AndersonCD, MultiTaskBCD, GroupBCD
+from skglm.datafits import (Cox, Quadratic, Logistic, QuadraticSVC,
+                            QuadraticMultiTask, QuadraticGroup)
+from skglm.penalties import (L1, WeightedL1, L1_plus_L2, L2, WeightedGroupL2,
                              MCPenalty, WeightedMCPenalty, IndicatorBox, L2_1)
+from skglm.utils.data import grp_converter
 
 
 def _glm_fit(X, y, model, datafit, penalty, solver):
@@ -1537,3 +1539,132 @@ def path(self, X, Y, alphas, coef_init=None, return_n_iter=False, **params):
             ws_strategy=self.ws_strategy, fit_intercept=self.fit_intercept,
             warm_start=self.warm_start, verbose=self.verbose)
         return solver.path(X, Y, datafit, penalty, alphas, coef_init, return_n_iter)
+
+
+class GroupLasso(LinearModel, RegressorMixin):
+    r"""GroupLasso estimator based on Celer solver and primal extrapolation.
+
+    The optimization objective for GroupLasso is:
+
+    .. math::
+        1 / (2 xx n_"samples") ||y - \sum_g X_{[g]} w_{[g]}||_2 ^ 2 + alpha \sum_g
+        weights_g ||w_{[g]}||_2
+
+    with :math:`w_{[g]}` (respectively :math:`X_{[g]}`) being the coefficients
+    (respectively the columns) of the g-th group.
+
+    Parameters
+    ----------
+    groups : int | list of ints | list of lists of ints
+        Partition of features used in the penalty on ``w``.
+        If an int is passed, groups are contiguous blocks of features, of size
+        ``groups``.
+        If a list of ints is passed, groups are assumed to be contiguous,
+        group number ``g`` being of size ``groups[g]``.
+        If a list of lists of ints is passed, ``groups[g]`` contains the
+        feature indices of the group number ``g``.
+
+    alpha : float, optional
+        Penalty strength.
+
+    weights : array, shape (n_groups,), optional (default=None)
+        Positive weights used in the group penalty part of the objective.
+        If ``None``, weights equal to 1 are used.
+
+    max_iter : int, optional (default=50)
+        The maximum number of iterations (subproblem definitions).
+
+    max_epochs : int, optional (default=50_000)
+        Maximum number of CD epochs on each subproblem.
+
+    p0 : int, optional (default=10)
+        First working set size.
+
+    verbose : bool or int, optional (default=0)
+        Amount of verbosity.
+
+    tol : float, optional (default=1e-4)
+        Stopping criterion for the optimization.
+
+    positive : bool, optional (default=False)
+        When set to ``True``, forces the coefficient vector to be positive.
+
+    fit_intercept : bool, optional (default=True)
+        Whether or not to fit an intercept.
+
+    warm_start : bool, optional (default=False)
+        When set to ``True``, reuse the solution of the previous call to fit as
+        initialization; otherwise, just erase the previous solution.
+
+    ws_strategy : str, optional (default="subdiff")
+        The score used to build the working set. Can be ``fixpoint`` or ``subdiff``.
+
+    Attributes
+    ----------
+    coef_ : array, shape (n_features,)
+        Parameter vector (:math:`w` in the cost function formula).
+
+    intercept_ : float
+        Constant term in the decision function.
+
+    n_iter_ : int
+        Number of subproblems solved to reach the specified tolerance.
+
+    Notes
+    -----
+    Supports weights equal to ``0``, i.e. unpenalized features.
+    """
+
+    def __init__(self, groups, alpha=1., weights=None, max_iter=50, max_epochs=50_000,
+                 p0=10, verbose=0, tol=1e-4, positive=False, fit_intercept=True,
+                 warm_start=False, ws_strategy="subdiff"):
+        super().__init__()
+        self.alpha = alpha
+        self.groups = groups
+        self.weights = weights
+        self.tol = tol
+        self.max_iter = max_iter
+        self.max_epochs = max_epochs
+        self.p0 = p0
+        self.ws_strategy = ws_strategy
+        self.positive = positive
+        self.fit_intercept = fit_intercept
+        self.warm_start = warm_start
+        self.verbose = verbose
+
+    def fit(self, X, y):
+        """Fit the model according to the given training data.
+
+        Parameters
+        ----------
+        X : array-like, shape (n_samples, n_features)
+            Training data, where ``n_samples`` is the number of samples and
+            ``n_features`` is the number of features.
+
+        y : array-like, shape (n_samples,)
+            Target vector relative to ``X``.
+
+        Returns
+        -------
+        self : Instance of GroupLasso
+            Fitted estimator.
+        """
+        grp_indices, grp_ptr = grp_converter(self.groups, X.shape[1])
+        group_sizes = np.diff(grp_ptr)
+
+        n_features = np.sum(group_sizes)
+        if X.shape[1] != n_features:
+            raise ValueError(
+                "The total number of group members must equal the number of "
+                f"features. Got {n_features}, expected {X.shape[1]}.")
+
+        weights = np.ones(len(group_sizes)) if self.weights is None else self.weights
+        group_penalty = WeightedGroupL2(alpha=self.alpha, grp_ptr=grp_ptr,
+                                        grp_indices=grp_indices, weights=weights,
+                                        positive=self.positive)
+        quad_group = QuadraticGroup(grp_ptr=grp_ptr, grp_indices=grp_indices)
+        solver = GroupBCD(
+            self.max_iter, self.max_epochs, self.p0, tol=self.tol,
+            fit_intercept=self.fit_intercept, warm_start=self.warm_start,
+            verbose=self.verbose)
+
+        return _glm_fit(X, y, self, quad_group, group_penalty, solver)
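
Below is a minimal usage sketch of the new estimator (not part of the diff). The data, density, group size, and alpha value are made up; sparse input is used since this PR adds sparse X support:

import numpy as np
import scipy.sparse as sp
from skglm import GroupLasso

X = sp.random(100, 30, density=0.2, format="csc", random_state=0)
y = np.random.default_rng(0).standard_normal(100)

# 10 contiguous groups of 3 features each (an int `groups` means contiguous blocks)
clf = GroupLasso(groups=3, alpha=0.1)
clf.fit(X, y)
print(clf.coef_.shape)  # (30,)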

skglm/penalties/block_separable.py

Lines changed: 24 additions & 12 deletions
@@ -335,19 +335,31 @@ def subdiff_distance(self, w, grad_ws, ws):
             w_g = w[grp_g_indices]
             norm_w_g = norm(w_g)
 
-            if self.positive and np.any(w_g < 0):
-                scores[idx] = np.inf
-            elif self.positive and norm_w_g == 0:
-                # distance of -norm(neg_grad_g) to weights[g] * [-alpha, alpha]
-                neg_grad_g = grad_g[grad_g < 0.]
-                scores[idx] = max(0, norm(neg_grad_g) - self.alpha * weights[g])
-            elif (not self.positive) and norm_w_g == 0:
-                # distance of -norm(grad_g) to weights[g] * [-alpha, alpha]
-                scores[idx] = max(0, norm(grad_g) - alpha * weights[g])
+            if self.positive:
+                if norm_w_g == 0:
+                    # distance of -neg_grad_g to weights[g] * [-alpha, alpha]
+                    neg_grad_g = grad_g[grad_g < 0.]
+                    scores[idx] = max(0,
+                                      norm(neg_grad_g) - self.alpha * weights[g])
+                elif np.any(w_g < 0):
+                    scores[idx] = np.inf
+                else:
+                    res = np.zeros_like(grad_g)
+                    for j in range(len(w_g)):
+                        thresh = alpha * weights[g] * w_g[j] / norm_w_g
+                        if w_g[j] > 0:
+                            res[j] = -grad_g[j] - thresh
+                        else:
+                            # thresh is 0, we simplify the expression
+                            res[j] = max(-grad_g[j], 0)
+                    scores[idx] = norm(res)
             else:
-                # distance of -grad_g to the subdiff (here a singleton)
-                subdiff = alpha * weights[g] * w_g / norm_w_g
-                scores[idx] = norm(grad_g + subdiff)
+                if norm_w_g == 0:
+                    scores[idx] = max(0, norm(grad_g) - alpha * weights[g])
+                else:
+                    # distance of -grad_g to the subdiff (here a singleton)
+                    subdiff = alpha * weights[g] * w_g / norm_w_g
+                    scores[idx] = norm(grad_g + subdiff)
 
         return scores