[RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) is a Julia [@bezanson-edelman-karpinski-shah-2017] package that implements a family of quadratic regularization and trust-region type algorithms for solving nonsmooth optimization problems of the form $\min_x f(x) + h(x)$, where $f$ is smooth and $h$ is nonsmooth.
Hessians can be obtained via automatic differentiation through [ADNLPModels.jl](https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl) or supplied directly as Hessian–vector products $v \mapsto Hv$.
This enables algorithms to exploit second-order information without explicitly forming dense (or sparse) Hessians, which is often prohibitively expensive in both computation and memory, particularly in high-dimensional settings.
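For illustration, the sketch below (not code from this package; the objective is an arbitrary extended Rosenbrock function) shows how a Hessian–vector product can be requested through the standard NLPModels.jl API on an automatic-differentiation model:

```julia
using ADNLPModels, NLPModels

# Arbitrary smooth objective differentiated automatically (extended Rosenbrock).
nlp = ADNLPModel(x -> sum(100 * (x[i + 1] - x[i]^2)^2 + (1 - x[i])^2 for i in 1:length(x) - 1),
                 -1.2 * ones(10))

x = nlp.meta.x0            # starting point stored in the model
v = randn(nlp.meta.nvar)   # direction
Hv = hprod(nlp, x, v)      # Hessian–vector product ∇²f(x) v; no Hessian is formed
```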
## Requirements of RegularizedProblems.jl
To model the problem \eqref{eq:nlp}, one defines the smooth part $f$ and the nonsmooth part $h$ as discussed above.
The package [RegularizedProblems.jl](https://github.com/JuliaSmoothOptimizers/RegularizedProblems.jl) provides a straightforward way to create such instances, called *Regularized Nonlinear Programming Models*:
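For instance, a minimal sketch along the following lines (it uses the FitzHugh–Nagumo test problem `fh_model` provided by RegularizedProblems.jl together with an $\ell_1$ regularizer from ProximalOperators.jl; loading ADNLPModels.jl and DifferentialEquations.jl is assumed to be needed by this particular test problem):

```julia
using ProximalOperators
using NLPModels, RegularizedProblems
using ADNLPModels, DifferentialEquations  # required by this particular test problem

# Smooth part f: FitzHugh–Nagumo inverse problem shipped with RegularizedProblems.jl.
model, _, _ = RegularizedProblems.fh_model()

# Nonsmooth part h: ℓ₁ penalty with weight λ.
λ = 1.0
h = NormL1(λ)

# Regularized Nonlinear Programming Model combining f and h.
reg_nlp = RegularizedNLPModel(model, h)
```

The resulting `reg_nlp` can then be passed directly to the solvers of RegularizedOptimization.jl.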
This design makes it a convenient source of reproducible problem instances for testing and benchmarking algorithms in the repository [@diouane-habiboullah-orban-2024;@aravkin-baraldi-orban-2022;@aravkin-baraldi-orban-2024;@leconte-orban-2023-2].
## Requirements of ShiftedProximalOperators.jl
The nonsmooth part $h$ must have a computable proximal mapping. More precisely, the solvers rely on shifted proximal operators, which amount to computing
$$
\operatorname*{arg\,min}_{t} \ \tfrac{1}{2} \|t - q\|^2 + h(x + s + t) + \chi(s + t; \Delta B),
$$
where $q$ is given, $x$ and $s$ are fixed shifts, $h$ is the nonsmooth term with respect to which we are computing the proximal operator, and $\chi(\cdot\,; \Delta B)$ is the indicator of a ball of radius $\Delta$ defined by a certain norm.
This package makes it possible to evaluate such shifted proximal operators without additional allocations, and to solve problem \eqref{eq:nlp} with bound constraints.
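As an illustration, the following is a minimal sketch of how a shifted operator might be created and evaluated in place; it assumes the `shifted` constructor and the four-argument `prox!` method exported by [ShiftedProximalOperators.jl](https://github.com/JuliaSmoothOptimizers/ShiftedProximalOperators.jl):

```julia
using ProximalOperators, ShiftedProximalOperators

h = NormL0(1.0)        # nonsmooth term h
x = randn(5)           # current iterate (first shift)
ψ = shifted(h, x)      # ψ(t) = h(x + t)
q = randn(5)           # point at which the proximal mapping is evaluated
s = similar(q)         # preallocated output
prox!(s, ψ, q, 1.0)    # s ← argmin_t ½‖t − q‖² + 1.0 ψ(t), computed in place
```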
## Testing and documentation
Extensive documentation is provided, including a user guide and an API reference.
Aqua.jl is used to test the package dependencies.
Documentation is built using Documenter.jl.
## Solver characteristics
All solvers in [RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) are implemented in an in-place fashion, minimizing memory allocations during the solution process.
Moreover, they implement non-monotone strategies to accept trial points, which can enhance algorithmic performance in practice [@leconte-orban-2023;@diouane-habiboullah-orban-2024].
## Application studies
This is crucial for large-scale problems where exact subproblem solutions are prohibitively expensive.
Moreover, one way to outperform line-search–based methods is to solve the subproblems more accurately by performing many proximal iterations, which are inexpensive to compute, rather than relying on numerous function and gradient evaluations.
We will illustrate this in the examples below.
# Examples
We consider two examples where the smooth part $f$ is nonconvex and the nonsmooth part $h$ is either the $\ell^{1/2}$ or the $\ell_0$ norm, the latter with constraints.
We compare the performance of our solvers with the **PANOC** solver [@stella-themelis-sopasakis-patrinos-2017] implemented in [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl).
We illustrate the capabilities of [RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) on two nonsmooth and nonconvex problems:
- **Support Vector Machine (SVM) with $\ell^{1/2}$ penalty** for image classification [@aravkin-baraldi-orban-2024].
- **Nonnegative Matrix Factorization (NNMF) with $\ell_0$ penalty and constraints** [@kim-park-2008].
Both problems are of the form $\min f(x) + h(x)$ with $f$ nonconvex and $h$ nonsmooth.
The NNMF problem can be set up in a similar way to the SVM case, with $h$ given by an $\ell_0$ norm and additional nonnegativity constraints.
Below is a condensed example showing how to define and solve such problems:
```julia
using LinearAlgebra, Random
using ProximalOperators
using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization
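# The remainder of this block is a condensed sketch reconstructed from the calls
# shown elsewhere in this paper; the final `solve!` call assumes the
# SolverCore-style in-place API.

# Build a regularized test problem from RegularizedProblems.jl
# (the NNMF instance also used in the numerical results below).
m, n, k = 100, 50, 5
model, nls_model, A, selected = nnmf_model(m, n, k)

# Smooth part f: limited-memory quasi-Newton (LBFGS) Hessian approximation.
f = LBFGSModel(model)

# Nonsmooth part h: ℓ₀ penalty with weight λ.
λ = 1.0
h = NormL0(λ)

# Assemble the regularized model, a solver, and an execution statistics tracker.
reg_nlp = RegularizedNLPModel(f, h)
solver = TRSolver(reg_nlp)
stats = RegularizedExecutionStats(reg_nlp)

# Solve in place.
solve!(solver, reg_nlp, stats)
```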
### Numerical results
We compare **PANOC** (from [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl)) with **TR**, **R2N**, and **LM** from our library.
The results are summarized below:
- **SVM with $\ell^{1/2}$ penalty:** TR and R2N require far fewer function and gradient evaluations than PANOC, at the expense of more proximal iterations. Since each proximal step is inexpensive, TR and R2N are much faster overall.
- **NNMF with constrained $\ell_0$ penalty:** R2N slightly outperforms TR, while LM is competitive in terms of function calls but incurs many gradient evaluations.
Additional tests (e.g., other regularizers, constraint types, and scaling dimensions) have also been conducted, and a full benchmarking campaign is currently underway.