
Commit 552c243

second final version
1 parent 157b986 commit 552c243

File tree: 2 files changed, +45 −192 lines

paper/paper.bib

Lines changed: 0 additions & 12 deletions
@@ -74,18 +74,6 @@ @TechReport{ diouane-gollier-orban-2024
   doi = {10.13140/RG.2.2.16095.47527},
 }

-@article{bezanson-edelman-karpinski-shah-2017,
-  author    = {Bezanson, Jeff and Edelman, Alan and Karpinski, Stefan and Shah, Viral B.},
-  title     = {Julia: A Fresh Approach to Numerical Computing},
-  journal   = {SIAM Review},
-  volume    = {59},
-  number    = {1},
-  pages     = {65--98},
-  year      = {2017},
-  doi       = {10.1137/141000671},
-  publisher = {SIAM},
-}
-
 @Misc{orban-siqueira-cutest-2020,
   author = {D. Orban and A. S. Siqueira and {contributors}},
   title  = {{CUTEst.jl}: {J}ulia's {CUTEst} interface},

paper/paper.md

Lines changed: 45 additions & 180 deletions
@@ -32,7 +32,7 @@ header-includes: |

# Summary

-[RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) is a Julia [@bezanson-edelman-karpinski-shah-2017] package that implements a family of quadratic regularization and trust-region type algorithms for solving nonsmooth optimization problems of the form:
+[RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) is a Julia package that implements a family of quadratic regularization and trust-region type algorithms for solving nonsmooth optimization problems of the form:
\begin{equation}\label{eq:nlp}
  \underset{x \in \mathbb{R}^n}{\text{minimize}} \quad f(x) + h(x),
\end{equation}
@@ -79,7 +79,7 @@ In contrast to [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/Proxim
Hessians can be obtained via automatic differentiation through [ADNLPModels.jl](https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl) or supplied directly as Hessian–vector products $v \mapsto Hv$.
This enables algorithms to exploit second-order information without explicitly forming dense (or sparse) Hessians, which is often prohibitively expensive in both computation and memory, particularly in high-dimensional settings.
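
For illustration, the sketch below wraps a small automatic-differentiation model in limited-memory quasi-Newton approximations via NLPModelsModifiers.jl; the Rosenbrock objective is only a stand-in for a real application:

```julia
using ADNLPModels, NLPModelsModifiers

# Smooth model built by automatic differentiation (illustrative objective)
nlp = ADNLPModel(x -> (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2, [-1.2, 1.0])

# Limited-memory quasi-Newton approximations: only operator-vector products
# v ↦ Bv are ever applied; no dense Hessian is formed.
f_lbfgs = LBFGSModel(nlp)
f_lsr1  = LSR1Model(nlp)
```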

-## Requirements of the RegularizedProblems.jl package
+## Requirements of RegularizedProblems.jl

To model the problem \eqref{eq:nlp}, one defines the smooth part $f$ and the nonsmooth part $h$ as discussed above.
The package [RegularizedProblems.jl](https://github.com/JuliaSmoothOptimizers/RegularizedProblems.jl) provides a straightforward way to create such instances, called *Regularized Nonlinear Programming Models*:
@@ -90,7 +90,7 @@ reg_nlp = RegularizedNLPModel(f, h)

This design makes it a convenient source of reproducible problem instances for testing and benchmarking algorithms in the repository [@diouane-habiboullah-orban-2024;@aravkin-baraldi-orban-2022;@aravkin-baraldi-orban-2024;@leconte-orban-2023-2].
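
As a minimal, self-contained sketch (the quadratic objective and the $\ell_1$ term below are illustrative choices, not instances shipped with the package):

```julia
using ADNLPModels, ProximalOperators, RegularizedProblems

# Smooth part: a simple quadratic objective (illustrative only)
f = ADNLPModel(x -> 0.5 * sum((x .- 1) .^ 2), zeros(10))

# Nonsmooth part: ℓ1 regularizer from ProximalOperators.jl
h = NormL1(0.1)

# Pair the two terms into a regularized model accepted by the solvers
reg_nlp = RegularizedNLPModel(f, h)
```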

-## Requirements of the ShiftedProximalOperators.jl package
+## Requirements of ShiftedProximalOperators.jl

The nonsmooth part $h$ must have a computable proximal mapping, defined as
$$\text{prox}_{h}(v) = \underset{x \in \mathbb{R}^n}{\arg\min} \left( h(x) + \frac{1}{2} \|x - v\|^2 \right).$$
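For many standard regularizers, [ProximalOperators.jl](https://github.com/JuliaFirstOrder/ProximalOperators.jl) evaluates this mapping in closed form, as in the short sketch below (the vector and step size are arbitrary):

```julia
using ProximalOperators

h = NormL1(1.0)          # h(x) = ‖x‖₁; its prox is soft-thresholding
v = [1.5, -0.2, 0.7]
ν = 0.5                  # proximal step size
y, hy = prox(h, v, ν)    # y = prox_{νh}(v), hy = h(y)
```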
@@ -101,14 +101,12 @@ The main difference between the proximal operators implemented in
is that those implemented here involve a translation of the nonsmooth term.
Specifically, this package considers proximal operators defined as
$$
-\underset{t \in \mathbb{R}^n}{\arg\min} \, \left\{ \tfrac{1}{2} \|t - q\|_2^2 + \nu h(x + s + t) + \chi(s + t; \Delta B) \mid t \in \mathbb{R}^n \right\},
+\underset{t \in \mathbb{R}^n}{\arg\min} \, \tfrac{1}{2} \|t - q\|_2^2 + \nu h(x + s + t) + \chi(s + t; \Delta B),
$$
where $q$ is given, $x$ and $s$ are fixed shifts, $h$ is the nonsmooth term with respect
to which we are computing the proximal operator, and $\chi(\cdot\,; \Delta B)$ is the indicator of
a ball of radius $\Delta$ defined by a certain norm.
-
-![Composition of JSO packages](jso-packages.pdf){ width=70% }
-
+This package encodes such shifted proximal operators without additional allocations and makes it possible to solve problem \eqref{eq:nlp} with bound constraints.
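
In the unconstrained case (no indicator $\chi$), the shift reduces to a translation of the ordinary proximal mapping via the change of variables $u = x + s + t$; the sketch below illustrates this identity with ProximalOperators.jl and is not the package's internal implementation:

```julia
using ProximalOperators

# Without χ(⋅; ΔB):  argmin_t ½‖t − q‖² + ν h(x + s + t)
#                  = prox_{νh}(x + s + q) − (x + s)
h, ν = NormL1(1.0), 0.5
x, s, q = randn(5), randn(5), randn(5)
u, _ = prox(h, x .+ s .+ q, ν)   # ordinary prox at the shifted point
t = u .- (x .+ s)                # translate back to the step variable
```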

## Testing and documentation

@@ -117,9 +115,10 @@ Extensive documentation is provided, including a user guide, API reference, and
Aqua.jl is used to test the package dependencies.
Documentation is built using Documenter.jl.

-## Non-monotone strategies
+## Solver characteristics

-The solvers in [RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) implement non-monotone strategies to accept trial points, which can enhance algorithmic performance in practice [@leconte-orban-2023;@diouane-habiboullah-orban-2024].
+All solvers in [RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) are implemented in an in-place fashion, minimizing memory allocations during the solution process.
+Moreover, they implement non-monotone strategies to accept trial points, which can enhance algorithmic performance in practice [@leconte-orban-2023;@diouane-habiboullah-orban-2024].

## Application studies

@@ -134,197 +133,63 @@ This is crucial for large-scale problems where exact subproblem solutions are pr
Moreover, one way to outperform line-search–based methods is to solve the subproblems more accurately by performing many proximal iterations, which are inexpensive to compute, rather than relying on numerous function and gradient evaluations.
We will illustrate this in the examples below.

-## In-place methods
-
-All solvers in [RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) are implemented in an in-place fashion, minimizing memory allocations during the resolution process.

# Examples

-We consider three examples where the smooth part $f$ is nonconvex and the nonsmooth part $h$ is either $\ell^{1/2}$ or $\ell_0$ norm with or without constraints.
+We consider two examples where the smooth part $f$ is nonconvex and the nonsmooth part $h$ is either the $\ell^{1/2}$ penalty or the $\ell_0$ norm combined with constraints.

We compare the performance of our solvers with the **PANOC** solver [@stella-themelis-sopasakis-patrinos-2017] implemented in [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl).

-## Problem of support vector machine with $\ell^{1/2}$ penalty
+We illustrate the capabilities of [RegularizedOptimization.jl](https://github.com/JuliaSmoothOptimizers/RegularizedOptimization.jl) on two nonsmooth and nonconvex problems:

+- **Support Vector Machine (SVM) with $\ell^{1/2}$ penalty** for image classification [@aravkin-baraldi-orban-2024].
+- **Nonnegative Matrix Factorization (NNMF) with $\ell_0$ penalty and constraints** [@kim-park-2008].

-A first example addresses an image recognition task using a support vector machine (SVM) similar to those in [@aravkin-baraldi-orban-2024].
-The formulation is
-$$
-\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2} \|\mathbf{1} - \tanh(b \odot \langle A, x \rangle)\|^2 + \|x\|_{1/2}^{1/2},
-$$
-where $A \in \mathbb{R}^{m \times n}$, with $n = 784$ the vectorized size of each image and $m = 13{,}007$ the number of images in the training dataset.
+Both problems are of the form $\min f(x) + h(x)$ with $f$ nonconvex and $h$ nonsmooth.
+The NNMF problem differs from the SVM case in that $h$ is an $\ell_0$ norm combined with nonnegativity constraints.
+Below is a condensed example showing how to define and solve such problems:

```julia
-using LinearAlgebra, Random
-using ProximalOperators
-using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization
-using MLDatasets
-
-random_seed = 1234
-Random.seed!(random_seed)
-
-# Build the models
-model, nls_train, _ = RegularizedProblems.svm_train_model()
-
-# Define the Hessian approximation
-f = LSR1Model(model)
-
-# Define the nonsmooth regularizer (ℓ^{1/2} norm)
-λ = 1.0
-h = RootNormLhalf(λ)
-
-# Define the regularized NLP model
-reg_nlp = RegularizedNLPModel(f, h)
-
-# Choose a solver (R2N) and execution statistics tracker
-solver_r2n = R2NSolver(reg_nlp)
-stats = RegularizedExecutionStats(reg_nlp)
-
-# Max number of proximal iterations for subproblem solver
-sub_kwargs = (max_iter=200,)
-
-# Solve the problem
-solve!(solver_r2n, reg_nlp, stats, x = f.meta.x0, atol = 1e-4, rtol = 1e-4, verbose = 0, sub_kwargs = sub_kwargs)
+using LinearAlgebra, Random, ProximalOperators
+using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization
+
+Random.seed!(1234)
+model, nls, _ = RegularizedProblems.svm_train_model()  # build the SVM model
+f = LSR1Model(model)                                   # Hessian approximation
+h = RootNormLhalf(1.0)                                 # nonsmooth term
+reg_nlp = RegularizedNLPModel(f, h)                    # regularized problem
+solver = R2NSolver(reg_nlp)                            # choose a solver
+stats = RegularizedExecutionStats(reg_nlp)
+solve!(solver, reg_nlp, stats; x=f.meta.x0, atol=1e-4, rtol=1e-4, verbose=0, sub_kwargs=(max_iter=200,))
+# The second call reuses the same solver and stats objects in place, at a tighter tolerance.
+solve!(solver, reg_nlp, stats; x=f.meta.x0, atol=1e-5, rtol=1e-5, verbose=0, sub_kwargs=(max_iter=200,))
```
+The NNMF problem is set up similarly, replacing the model by nnmf_model(...) and $h$ by an $\ell_0$ norm combined with nonnegativity constraints (see the sketch below).
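
For completeness, a condensed NNMF setup might look as follows; this sketch follows the constructors used in the original NNMF example (nnmf_model, NormL0, RegularizedNLSModel, LMSolver), with the starting point and tolerances chosen for illustration and the nonnegativity bounds assumed to be carried by the model returned by nnmf_model:

```julia
using LinearAlgebra, Random
using ProximalOperators
using NLPModels, RegularizedProblems, RegularizedOptimization

Random.seed!(1234)
m, n, k = 100, 50, 5
model, nls_model, A, selected = nnmf_model(m, n, k)    # NNMF as a nonlinear least-squares model

# ℓ0 regularizer; λ scaled from a gradient norm, as in the original example
λ = norm(grad(model, rand(model.meta.nvar)), Inf) / 200
h = NormL0(λ)

reg_nls = RegularizedNLSModel(nls_model, h)            # regularized nonlinear least-squares model
solver = LMSolver(reg_nls)                             # Levenberg–Marquardt-type solver
stats = RegularizedExecutionStats(reg_nls)
solve!(solver, reg_nls, stats; x = nls_model.meta.x0, atol = 1e-4, rtol = 1e-4, verbose = 0)
```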

-````
-┌───────────┬─────────────┬──────────┬──────┬──────┬───────┐
-│ Method    │ Status      │ Time (s) │ #f   │ #∇f  │ #prox │
-├───────────┼─────────────┼──────────┼──────┼──────┼───────┤
-│ PANOC     │ first_order │ 51.1714  │ 3713 │ 3713 │ 2269  │
-│ TR(LSR1)  │ first_order │ 6.8107   │ 385  │ 333  │ 11113 │
-│ R2N(LSR1) │ first_order │ 2.4201   │ 175  │ 95   │ 56971 │
-└───────────┴─────────────┴──────────┴──────┴──────┴───────┘
-````
-
-We observe that both **TR** and **R2N** outperform **PANOC** in terms of the number of function and gradient evaluations and computational time, although they require more proximal iterations.
-Since each proximal iteration is inexpensive, however, the overall performance is better. In this instance, PANOC exhibits markedly slower convergence.
-
-## Problem of FitzHugh-Nagumo inverse with $\ell_0$ penalty
-
-A second example is the FitzHugh-Nagumo inverse problem with an $\ell_0$ penalty, as described in [@aravkin-baraldi-orban-2022] and [@aravkin-baraldi-orban-2024].
-This problem consists of recovering the parameters of a system of ordinary differential equations (ODEs) with sparsity constraints.
-In general, evaluating the objective and its gradient is costly because it requires solving the ODEs, whereas the proximal operator of the $\ell_0$ norm is inexpensive.
-
-```julia
-using LinearAlgebra
-using ProximalOperators
-using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization
-using DifferentialEquations, ADNLPModels
-
-# Define the FitzHugh-Nagumo problem
-model, _, _ = RegularizedProblems.fh_model()
-x0 = 0.1 * ones(model.meta.nvar)  # initial guess
-
-# Define the Hessian approximation
-f = LBFGSModel(model)
-
-# Initialize the starting Hessian approximation scaling factor
-f.op.data.scaling_factor = 1e4
+### Numerical results

-# Define the nonsmooth regularizer (L0 norm)
-λ = 1.0
-h = NormL0(λ)
+We compare **PANOC** (from [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl)) with **TR**, **R2N**, and **LM** from our library.
+The results are summarized in the combined table below:

-# Define the regularized NLP model
-reg_nlp = RegularizedNLPModel(f, h)
-
-# Choose a solver (TR) and execution statistics tracker
-solver_tr = TRSolver(reg_nlp)
-stats = RegularizedExecutionStats(reg_nlp)
-
-# Max number of proximal iterations for subproblem solver
-sub_kwargs = (max_iter=200,)
-
-# Solve the problem
-solve!(solver_tr, reg_nlp, stats, x = x0, atol = 1e-3, rtol = 1e-3, verbose = 0, sub_kwargs = sub_kwargs)
```
-
-````
-┌────────────────────────┬─────────────┬──────────┬─────┬─────┬───────┐
-│ Method                 │ Status      │ Time (s) │ #f  │ #∇f │ #prox │
-├────────────────────────┼─────────────┼──────────┼─────┼─────┼───────┤
-│ PANOC                  │ first_order │ 2.0095   │ 188 │ 188 │ 107   │
-│ TR(LBFGS)              │ first_order │ 0.4377   │ 75  │ 63  │ 21915 │
-│ R2N(LBFGS) Nonmonotone │ first_order │ 0.491    │ 99  │ 54  │ 28173 │
-└────────────────────────┴─────────────┴──────────┴─────┴─────┴───────┘
-
-````
-
-Same observation as in the previous example: **TR** and **R2N** with an LBFGS approximation of the Hessian of $f$ outperform **PANOC** in terms of the number of function and gradient evaluations and computational time, although they require more proximal iterations.
-
-## Problem of nonnegative least squares with $\ell_0$ penalty and constraints
-
-The third experiment considers the sparse nonnegative matrix factorization (NNMF) problem introduced by [@kim-park-2008].
-Let $A \in \mathbb{R}^{m \times n}$ be a nonnegative matrix whose columns correspond to observations drawn from a Gaussian mixture, with negative entries truncated to zero.
-
-The goal is to obtain a factorization $A \approx WH$, where $W \in \mathbb{R}^{m \times k}$, $H \in \mathbb{R}^{k \times n}$, $k < \min(m,n)$, such that both factors are nonnegative and $H$ is sparse.
-
-This leads to the optimization problem
-
-$$
-\min_{W, H \geq 0} \; \tfrac{1}{2} \| A - WH \|_F^2 + \lambda \| \operatorname{vec}(H) \|_0,
-$$
-
-where $\operatorname{vec}(H)$ denotes the column-stacked version of $H$.
-
-Compared to the previous examples, we now consider a constrained problem with a nonsmooth and nonconvex term.
-
-The library [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl) provides solvers that can handle constraints by separating the objective into three parts: a smooth term, a nonsmooth term, and the indicator function of the constraints. However, this approach assumes that the nonsmooth part is convex, which is not the case here.
-
-Another approach is to merge the nonsmooth term with the indicator function of the constraints into a single nonsmooth function, and then apply **PANOC**, which is the strategy adopted here. However, the current library of proximal operators, [ProximalOperators.jl](https://github.com/JuliaFirstOrder/ProximalOperators.jl), on which [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl) relies, does not provide the proximal mapping of the sum of the $\ell_0$ norm and the indicator function of the nonnegative orthant. In contrast, [ShiftedProximalOperators.jl](https://github.com/JuliaSmoothOptimizers/ShiftedProximalOperators.jl) does implement this operator.
-
-Therefore, to apply **PANOC** in this setting, one would first need to implement this combined proximal operator in [ProximalOperators.jl](https://github.com/JuliaFirstOrder/ProximalOperators.jl). For this reason, we do not include **PANOC** in this example.
-
-Instead, we compare the performance of **TR** and **R2N** with that of **LM**.
-
-```julia
-using LinearAlgebra
-using ProximalOperators
-using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization
-using DifferentialEquations, ADNLPModels
-
-# Build the models
-m, n, k = 100, 50, 5
-model, nls_model, A, selected = nnmf_model(m, n, k)
-
-# Define the nonsmooth regularizer (L0 norm)
-λ = norm(grad(model, rand(model.meta.nvar)), Inf) / 200
-h = NormL0(λ)
-
-# Define the regularized NLS model
-reg_nlp = RegularizedNLSModel(nls_model, h)
-
-# Choose a solver (LM) and execution statistics tracker
-solver_lm = LMSolver(reg_nlp)
-stats = RegularizedExecutionStats(reg_nlp)
-
-# Solve the problem
-solve!(solver_lm, reg_nlp, stats, x = nls_model.meta.x0, atol = 1e-4, rtol = 1e-4, verbose = 0)
+┌────────────────────────┬─────────────┬──────────┬──────┬──────┬───────┐
+│ Method                 │ Status      │ Time (s) │ #f   │ #∇f  │ #prox │
+├────────────────────────┼─────────────┼──────────┼──────┼──────┼───────┤
+│ PANOC (SVM)            │ first_order │ 51.17    │ 3713 │ 3713 │ 2269  │
+│ TR(LSR1, SVM)          │ first_order │ 6.81     │ 385  │ 333  │ 11113 │
+│ R2N(LSR1, SVM)         │ first_order │ 2.42     │ 175  │ 95   │ 56971 │
+│ TR(LBFGS, NNMF)        │ first_order │ 1.05     │ 73   │ 68   │ 10005 │
+│ R2N(LBFGS, NNMF)       │ first_order │ 0.73     │ 68   │ 68   │ 7825  │
+│ LM (NNMF)              │ first_order │ 1.27     │ 11   │ 2035 │ 481   │
+└────────────────────────┴─────────────┴──────────┴──────┴──────┴───────┘
```

-```
-┌────────────────────────┬─────────────┬──────────┬────┬──────┬───────┐
-│ Method                 │ Status      │ Time (s) │ #f │ #∇f  │ #prox │
-├────────────────────────┼─────────────┼──────────┼────┼──────┼───────┤
-│ TR(LBFGS)              │ first_order │ 1.0527   │ 73 │ 68   │ 10005 │
-│ R2N(LBFGS) Nonmonotone │ first_order │ 0.7296   │ 68 │ 68   │ 7825  │
-│ LM                     │ first_order │ 1.2668   │ 11 │ 2035 │ 481   │
-└────────────────────────┴─────────────┴──────────┴────┴──────┴───────┘
+### Discussion

-```
+- **SVM with $\ell^{1/2}$ penalty:** TR and R2N require far fewer function and gradient evaluations than PANOC, at the expense of more proximal iterations. Since each proximal step is inexpensive, TR and R2N are much faster overall.
+- **NNMF with constrained $\ell_0$ penalty:** R2N slightly outperforms TR, while LM is competitive in terms of function calls but incurs many gradient evaluations.

-We observe that **R2N** and **TR** achieve similar performance, with **R2N** being slightly better.
-Both methods outperform **LM** in terms of computational time and the number of gradient evaluations.
-However, **LM** requires significantly fewer function evaluations, which is expected since it is specifically designed for nonlinear least-squares problems and can exploit the structure of the objective more effectively.
+Additional tests (e.g., other regularizers, constraint types, and larger problem sizes) have also been conducted, and a full benchmarking campaign is currently underway.

## Conclusion
