
Commit f244314

Merge branch 'dev'
2 parents: 76f6e48 + 51d5c34

File tree: 4 files changed (+67 / -15 lines)


README.md

Lines changed: 10 additions & 4 deletions
````diff
@@ -6,11 +6,17 @@
 
 This is a package gathering functionalities to solve a number of generalised linear regression/classification problems which, inherently, correspond to an optimisation problem of the form
 
-```
-L(y, Xθ) + P(θ)
-```
+$$
+L(y, X\theta) + P(\theta)
+$$
+
+where:
+
+- $L$ is a loss function
+- $X$ is the $n \times p$ matrix of training observations, where $n$ is the number of _observations_ (sample size) and $p$ is the number of _features_ (dimension)
+- $\theta$ the length $p$ vector of weights to be optimized
+- $P$ is a penalty function
 
-where `L` is a loss function and `P` is a penalty function (both of those can be scaled or composed).
 Additional regression/classification methods which do not directly correspond to this formulation may be added in the future.
 
 The core aims of this package are:
````
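To make the new formulation concrete: ridge regression is the special case where $L$ is the squared loss and $P$ a scaled squared $\ell_2$ penalty. A minimal Julia sketch (illustrative names only, not part of the package or of this diff):

```julia
# Objective L(y, Xθ) + P(θ) specialised to ridge regression:
# L(y, Xθ) = ‖y - Xθ‖²  and  P(θ) = λ‖θ‖².
ridge_objective(θ, X, y; λ = 1.0) = sum(abs2, y .- X * θ) + λ * sum(abs2, θ)

n, p = 100, 5                       # n observations, p features
X = randn(n, p)
y = X * ones(p) .+ 0.1 .* randn(n)
ridge_objective(zeros(p), X, y)     # value of the objective at θ = 0
```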

docs/src/index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -11,7 +11,7 @@ where:
 * ``y`` is the **target** or **response**, a vector of length ``n`` either of real values (_regression_) or integers (_classification_),
 * ``X`` is the **design** or **feature** matrix, a matrix of real values of size ``n \times p`` where ``p`` is the number of _features_ or _dimensions_,\
 * ``\theta`` is a vector of ``p`` real valued coefficients to determine,
-* ``L`` is a **loss function**, a pre-determined function of ``\mathbb R^n`` to ``\mathbb R^+`` penalising the amplitude of the _residuals_ in a specific way,
+* ``L`` is a **loss function**, a pre-determined function of ``\mathbb R^n \times \mathbb R^n`` to ``\mathbb R^+`` penalising the amplitude of the _residuals_ in a specific way,
 * ``P`` is a **penalty function**, a pre-determined function of ``\mathbb R^n`` to ``\mathbb R^+`` penalising the amplitude of the _coefficients_ in a specific way.
 
 A well known example is the [Ridge regression](https://en.wikipedia.org/wiki/Tikhonov_regularization) where the objective is to minimise:
````
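The one-line change above corrects the stated domain of ``L``: a loss compares the prediction vector ``X\theta`` with the target ``y``, so it is a map from ``\mathbb R^n \times \mathbb R^n`` to ``\mathbb R^+``. A small Julia sketch with ad hoc names (not the package's loss types):

```julia
# A loss L : ℝⁿ × ℝⁿ → ℝ⁺, here the squared loss, evaluated on (y, ŷ) with ŷ = Xθ.
squared_loss(y, ŷ) = sum(abs2, y .- ŷ)

X = [1.0 0.0; 0.0 1.0; 1.0 1.0]
θ = [0.5, -0.5]
y = [0.4, -0.6, 0.1]
squared_loss(y, X * θ)   # non-negative, and zero only if Xθ == y
```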

src/fit/newton.jl

Lines changed: 10 additions & 5 deletions
````diff
@@ -18,7 +18,8 @@ function _fit(glr::GLR{<:Union{LogisticLoss,RobustLoss},<:L2R},
     θ₀ = zeros(p)
     _fgh! = fgh!(glr, X, y, scratch)
     opt = Optim.only_fgh!(_fgh!)
-    res = Optim.optimize(opt, θ₀, Optim.Newton())
+    res = Optim.optimize(opt, θ₀, Optim.Newton(; solver.newton_options...),
+                         solver.optim_options)
     return Optim.minimizer(res)
 end
 
@@ -42,7 +43,8 @@ function _fit(glr::GLR{<:Union{LogisticLoss,RobustLoss},<:L2R},
     _fg! = (g, θ) -> fgh!(glr, X, y, scratch)(0.0, g, nothing, θ) # Optim#738
     _Hv! = Hv!(glr, X, y, scratch)
     opt = Optim.TwiceDifferentiableHV(_f, _fg!, _Hv!, θ₀)
-    res = Optim.optimize(opt, θ₀, Optim.KrylovTrustRegion())
+    res = Optim.optimize(opt, θ₀, Optim.KrylovTrustRegion(; solver.newtoncg_options...),
+                         solver.optim_options)
     return Optim.minimizer(res)
 end
 
@@ -62,7 +64,8 @@ function _fit(glr::GLR{<:Union{LogisticLoss,RobustLoss},<:L2R},
     θ₀ = zeros(p)
     _fg! = (f, g, θ) -> fgh!(glr, X, y, scratch)(f, g, nothing, θ)
     opt = Optim.only_fg!(_fg!)
-    res = Optim.optimize(opt, θ₀, Optim.LBFGS())
+    res = Optim.optimize(opt, θ₀, Optim.LBFGS(; solver.lbfgs_options...),
+                         solver.optim_options)
     return Optim.minimizer(res)
 end
 
@@ -90,7 +93,8 @@ function _fit(glr::GLR{<:MultinomialLoss,<:L2R}, solver::NewtonCG,
     _fg! = (g, θ) -> fg!(glr, X, y, scratch)(0.0, g, θ) # XXX: Optim.jl/738
     _Hv! = Hv!(glr, X, y, scratch)
     opt = Optim.TwiceDifferentiableHV(_f, _fg!, _Hv!, θ₀)
-    res = Optim.optimize(opt, θ₀, Optim.KrylovTrustRegion())
+    res = Optim.optimize(opt, θ₀, Optim.KrylovTrustRegion(; solver.newtoncg_options...),
+                         solver.optim_options)
     return Optim.minimizer(res)
 end
 
@@ -111,6 +115,7 @@ function _fit(glr::GLR{<:MultinomialLoss,<:L2R}, solver::LBFGS,
     θ₀ = zeros(p * c)
     _fg! = fg!(glr, X, y, scratch)
     opt = Optim.only_fg!(_fg!)
-    res = Optim.optimize(opt, θ₀, Optim.LBFGS())
+    res = Optim.optimize(opt, θ₀, Optim.LBFGS(; solver.lbfgs_options...),
+                         solver.optim_options)
     return Optim.minimizer(res)
 end
````
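Each hunk follows the same pattern: the solver's method-specific options (a NamedTuple) are splatted into the Optim method constructor, while the general `Optim.Options` are passed as an extra argument to `Optim.optimize`. A standalone sketch of that call pattern with a toy objective (the package's internal `fgh!`/`fg!` closures are not reproduced here):

```julia
using Optim

# Toy objective and gradient standing in for the package's closures.
f(θ)     = sum(abs2, θ .- 1)
g!(G, θ) = (G .= 2 .* (θ .- 1))

lbfgs_options = (m = 20,)                        # method-specific keywords
optim_options = Optim.Options(time_limit = 20)   # general Optim options

res = Optim.optimize(f, g!, zeros(3),
                     Optim.LBFGS(; lbfgs_options...),
                     optim_options)
Optim.minimizer(res)   # ≈ [1.0, 1.0, 1.0]
```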

src/fit/solvers.jl

Lines changed: 46 additions & 5 deletions
````diff
@@ -41,8 +41,22 @@ $SIGNATURES
 
 Newton solver. This is a full Hessian solver and should be avoided for
 "large scale" cases.
+
+`optim_options` are the [general Optim Options](https://julianlsolvers.github.io/Optim.jl/stable/#user/config/).
+`newton_options` are the [options of Newton's method](https://julianlsolvers.github.io/Optim.jl/stable/#algo/newton/)
+
+## Example
+```julia
+using MLJLinearModels, Optim
+
+solver = MLJLinearModels.Newton(optim_options = Optim.Options(time_limit = 20),
+                                newton_options = (linesearch = Optim.LineSearches.HagerZhang(),))
+```
 """
-struct Newton <: Solver end
+@with_kw struct Newton{O,S} <: Solver
+    optim_options::O = Optim.Options()
+    newton_options::S = (; )
+end
 
 """
 $SIGNATURES
@@ -51,17 +65,44 @@ Newton CG solver. This is the same as the Newton solver except that instead
 of solving systems of the form `H\\b` where `H` is the full Hessian, it uses
 a matrix-free conjugate gradient approach to solving that system. This should
 generally be preferred for larger scale cases.
+
+`optim_options` are the [general Optim Options](https://julianlsolvers.github.io/Optim.jl/stable/#user/config/).
+`newtoncg_options` are the [options of Krylov Trust Region method](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/multivariate/solvers/second_order/krylov_trust_region.jl)
+
+## Example
+```julia
+using MLJLinearModels, Optim
+
+solver = MLJLinearModels.NewtonCG(optim_options = Optim.Options(time_limit = 20),
+                                  newtoncg_options = (eta = 0.2,))
+```
+
 """
-struct NewtonCG <: Solver end
+@with_kw struct NewtonCG{O,S} <: Solver
+    optim_options::O = Optim.Options()
+    newtoncg_options::S = (; )
+end
 
 """
 $SIGNATURES
 
 LBFGS quasi-Newton solver. See [the wikipedia entry](https://en.wikipedia.org/wiki/Limited-memory_BFGS).
-"""
-struct LBFGS <: Solver end
 
-# struct BFGS <: Solver end
+`optim_options` are the [general Optim Options](https://julianlsolvers.github.io/Optim.jl/stable/#user/config/).
+`lbfgs_options` are the [options of LBFGS method](https://julianlsolvers.github.io/Optim.jl/stable/#algo/lbfgs/)
+
+## Example
+```julia
+using MLJLinearModels, Optim
+
+solver = MLJLinearModels.LBFGS(optim_options = Optim.Options(time_limit = 20),
+                               lbfgs_options = (linesearch = Optim.LineSearches.HagerZhang(),))
+```
+"""
+@with_kw struct LBFGS{O,S} <: Solver
+    optim_options::O = Optim.Options()
+    lbfgs_options::S = (; )
+end
 
 # ===================== pgrad.jl
 
````
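Downstream, a configured solver is passed to the fitting routine. A hedged sketch, assuming the `solver` keyword of `MLJLinearModels.fit` (the data and option values below are purely illustrative):

```julia
using MLJLinearModels, Optim

# Illustrative two-class data with ±1 labels.
X = randn(100, 3)
y = ifelse.(X * [1.0, -1.0, 0.5] .> 0, 1.0, -1.0)

# One of the keyword-based solvers introduced above ...
solver = MLJLinearModels.LBFGS(optim_options = Optim.Options(g_tol = 1e-6),
                               lbfgs_options = (m = 10,))

# ... handed to the fit via the `solver` keyword.
θ = MLJLinearModels.fit(LogisticRegression(), X, y; solver = solver)
```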
