In addition to the optimisation algorithms provided by the Optimisers.jl package this subpackage
also provides the Sophia optimisation algorithm.

-## Local Unconstrained Optimizers
-
-  - `Sophia`: Based on the recent paper https://arxiv.org/abs/2305.14342. It incorporates second order information
-    in the form of the diagonal of the Hessian matrix hence avoiding the need to compute the complete hessian. It has been shown to converge faster than other first order methods such as Adam and SGD.
-
-    + `solve(problem, Sophia(; η, βs, ϵ, λ, k, ρ))`
-
-    + `η` is the learning rate
-    + `βs` are the decay of momentums
-    + `ϵ` is the epsilon value
-    + `λ` is the weight decay parameter
-    + `k` is the number of iterations to re-compute the diagonal of the Hessian matrix
-    + `ρ` is the momentum
-    + Defaults:
-
-      * `η = 0.001`
-      * `βs = (0.9, 0.999)`
-      * `ϵ = 1e-8`
-      * `λ = 0.1`
-      * `k = 10`
-      * `ρ = 0.04`
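For orientation, the `solve(problem, Sophia(; η, βs, ϵ, λ, k, ρ))` signature above corresponds to a call like the minimal sketch below. The Rosenbrock objective, the `AutoZygote()` backend, and the `maxiters` value are illustrative assumptions, not part of the documented interface, and `Sophia` is assumed to be available from the loaded Optimization packages as described on this page.

```julia
# Minimal sketch of a Sophia solve; objective, AD backend and maxiters are illustrative.
using Optimization, OptimizationOptimisers, Zygote

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# Sophia uses gradient and estimated Hessian-diagonal information,
# so the OptimizationFunction needs an AD backend.
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# Keyword arguments default to the values listed above; override only what you need.
sol = solve(prob, Sophia(; η = 0.001, λ = 0.1), maxiters = 1000)
```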
+
## List of optimizers

  - [`Optimisers.Descent`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Descent): **Classic gradient descent optimizer with learning rate**
@@ -42,6 +22,7 @@ also provides the Sophia optimisation algorithm.
    + Defaults:

      * `η = 0.1`
+
  - [`Optimisers.Momentum`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Momentum): **Classic gradient descent optimizer with learning rate and momentum**
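The Optimisers.jl rules in this list all plug into the same `solve` interface. Below is a minimal sketch using `Descent` with its documented default `η = 0.1`; the quadratic objective, the `AutoZygote()` backend, and the `maxiters` value are illustrative assumptions, and `Descent` is assumed to be re-exported by OptimizationOptimisers from Optimisers.jl.

```julia
# Minimal sketch: an Optimisers.jl rule used through OptimizationOptimisers.
using Optimization, OptimizationOptimisers, Zygote

loss(u, p) = sum(abs2, u .- p)   # illustrative objective
optf = OptimizationFunction(loss, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(2), [1.0, 2.0])

# Descent(0.1) is classic gradient descent with the default learning rate η = 0.1.
sol = solve(prob, Descent(0.1), maxiters = 200)
```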
docs/src/optimization_packages/optimization.md (+54 −1)
@@ -4,15 +4,35 @@ There are some solvers that are available in the Optimization.jl package directly

## Methods

-`LBFGS`: The popular quasi-Newton method that leverages limited memory BFGS approximation of the inverse of the Hessian. Through a wrapper over the [L-BFGS-B](https://users.iems.northwestern.edu/%7Enocedal/lbfgsb.html) fortran routine accessed from the [LBFGSB.jl](https://github.com/Gnimuc/LBFGSB.jl/) package. It directly supports box-constraints.
+  - `LBFGS`: The popular quasi-Newton method that leverages a limited-memory BFGS approximation of the inverse of the Hessian, through a wrapper over the [L-BFGS-B](https://users.iems.northwestern.edu/%7Enocedal/lbfgsb.html) fortran routine accessed from the [LBFGSB.jl](https://github.com/Gnimuc/LBFGSB.jl/) package. It directly supports box-constraints.

    This can also handle arbitrary non-linear constraints through an Augmented Lagrangian method with bounds constraints described in 17.4 of Numerical Optimization by Nocedal and Wright, thus serving as a general-purpose nonlinear optimization solver available directly in Optimization.jl.
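As a rough illustration of the box-constrained usage described above, here is a minimal sketch; the Rosenbrock objective, the bounds, the `AutoForwardDiff()` backend, and the `maxiters` value are assumptions made for the example, not part of the solver's definition.

```julia
# Minimal sketch: box-constrained solve with Optimization.LBFGS.
using Optimization, ForwardDiff

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())

# Box constraints are supplied through the problem's lb/ub keywords.
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0];
    lb = [-1.0, -1.0], ub = [1.5, 1.5])

sol = solve(prob, Optimization.LBFGS(), maxiters = 100)
```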
+  - `Sophia`: Based on the recent paper https://arxiv.org/abs/2305.14342. It incorporates second-order information in the form of the diagonal of the Hessian matrix, hence avoiding the need to compute the complete Hessian. It has been shown to converge faster than other first-order methods such as Adam and SGD.
+
+    + `solve(problem, Sophia(; η, βs, ϵ, λ, k, ρ))`
+
+    + `η` is the learning rate
+    + `βs` are the decay rates of the momentum estimates
+    + `ϵ` is the epsilon value
+    + `λ` is the weight decay parameter
+    + `k` is the number of iterations between re-computations of the diagonal of the Hessian matrix
docs/src/tutorials/minibatch.md (+14 −11)
@@ -1,14 +1,15 @@
# Data Iterators and Minibatching

-It is possible to solve an optimization problem with batches using a `Flux.Data.DataLoader`, which is passed to `Optimization.solve` with `ncycles`. All data for the batches need to be passed as a tuple of vectors.
+It is possible to solve an optimization problem with batches using a `MLUtils.DataLoader`, which is passed to `Optimization.solve` with `ncycles`. All data for the batches need to be passed as a tuple of vectors.

!!! note

    This example uses the OptimizationOptimisers.jl package. See the
    [Optimisers.jl page](@ref optimisers) for details on the installation and usage.

-```@example
-using Flux, Optimization, OptimizationOptimisers, OrdinaryDiffEq, SciMLSensitivity
+```@example minibatch
+
+using Lux, Optimization, OptimizationOptimisers, OrdinaryDiffEq, SciMLSensitivity, MLUtils

function newtons_cooling(du, u, p, t)
    temp = u[1]
@@ -21,14 +22,16 @@ function true_sol(du, u, p, t)
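To make the batching mechanics described in this tutorial concrete, here is a small standalone sketch of the `MLUtils.DataLoader` pattern it relies on. The data values, the batch size, and the use of `IterTools.ncycle` to build the epoch iterator are illustrative assumptions, not an excerpt from the tutorial itself.

```julia
# Minimal sketch: batching a tuple of vectors with MLUtils.DataLoader.
using MLUtils
using IterTools: ncycle

ts = collect(range(0.0, 3.0, length = 100))   # illustrative time points
temps = 20.0 .+ 80.0 .* exp.(-ts)             # illustrative measurements
train_loader = MLUtils.DataLoader((ts, temps), batchsize = 20)

# Each iteration yields one minibatch as a tuple mirroring the input tuple.
for (t_batch, temp_batch) in train_loader
    @assert length(t_batch) == length(temp_batch) == 20
end

# Cycling the loader for several epochs gives the data iterator that is passed
# to `Optimization.solve` alongside the chosen optimizer.
numEpochs = 3
data_iterator = ncycle(train_loader, numEpochs)
```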