-# [Code Optimization for Solving Nonlinear Systems](@id code_optimization)
+# [Code Optimization for Small Nonlinear Systems in Julia](@id code_optimization)
 
-## Code Optimization in Julia
+## General Code Optimization in Julia
 
 Before starting this tutorial, we recommend that the reader check out one of the
 many tutorials for optimizing Julia code. The following is an incomplete
@@ -29,30 +29,145 @@ Let's start with small systems.
 
 ## Optimizing Nonlinear Solver Code for Small Systems
 
-```@example
-using NonlinearSolve, StaticArrays
+Take for example a prototypical small nonlinear solver code in its out-of-place form:
 
-f(u, p) = u .* u .- p
-u0 = @SVector[1.0, 1.0]
+```@example small_opt
+using NonlinearSolve
+
+function f(u, p)
+    u .* u .- p
+end
+u0 = [1.0, 1.0]
 p = 2.0
-probN = NonlinearProblem(f, u0, p)
-sol = solve(probN, NewtonRaphson(), reltol = 1e-9)
+prob = NonlinearProblem(f, u0, p)
+sol = solve(prob, NewtonRaphson())
 ```
 
-## Using Jacobian Free Newton Krylov (JNFK) Methods
+We can use BenchmarkTools.jl to get more precise timings:
 
-If we want to solve the first example, without constructing the entire Jacobian
+```@example small_opt
+using BenchmarkTools
+
+@btime solve(prob, NewtonRaphson())
+```
 
-```@example
-using NonlinearSolve, LinearSolve
+Note that this way of writing the function is a shorthand for:
 
-function f!(res, u, p)
-    @. res = u * u - p
+```@example small_opt
+function f(u, p)
+    [u[1] * u[1] - p, u[2] * u[2] - p]
 end
-u0 = [1.0, 1.0]
+```
+
+where the function `f` returns an array. This is a common pattern from things like MATLAB's `fzero`
+or SciPy's `scipy.optimize.fsolve`. However, by design it's very slow. In the benchmark you can see
+that there are many allocations. These allocations cause the memory allocator to take more time
+than the actual numerics itself, which is one of the reasons why codes from MATLAB and Python end up
+slow.
+
+In order to avoid this issue, we can use a non-allocating "in-place" approach. Written out by hand,
+this looks like:
+
+```@example small_opt
+function f(du, u, p)
+    du[1] = u[1] * u[1] - p
+    du[2] = u[2] * u[2] - p
+    nothing
+end
+
+prob = NonlinearProblem(f, u0, p)
+@btime sol = solve(prob, NewtonRaphson())
+```
+
+Notice how much faster this already runs! We can make this code even simpler by using
+`.=` in-place broadcasting:
+
+```@example small_opt
+function f(du, u, p)
+    du .= u .* u .- p
+end
+
+@btime sol = solve(prob, NewtonRaphson())
+```
+
+## Further Optimizations for Small Nonlinear Solves with Static Arrays and SimpleNonlinearSolve
+
+Allocations are only expensive if they are “heap allocations”. For a more
+in-depth definition of heap allocations,
+[there are many sources online](http://net-informations.com/faq/net/stack-heap.htm).
+But a good working definition is that heap allocations are variable-sized slabs
+of memory which have to be pointed to, and this pointer indirection costs time.
+Additionally, the heap has to be managed, and the garbage collector has to
+actively keep track of what's on the heap.
+
+However, there's an alternative to heap allocations, known as stack allocations.
+The stack is statically sized (known at compile time), and thus its accesses are
+quick. Additionally, the exact block of memory is known in advance by the
+compiler, so reusing the memory is cheap. This means that allocating on
+the stack has essentially no cost!
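One way to see this distinction concretely is Julia's `@allocated` macro, which reports the heap bytes an expression allocates. This is only an illustrative sketch (the helper names `heap_pair` and `stack_pair` are made up for this example, not part of any API):

```@example small_opt
heap_pair() = [1.0, 2.0]   # returns a Vector, which is heap-allocated
stack_pair() = (1.0, 2.0)  # returns a Tuple, which can live on the stack

heap_pair(); stack_pair()  # call each once so compilation isn't measured
@show @allocated heap_pair()   # some nonzero number of bytes
@show @allocated stack_pair()  # 0
```

The exact byte count for the `Vector` depends on the Julia version, but it is always nonzero, while the `Tuple` version allocates nothing.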
| 108 | + |
| 109 | +Arrays have to be heap allocated because their size (and thus the amount of |
| 110 | +memory they take up) is determined at runtime. But there are structures in |
| 111 | +Julia which are stack-allocated. `struct`s for example are stack-allocated |
| 112 | +“value-type”s. `Tuple`s are a stack-allocated collection. The most useful data |
| 113 | +structure for NonlinearSolve though is the `StaticArray` from the package |
| 114 | +[StaticArrays.jl](https://github.com/JuliaArrays/StaticArrays.jl). These arrays |
| 115 | +have their length determined at compile-time. They are created using macros |
| 116 | +attached to normal array expressions, for example: |
| 117 | + |
| 118 | +```@example small_opt |
| 119 | +using StaticArrays |
| 120 | +A = SA[2.0, 3.0, 5.0] |
| 121 | +typeof(A) # SVector{3, Float64} (alias for SArray{Tuple{3}, Float64, 1, 3}) |
| 122 | +``` |
+
+Notice that the `3` after `SVector` gives the size of the `SVector`. It cannot
+be changed. Additionally, `SVector`s are immutable, so we have to create a new
+`SVector` to change values. But remember, we don't have to worry about
+allocations because this data structure is stack-allocated. `SArray`s have
+numerous extra optimizations as well: they have fast matrix multiplication,
+fast QR factorizations, etc. which directly make use of the information about
+the size of the array. Thus, when possible, they should be used.
+
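To make the immutability concrete, here is a small sketch. It uses `Base.setindex` (the non-mutating counterpart of `setindex!`, which StaticArrays supports for static arrays), so "changing" an entry means constructing a new `SVector`:

```@example small_opt
using StaticArrays
v = SA[2.0, 3.0, 5.0]
w = Base.setindex(v, 10.0, 2)  # returns a new SVector; v itself is unchanged
v[2], w[2]                     # (3.0, 10.0)
2 .* v                         # broadcasting also produces a new SVector
```

Because the result of each operation is another stack-allocated `SVector`, none of these lines heap-allocate.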
+Unfortunately, static arrays can only be used for sufficiently small arrays.
+After a certain size, they are forced to heap allocate after some instructions,
+and their compile time balloons. Thus, static arrays shouldn't be used if your
+system has more than ~20 variables. Additionally, only the native Julia
+algorithms can fully utilize static arrays.
+
+Let's ***optimize our nonlinear solve using static arrays***. Note that in this case, we
+want to use the out-of-place allocating form, but this time we want to output
+a static array. Doing it with broadcasting looks like:
+
+```@example small_opt
+function f_SA(u, p)
+    u .* u .- p
+end
+u0 = SA[1.0, 1.0]
 p = 2.0
-prob = NonlinearProblem(f!, u0, p)
+prob = NonlinearProblem(f_SA, u0, p)
+@btime solve(prob, NewtonRaphson())
+```
+
+Note that the only change here is that `u0` is made into a StaticArray! If we needed to write `f` out
+for a more complex nonlinear case, then we'd simply do the following:
+
+```@example small_opt
+function f_SA(u, p)
+    SA[u[1] * u[1] - p, u[2] * u[2] - p]
+end
 
-linsolve = LinearSolve.KrylovJL_GMRES()
-sol = solve(prob, NewtonRaphson(; linsolve), reltol = 1e-9)
+@btime solve(prob, NewtonRaphson())
 ```
+
+However, notice that this did not give us a speedup but rather a slowdown. This is because many of the
+methods in NonlinearSolve.jl are designed to scale to larger problems, and thus aggressively do things
+like caching, which is good for large problems but not for these smaller problems and static arrays.
+In order to see the full benefit, we need to move to one of the methods from SimpleNonlinearSolve.jl,
+which are designed for these small-scale static examples. Let's now use `SimpleNewtonRaphson`:
+
+```@example small_opt
+@btime solve(prob, SimpleNewtonRaphson())
+```
+
+And there we go, around 100ns from our starting point of almost 6μs!