added: hessian keyword argument in NonLinMPC
#275
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main     #275      +/-   ##
==========================================
+ Coverage   98.45%   98.48%   +0.03%
==========================================
  Files          28       28
  Lines        4659     4764     +105
==========================================
+ Hits         4587     4692     +105
  Misses         72       72
Benchmark Results (Julia v1): time and memory benchmarks
Hi @franckgaga,
Many thanks for the suggestions @gdalle. I tried to separate the time spent in the actual autodiff from the time spent in the decompression.
I benchmarked my case study with these two options and it changes essentially nothing in the results.
Alright thanks, I'm following it.
This doesn't look too bad, it's all blue and you're spending most of your time in the actual differentiation (gradients and hessians).
Here's the better implementation, which can be a thousand times faster (although it doesn't seem to impact your profiling). That's similar to what I'd like to do in SMC, but I'm still figuring out the API design for only filling one triangle.

using LinearAlgebra, SparseArrays, StableRNGs
function copyto_lowertriangle_naive!(T::SparseMatrixCSC, A::SparseMatrixCSC)
    for j in axes(A, 2)
        for i in axes(A, 1)
            if i >= j
                T[i, j] = A[i, j]
            end
        end
    end
    return
end
function copyto_lowertriangle!(T::SparseMatrixCSC, A::SparseMatrixCSC)
    @assert size(T) == size(A)
    kT = 0
    rvA, rvT = rowvals(A), rowvals(T)
    nzA, nzT = nonzeros(A), nonzeros(T)
    for j in axes(A, 2)
        for kA in nzrange(A, j)
            i = rvA[kA]
            if i >= j
                kT += 1
                @assert i == rvT[kT]
                nzT[kT] = nzA[kA]
            end
        end
    end
    return T
end

Benchmark and test:

julia> n, p = 1_000, 0.005;
julia> A = sparse(Symmetric(sprand(n, n, p)));
julia> T = similar(sparse(LowerTriangular(A)));
julia> copyto_lowertriangle!(T, A)
julia> T == LowerTriangular(A)
true
julia> using Chairmarks
julia> @b (T, A) copyto_lowertriangle_naive!(_[1], _[2])
7.881 ms
julia> @b (T, A) copyto_lowertriangle!(_[1], _[2])
12.958 μs
Impressive gains! I just understood that SparseMatrixCSC stores only the structural nonzeros, column by column; that's probably why individual indexing of a SparseMatrixCSC is slow. That being said, I just noticed that I filled all the entries of the lower-triangular part, while the docstring says only the structural nonzeros should be filled.
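For context, here is a minimal sketch (not from this PR) of how CSC storage is laid out and why iterating the structural nonzeros beats scalar indexing; the function name is illustrative only:

```julia
using SparseArrays

# CSC stores, per column, the row indices (rowvals) and the values
# (nonzeros) of the structural nonzeros only.
A = sparse([1, 3, 2], [1, 1, 2], [10.0, 30.0, 20.0], 3, 2)

# Sum the structural nonzeros column by column in O(nnz) total work:
function sum_structural(A::SparseMatrixCSC)
    total = 0.0
    for j in axes(A, 2)
        for k in nzrange(A, j)
            i = rvA = rowvals(A)[k]  # row index of the k-th stored entry
            total += nonzeros(A)[k]  # its value; no search needed
        end
    end
    return total
end

sum_structural(A)  # 60.0
# Scalar indexing A[i, j] instead performs a search within column j on
# every access, which is why the naive copy above is much slower.
```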
Come to think of it, the destination matrix is not a SparseMatrixCSC here. I need to do some benchmarks to verify if this part is slow...
Okay, I did some more benchmarks. I'm not sure why it did not appear in my first profile above, but after profiling multiple times, a fair amount of time is now systematically spent in the coloring.

edit: maybe @amontoison can also answer my question.
Maybe MOI does the coloring regardless of whether it actually has to compute a Hessian?
Are there other scalar nonlinear constraints in the model? Our AD probably needs to kick in to compute the Hessian for those parts (or the objective, etc) |
What's in the model:
I provide the Hessian of the Lagrangian function for all the nonlinear ingredients. I know that MOI still needs to sum the two Hessians in the end, but AD should not be required for this, no? It is also worth mentioning that the Hessian sparsity patterns differ between the two.
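To illustrate the sparsity-pattern point, summing two sparse matrices in Julia merges their structural nonzeros; the patterns below are hypothetical, purely for illustration:

```julia
using SparseArrays

# Hypothetical sparsity patterns for two Hessian contributions
# (e.g. objective vs. constraints); the names are illustrative only.
H_obj  = sparse([1, 2], [1, 2], [1.0, 2.0], 3, 3)  # entries (1,1), (2,2)
H_cons = sparse([3, 2], [1, 2], [4.0, 5.0], 3, 3)  # entries (3,1), (2,2)

# The summed pattern is the union of both patterns: (1,1), (2,2), (3,1).
H = H_obj + H_cons
```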
The AD in MOI runs for this. We could probably look at improving its performance.
The new keyword argument allows enabling and customizing the computation of the Hessian of the Lagrangian function. Note that this is only supported with the oracle=true option; changing the Hessian backend will throw an error if oracle=false. The default is hessian=false, meaning the Hessian is not computed and the quasi-Newton approximation provided by the optimizer is used (e.g. L-BFGS for Ipopt.jl). The other options are:

- hessian=true: select an appropriate AD backend for the Hessian.
- hessian=backend: use backend::AbstractADType for the computation of the Hessian.

Also, pretty-printing NonLinMPC will now show the hessian backend:
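A usage sketch of the new keyword, under the assumptions stated above (the model construction is a placeholder, and AutoFiniteDiff from ADTypes.jl is just one example of an AbstractADType backend):

```julia
using ModelPredictiveControl
using ADTypes: AutoFiniteDiff

# model = NonLinModel(...)  # placeholder: any NonLinModel instance

# Default: no Hessian, the optimizer's quasi-Newton approximation is used.
mpc = NonLinMPC(model; oracle=true, hessian=false)

# Exact Hessian of the Lagrangian with an automatically chosen AD backend:
mpc = NonLinMPC(model; oracle=true, hessian=true)

# Or with an explicit AbstractADType backend:
mpc = NonLinMPC(model; oracle=true, hessian=AutoFiniteDiff())
```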