Add new backends with DifferentiationInterface.jl #302
amontoison wants to merge 2 commits into main
Conversation
tmigot left a comment
Thanks @amontoison for the PR. I have mixed feelings about it. On one side it is progress; on the other side we are losing the Hessian backend for Enzyme and Zygote.
How far are we from making it fully compatible?
We can, but only for unconstrained problems. The user will no longer be able to use an incorrect Hessian, which is better for everyone.
Force-pushed from f782802 to 8740d3a
@gdalle May I ask you to check what I did wrong in the file
It looks like the problem comes from forgetting to import the function
@dpo could you perhaps give me access to the repo so that I may help with this and future PRs?
The backend information is in a structure [`ADNLPModels.ADModelBackend`](@ref) in the attribute `adbackend` of a `ADNLPModel`, it can also be accessed with [`get_adbackend`](@ref).
The functions used internally to define the NLPModel API and the possible backends are defined in the following table:
Why not just switch fully to the ADTypes specification? You're gonna run into trouble translating symbols into `AbstractADType` objects.
And the symbols don't allow you to set parameters like these (see the sketch after this list):
- the number of chunks in ForwardDiff
- the tape compilation in ReverseDiff
- aspects of the mode in Enzyme
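To make this concrete, here is a minimal sketch of what the ADTypes constructors expose directly; the keyword names come from ADTypes.jl, but the concrete values are only illustrative:

```julia
using ADTypes
import Enzyme  # only needed for the mode object passed to AutoEnzyme

# Chunk size is a constructor option of AutoForwardDiff:
fwd = AutoForwardDiff(; chunksize = 8)

# Tape compilation is a constructor option of AutoReverseDiff:
rev = AutoReverseDiff(; compile = true)

# The Enzyme mode (forward/reverse) lives inside the backend object:
enz = AutoEnzyme(; mode = Enzyme.Reverse)
```

None of this can be expressed with a plain `Symbol` such as `:AutoForwardDiff`.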
$\mathcal{L}(x)$ denotes the Lagrangian $f(x) + \lambda^T c(x)$.
Except for the backends based on `ForwardDiff.jl` and `ReverseDiff.jl`, all other backends require the associated AD package to be manually installed by the user to work.
Note that the Jacobians and Hessians computed by the backends above are dense.
The backends `SparseADJacobian`, `SparseADHessian`, and `SparseReverseADHessian` should be used instead if sparse Jacobians and Hessians are required.
Same remark for sparse AD: using `AutoSparse` seems more flexible?
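As a hedged illustration of the `AutoSparse` route (the detector and coloring packages below are one possible choice, mirroring the setup later in this thread):

```julia
using ADTypes
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

# Any dense backend becomes a sparse one by wrapping it in AutoSparse:
sparse_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)
```

DifferentiationInterface then computes sparse Jacobians and Hessians through the same `jacobian`/`hessian` entry points, so no dedicated sparse backend type is needed.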
(:ZygoteADGradient , :AutoZygote ),
# (:ForwardDiffADGradient , :AutoForwardDiff ),
# (:ReverseDiffADGradient , :AutoReverseDiff ),
(:MooncakeADGradient , :AutoMooncake ),
The AutoMooncake constructor requires a keyword, like so:
AutoMooncake(; config=nothing)

(:DiffractorADGradient , :AutoDiffractor ),
(:TrackerADGradient , :AutoTracker ),
(:SymbolicsADGradient , :AutoSymbolics ),
(:ChainRulesADGradient , :AutoChainRules ),
The AutoChainRules constructor requires a keyword, like so:
AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig())

(:ChainRulesADGradient , :AutoChainRules ),
(:FastDifferentiationADGradient , :AutoFastDifferentiation ),
(:FiniteDiffADGradient , :AutoFiniteDiff ),
(:FiniteDifferencesADGradient , :AutoFiniteDifferences ),
The AutoFiniteDifferences constructor requires a keyword, like so:
AutoFiniteDifferences(; fdm=FiniteDifferences.central_fdm(3, 1))

x0::AbstractVector = rand(nvar),
kwargs...,
)
backend = $fbackend()
This will fail for the three backends mentioned above. And for all other backends, this prevents you from setting any options, which was the goal of ADTypes.jl to begin with: see https://github.com/SciML/ADTypes.jl?tab=readme-ov-file#why-should-ad-users-adopt-this-standard
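One hedged way around both problems (constructor keywords and user options) is to accept an already-constructed `AbstractADType` instead of calling a zero-argument constructor. The `ExampleADGradient` function below is purely hypothetical and only illustrates the pattern:

```julia
using ADTypes
import FiniteDifferences

# Hypothetical constructor sketch: the backend arrives as a keyword argument
# with a default, instead of being built from `$fbackend()` inside the macro.
function ExampleADGradient(
    nvar::Integer,
    f;
    x0::AbstractVector = rand(nvar),
    backend::ADTypes.AbstractADType = AutoForwardDiff(),
    kwargs...,
)
    # ... preparation with DifferentiationInterface would go here ...
    return backend
end

# Backends whose constructors require keywords can then be supplied ready-made:
ExampleADGradient(3, sum; backend = AutoFiniteDifferences(; fdm = FiniteDifferences.central_fdm(3, 1)))
```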
for (ADHvprod, fbackend) in ((:EnzymeADHvprod , :AutoEnzyme ),
(:ZygoteADHvprod , :AutoZygote ),
# (:ForwardDiffADHvprod , :AutoForwardDiff ),
# (:ReverseDiffADHvprod , :AutoReverseDiff ),
(:MooncakeADHvprod , :AutoMooncake ),
(:DiffractorADHvprod , :AutoDiffractor ),
(:TrackerADHvprod , :AutoTracker ),
(:SymbolicsADHvprod , :AutoSymbolics ),
(:ChainRulesADHvprod , :AutoChainRules ),
(:FastDifferentiationADHvprod , :AutoFastDifferentiation ),
(:FiniteDiffADHvprod , :AutoFiniteDiff ),
(:FiniteDifferencesADHvprod , :AutoFiniteDifferences ),
(:PolyesterForwardDiffADHvprod, :AutoPolyesterForwardDiff))
Diffractor, Mooncake, Tracker and ChainRules probably don't work in second order.
FiniteDiff and FiniteDifferences might give you inaccurate results depending on their configuration (JuliaDiff/DifferentiationInterface.jl#78)
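For the backends that do support second order, DifferentiationInterface already offers a combined second-order backend; a minimal sketch, assuming a forward-over-reverse ForwardDiff/ReverseDiff combination (both packages support this composition):

```julia
using ADTypes
using DifferentiationInterface
import ForwardDiff, ReverseDiff

f(x) = sum(abs2, x)
x, v = rand(3), rand(3)

# Forward-over-reverse Hessian-vector products:
backend = SecondOrder(AutoForwardDiff(), AutoReverseDiff())
prep = prepare_hvp(f, backend, x, (v,))
hvp(f, prep, backend, x, (v,))  # 1-tuple containing H(x) * v
```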
end
end

for (ADHessian, fbackend) in ((:EnzymeADHessian , :AutoEnzyme ),
Same remark as for HVP about backends incompatible with second order
ForwardDiff_backend = Dict(
:gradient_backend => ForwardDiffADGradient,
:jprod_backend => ForwardDiffADJprod,
:jtprod_backend => ForwardDiffADJtprod,
:hprod_backend => ForwardDiffADHvprod,
:jacobian_backend => ForwardDiffADJacobian,
:hessian_backend => ForwardDiffADHessian,
:ghjvprod_backend => EmptyADbackend,
:jprod_residual_backend => ForwardDiffADJprod,
:jtprod_residual_backend => ForwardDiffADJtprod,
:hprod_residual_backend => ForwardDiffADHvprod,
:jacobian_residual_backend => ForwardDiffADJacobian,
:hessian_residual_backend => ForwardDiffADHessian
)
The goal of DifferentiationInterface is to save a lot of people a lot of code. However, this PR ends up adding more lines than it removes, precisely because of this kind of disjunction.
@VaibhavDixit2 how did you handle the choice of backend for each operator in OptimizationBase.jl?
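For what it's worth, a hypothetical sketch of what the non-disjunctive version could look like: one function mapping a single `AbstractADType` to every operator slot, so the per-package dictionaries disappear (the keys mirror the ones in the diff above; the function name is made up):

```julia
using ADTypes

# Hypothetical: derive every operator backend from one user-supplied object.
function operator_backends(backend::ADTypes.AbstractADType)
    return Dict(
        :gradient_backend => backend,
        :jprod_backend    => backend,
        :jtprod_backend   => backend,
        :hprod_backend    => backend,
        :jacobian_backend => backend,
        :hessian_backend  => backend,
    )
end

operator_backends(AutoForwardDiff(; chunksize = 4))
```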
@@ -0,0 +1,12 @@
using OptimizationProblems
using NLPModels
@gdalle I invited you. Thank you for your work here!!!
@amontoison what do you think about moving away from symbols here?
It depends on the alternatives. Right now, it's useful to specify that we want optimized backends with … It will be easier to provide an …
If I'm not mistaken there are two levels here:
Right now you base all of the internal representations on
Do you have an example of what you suggest?
I could try to show you in an alternative PR.
Okay, it is a bit hard to submit a PR since there would be a lot of things to rewrite and I don't understand what each part does. But essentially I was imagining something like this:

```julia
using ADTypes
using DifferentiationInterface
using LinearAlgebra
using SparseMatrixColorings
using SparseConnectivityTracer
import ForwardDiff, ReverseDiff
function DefaultAutoSparse(backend::AbstractADType)
return AutoSparse(
backend;
sparsity_detector=TracerSparsityDetector(),
coloring_algorithm=GreedyColoringAlgorithm(),
)
end
struct ADModelBackend
gradient_backend
hprod_backend
jprod_backend
jtprod_backend
jacobian_backend
hessian_backend
end
struct ADModelBackendPrep
gradient_prep
hprod_prep
jprod_prep
jtprod_prep
jacobian_prep
hessian_prep
end
function ADModelBackend(forward_backend::AbstractADType, reverse_backend::AbstractADType)
@assert ADTypes.mode(forward_backend) isa
Union{ADTypes.ForwardMode,ADTypes.ForwardOrReverseMode}
@assert ADTypes.mode(reverse_backend) isa
Union{ADTypes.ReverseMode,ADTypes.ForwardOrReverseMode}
gradient_backend = reverse_backend
hprod_backend = SecondOrder(forward_backend, reverse_backend)
jprod_backend = forward_backend
jtprod_backend = reverse_backend
jacobian_backend = DefaultAutoSparse(forward_backend) # or a size-dependent heuristic
hessian_backend = DefaultAutoSparse(SecondOrder(forward_backend, reverse_backend))
return ADModelBackend(
gradient_backend,
hprod_backend,
jprod_backend,
jtprod_backend,
jacobian_backend,
hessian_backend,
)
end
function ADModelBackendPrep(
admodel_backend::ADModelBackend,
obj::Function,
cons::Function,
lag::Function,
x::AbstractVector,
)
(;
gradient_backend,
hprod_backend,
jprod_backend,
jtprod_backend,
jacobian_backend,
hessian_backend,
) = admodel_backend
c = cons(x)
λ = similar(c)
dx = similar(x)
dc = similar(c)
gradient_prep = prepare_gradient(lag, gradient_backend, x, Constant(λ))
hprod_prep = prepare_hvp(lag, hprod_backend, x, (dx,), Constant(λ))
jprod_prep = prepare_pushforward(cons, jprod_backend, x, (dx,))
jtprod_prep = prepare_pullback(cons, jtprod_backend, x, (dc,))
jacobian_prep = prepare_jacobian(cons, jacobian_backend, x)
hessian_prep = prepare_hessian(lag, hessian_backend, x, Constant(λ))
return ADModelBackendPrep(
gradient_prep, hprod_prep, jprod_prep, jtprod_prep, jacobian_prep, hessian_prep
)
end
admodel_backend = ADModelBackend(AutoForwardDiff(), AutoReverseDiff())
obj(x) = sum(x)
cons(x) = abs.(x)
lag(x, λ) = obj(x) + dot(λ, cons(x))
admodel_backend_prep = ADModelBackendPrep(admodel_backend, obj, cons, lag, rand(3));
```
Add the following backends: