Compute JVP in line searches #1210
Conversation
this one also has conflicts for the same reason as the AD PR I suppose
Force-pushed from e601dc9 to 33f43b0
I see you're hitting all the wrappers
Force-pushed from 2e8626a to 28b8899
```julia
elseif !NLSolversBase.isfinite_value(d)
    TerminationCode.ObjectiveNotFinite
elseif !NLSolversBase.isfinite_gradient(d)
    TerminationCode.GradientNotFinite
```
@pkofod a random bug I came across when fixing these lines
I'm wondering if I had that f_calls there for a reason
The logic is not changed, it's just moved to NLSolversBase to avoid having to expose jvp(obj) (IMO such calls are quite unsafe in general, so I'd like to avoid having to introduce new ones at least).
My point is just that currently the termination code is GradientNotFinite when the objective function is not finite.
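For illustration, a minimal sketch of the intended mapping (it assumes the `isfinite_value`/`isfinite_gradient` checks and the `TerminationCode` values quoted in the diff above; the surrounding control flow is hypothetical): each non-finite quantity should select its own code instead of both cases ending up as `GradientNotFinite`.

```julia
# Sketch only; `d` is the objective wrapper from the diff above.
code = if !NLSolversBase.isfinite_value(d)
    TerminationCode.ObjectiveNotFinite   # objective value is NaN or Inf
elseif !NLSolversBase.isfinite_gradient(d)
    TerminationCode.GradientNotFinite    # gradient contains NaN or Inf
else
    nothing                              # nothing non-finite detected; other checks apply
end
```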
I understood the part about the wrong code but had missed the part about the new function. Great
```julia
initial_x = copy(initial_x)
retract!(method.manifold, initial_x)

value_gradient!!(d, initial_x)
```
@pkofod why would we ever want to call the !! methods explicitly? If the input is different from the one that the value and gradient were evaluated with, the value/gradient will be recomputed anyway.
An alternative is to create new instances of OnceDifferentiable, but the "problem" is Fminbox, which has outer and inner loops. In the previous inner loop you may evaluate the objective at x, but then Fminbox updates a parameter that essentially changes the function, and then the stored value is no longer correct.
Without getting too specific here, you can think of it as the EBEs in Laplace's method. If you have an outer objective that performs an EBE optimization on the inside, then a OnceDifferentiable for the outer objective of the marginal log-likelihood cannot know whether we changed the inner EBEs or not.
As I said above, maybe a better approach is to construct a new OnceDifferentiable per outer iteration.
I believe other users have relied on it in the past as well, because they are doing the same: allocating/constructing the OnceDifferentiable once and then solving a sequence of optimize calls with some parameter that's updated between the calls. I'm not completely sure if it's documented or not, but the idea is that each initial evaluation and each reset! for Fminbox forces an evaluation even if x is not updated.
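A rough sketch of the caching behaviour being discussed, using NLSolversBase directly (the toy objective below is made up for illustration): the single-bang calls skip re-evaluation when `x` matches the cached evaluation point, while the double-bang calls always re-evaluate, which is what Fminbox relies on when a parameter changes between inner solves.

```julia
using NLSolversBase

f(x) = sum(abs2, x)            # toy objective, illustration only
g!(G, x) = (G .= 2 .* x)

d = OnceDifferentiable(f, g!, zeros(2))

value_gradient!(d, [1.0, 2.0])   # evaluates f and the gradient at x
value_gradient!(d, [1.0, 2.0])   # cache hit: x is unchanged, nothing is recomputed
value_gradient!!(d, [1.0, 2.0])  # forced re-evaluation even though x is unchanged
```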
> In the previous inner loop you may evaluate the objective at x, but then Fminbox updates a parameter that essentially changes the function, and then the stored value is no longer correct.
It sounds like we should just use NLSolversBase.clear! instead of only resetting the number of calls in the loop of the algorithm?
Yes, we can call clear! before value!, gradient!, etc. are called in those places
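A hedged sketch of that suggestion in the Fminbox outer loop (it assumes `NLSolversBase.clear!` invalidates the cached evaluation point in addition to resetting the call counters; `dfbox.obj` is the wrapped objective referenced further down):

```julia
# Hypothetical outer-loop fragment, not the actual Fminbox code.
NLSolversBase.clear!(dfbox.obj)   # drop cached value/gradient and reset call counters
value_gradient!(dfbox.obj, x)     # a plain single-bang call now re-evaluates at x
```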
```julia
bw.Fb = value(bw.b, x)
bw.Ftotal = bw.mu * bw.Fb
NLSolversBase.value(obj::BarrierWrapper) = obj.Ftotal
function NLSolversBase.value!(bw::BarrierWrapper, x)
```
I didn't check CI yet, but I think these may cause failures because the multiplier could have been updated
I guess you can rewrite these to check if mu was updated in the same manner...
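For illustration, one way that check could look (a sketch only: the cached fields `x_f` and `mu_f` are hypothetical, and the real `BarrierWrapper` layout may differ): re-evaluate whenever either the point or the multiplier has changed since the last call.

```julia
# Illustration only; mirrors the fragment quoted above, not the actual implementation.
function NLSolversBase.value!(bw::BarrierWrapper, x)
    if x != bw.x_f || bw.mu != bw.mu_f        # recompute if x or the multiplier changed
        bw.Fb = NLSolversBase.value(bw.b, x)  # barrier term
        bw.Ftotal = bw.mu * bw.Fb
        copyto!(bw.x_f, x)
        bw.mu_f = bw.mu
    end
    return bw.Ftotal
end
```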
Force-pushed from 170a47b to aa176a1
```julia
results = optimize(dfbox, x, _optimizer, options, state)
stopped_by_callback = results.stopped_by.callback
dfbox.obj.f_calls[1] = 0
# TODO: Reset everything? Add upstream API for resetting call counters?
```
As discussed above, clear! should suffice
It didn't, but maybe I did something wrong. I spent a day trying to fix the errors but then decided to revert to not touching any of the !! etc. functionality in this PR. The problem (and why I wanted to get rid of them in the first place) is that the JVP function breaks any such implicit assumptions. There's no guarantee anymore that during/after the line search the gradient will be available, or that a new gradient is used at all in the line search. IMO one should (at some point, but maybe not in this PR) get rid of the value(d) etc. API and any such implicit assumptions completely. If a specific gradient etc. has to be passed, it should be passed explicitly; and IMO the objective function structs should be treated as black boxes that just perform some optimisations by caching evaluations.
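For illustration, a hypothetical contrast between the two styles (the function names below are made up): the implicit style reads whatever gradient happens to be cached on the objective, while the explicit style receives the gradient it should use as an argument, so no caching assumption is needed.

```julia
using LinearAlgebra, NLSolversBase

# Implicit style: assumes `d` still caches the gradient from the last evaluation.
directional_derivative_implicit(d, s) = dot(NLSolversBase.gradient(d), s)

# Explicit style: the gradient actually used is passed in directly.
directional_derivative_explicit(gr, s) = dot(gr, s)
```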
Force-pushed from 6525ce4 to 6d92ce1
Force-pushed from b5c9710 to 6394bf3
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #1210      +/-   ##
==========================================
- Coverage   86.79%   86.36%    -0.44%
==========================================
  Files          44       44
  Lines        3552     3594       +42
==========================================
+ Hits         3083     3104       +21
- Misses        469      490       +21
```

☔ View full report in Codecov by Sentry.
Locally (macOS aarch64) and on macOS (aarch64) in CI all tests pass, but on Ubuntu and Windows 4 tests fail...
Force-pushed from 473d75c to a5a9636
Force-pushed from cc9f818 to 536eaa9
Benchmark Results (Julia vlts): time and memory benchmarks. A plot of the benchmark results has been uploaded as an artifact at .
Benchmark Results (Julia v1): time and memory benchmarks. A plot of the benchmark results has been uploaded as an artifact at .
It runs locally, so I think it's just one of those numerical things seen earlier.
Yes, these test issues are architecture-specific and have persisted for a long time in this PR: #1210 (comment)
As I mentioned earlier, they are generally run with "tuned" step sizes / learning parameters as well. I think the tests are not appropriate, really. There's no rule that for the given input the specified "accuracy" should be met. I need to rethink this sometime in the future. I turned off the Rosenbrock test for AGD for now.
I suppose the JET errors only show up after the combined change to initial_state as well as upgrading to v8 of NLSolversBase?
No, I don't think the

I pushed 9fa280e, a minimal set of changes that fixed the JET errors locally.
Based on JuliaNLSolvers/NLSolversBase.jl#168 and JuliaNLSolvers/LineSearches.jl#187.
The diff would be cleaner if #1195 (which depends on #1209) were available on the master branch. Based on #1212.
Currently, tests are failing due to missing definitions of NLSolversBase.value_jvp! etc. for ManifoldObjective.
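As a very rough sketch, the missing methods might simply delegate to the wrapped objective (the `value_jvp!` signature is an assumption based on the linked NLSolversBase PR, and the real methods may additionally need to handle the manifold retraction or tangent projection):

```julia
# Hypothetical delegation; the signature and semantics of value_jvp! are assumed, not confirmed.
function NLSolversBase.value_jvp!(obj::ManifoldObjective, x, v)
    # Assumed to return the objective value and the gradient-direction product at x.
    return NLSolversBase.value_jvp!(obj.inner_obj, x, v)
end
```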