[WIP] Switch to unconstrained optimization followed by "rectifying" θ #838
@palday I'm thinking of changing the optsum structure so that the initial parameter vector, the final parameter vector (after rectifying), and the value of the objective for each are always present in the fitlog. For cases where the optional argument is given, we could then use code to extract the initial parameter vector, the initial objective value, the final parameter vector, etc., instead of storing these quantities as fields in the struct.
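A minimal sketch of what that could look like, assuming the fitlog is a vector of `(θ, objective)` tuples as shown later in this thread; the helper names here are hypothetical, not existing MixedModels API:

```julia
# Hypothetical helpers: derive summary quantities from a fitlog
# (a Vector{Tuple{Vector{Float64}, Float64}}) instead of storing
# them as separate fields in the optsum struct.
initial(fitlog) = first(fitlog)[1]           # initial parameter vector
initial_objective(fitlog) = first(fitlog)[2] # objective at the initial value
final(fitlog) = last(fitlog)[1]              # final (rectified) parameter vector
fmin(fitlog) = last(fitlog)[2]               # objective at convergence

fitlog = [([1.0], 327.33), ([0.75], 327.05), ([0.7525], 327.03)]
final(fitlog)  # [0.7525]
```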
As I said, there is an enormous amount of yak shaving that still needs to take place - profiling the objective and GLMMs are still broken. I imagine GLMMs will be easier to patch up. At this point I wanted to have a target out there for people to throw beer bottles at. I am mostly concerned about getting the tests on LMMs to work.
I'm fully onboard with this and had thought about something similar a while back.
Codecov Report. Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main     #838       +/-   ##
===========================================
- Coverage   97.34%   74.48%   -22.87%
===========================================
  Files          36       35        -1
  Lines        3504     3515       +11
===========================================
- Hits         3411     2618      -793
- Misses         93      897      +804
```
Those failures are, well, distressing. The simplest imaginable model for the dyestuff data cannot produce consistent results to within a reasonable tolerance on macOS M-series processors and on Intel processors running Linux. I'll look at the fitlogs on M-series under OpenBLAS and AppleAccelerate, and on x86_64 Linux with MKL and OpenBLAS. The differences show up in the intermediate results (pwrss, varest, logdet) used to evaluate the objective, not so much in the objective itself or in the value of the parameter. Lots of fun times ahead.
There are interesting issues here, even for a very simple model like dyestuff. Having preached this for several years, I should be aware that the appropriate scale on which to assess a variance estimate, or a quantity like pwrss that is proportional to the variance estimate, is the logarithm scale. I have gotten slightly different answers on the dyestuff ML fit on Apple M-series and x86_64 systems even though the initial evaluations seem identical. Looks like a fun few weeks tracking all this down.
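For instance (with made-up numbers, not values from these fits), a difference that looks alarming in absolute terms can be negligible on the log scale:

```julia
# Illustrative, made-up pwrss values from two hypothetical platforms;
# the point is the scale of comparison, not the numbers themselves.
pwrss_a = 62668.164
pwrss_b = 62668.203
abs(pwrss_a - pwrss_b)            # about 0.04 in absolute terms
abs(log(pwrss_a) - log(pwrss_b))  # about 6e-7 on the log scale
```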
These findings make me wonder if it wouldn't be worthwhile to break the changes down into smaller components. Specifically, to start with just changing the lower bounds to `-Inf`.
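For scalar random-effects terms the "rectifying" step in the title can be sketched as below. This is a simplification I am assuming here: the profiled objective depends on such a θ element only through its square, so an unconstrained estimate can be mapped back into the original parameter space by taking absolute values; vector-valued terms would need more care.

```julia
# Sketch: rectify an unconstrained estimate of θ for scalar
# random-effects terms, where the objective is invariant to
# the sign of each element.
rectify(θ::AbstractVector) = abs.(θ)

rectify([-0.7525])  # [0.7525]
```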
I have been testing this a bit and managed to hit what I believe is libprima/PRIMA.jl#25 when running within Quarto on Linux. Hence, I'd be in favor of waiting with the switch to PRIMA until the linked issue has been resolved.
@andreasnoack Which unconstrained, derivative-free optimizer implementations in Julia would you and/or @pkofod recommend? I favored PRIMA.jl because libprima is a fairly recent implementation of these algorithms and, apparently, addresses some bugs in Powell's Fortran-77 code, which was translated via f2c to provide the implementation in libnlopt. However, switching to the unconstrained formulation may give us the opportunity to explore other optimizers with less of an implementation burden, simply because we don't need bounded or constrained optimization. The biggest payoff, of course, would be if we could use automatic differentiation on the objective, but I have not been successful in doing so. With AD we would be able to use gradient-based optimization that, we hope, would be more stable.
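As a sketch of how the unconstrained formulation simplifies the optimizer call, here is PRIMA.jl's `newuoa` on a toy objective standing in for the profiled deviance. To the best of my knowledge `newuoa(f, x0)` returns the solution vector and an info struct, and takes no bound arguments; check the PRIMA.jl docs before relying on this.

```julia
using PRIMA

# Toy quadratic objective; with NEWUOA there are no
# lower/upper bounds to specify, unlike BOBYQA.
f(x) = sum(abs2, x .- [1.0, 2.0])
x, info = PRIMA.newuoa(f, [0.0, 0.0])
```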
Short term, I think MixedModels would have to stick with the current solver options: nlopt and prima. With my current understanding, the tradeoffs are:
I think libprima/PRIMA.jl#25 should be considered a blocker. It means that you get a hard error when running prima in VSCode on Linux. Short term, that leaves us with nlopt. I think it would be worth trying a minimal change where we just set the lower bounds to `-Inf`.

Longer term, I think it would be worthwhile to work towards a gradient-based solution. #705 already includes a version that works with ForwardDiff. It allocates a lot, but that can probably be reduced a lot with some effort. Maybe some or all of the derivatives can also be derived. We might be able to help with this if you think that is worthwhile. With gradients, I think we could just use Optim's BFGS and thereby avoid binary dependencies.
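A sketch of that longer-term direction on a stand-in objective (the Rosenbrock function, not the mixed-model deviance), using Optim's BFGS with forward-mode AD gradients via the `autodiff = :forward` keyword:

```julia
using Optim

# Stand-in smooth objective; gradients come from ForwardDiff via
# `autodiff = :forward`, so no hand-derived derivatives and no
# binary dependencies are needed.
rosen(θ) = (1 - θ[1])^2 + 100 * (θ[2] - θ[1]^2)^2
res = optimize(rosen, [-1.2, 1.0], BFGS(); autodiff = :forward)
Optim.minimizer(res)  # ≈ [1.0, 1.0]
```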
Well, you have convinced me that an incremental approach is superior to my "smash it to bits and hope that you can reconstruct it to some semblance of its former capability" way of going about things. So I will start another branch that will just change the lower bounds to `-Inf`.
So I tried to follow the advice from @andreasnoack to simply change the lower bounds to `-Inf`.

I will close this PR and open a new one when I have something that passes tests.
Interestingly, the switch to unconstrained optimization and NEWUOA resulted in a substantially better fit for the goldstein example:

```
goldstein: Test Failed at /Users/dmbates/.julia/dev/MixedModels/test/pirls.jl:255
  Expression: ≈(deviance(m1), 191.25588670286234, rtol = 1.0e-5)
   Evaluated: 186.9984948799336 ≈ 191.25588670286234 (rtol=1.0e-5)
```
The good news on the `fitlog` side:

```julia
julia> fit!(m1; progress=false, fitlog=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  y ~ 1 + (1 | group)
  Distribution: Poisson{Float64}
  Link: LogLink()

   logLik    deviance     AIC       AICc       BIC
 -312.2084   186.6941  628.4169  628.5406  633.6272

Variance components:
        Column    Variance  Std.Dev.
group (Intercept)  4.81046   2.19328

 Number of obs: 100; levels of grouping factors: 10

Fixed-effects parameters:
────────────────────────────────────────────────
               Coef.  Std. Error     z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  3.27886    0.696336  4.71    <1e-05
────────────────────────────────────────────────

julia> m1.optsum.fitlog
98-element Vector{Tuple{Vector{Float64}, Float64}}:
 ([4.727210823648169, 1.0], 246.12019299931703)
 ([5.727210823648169, 1.0], 295.5056699061577)
 ([4.727210823648169, 2.0], 196.8746435802723)
 ([3.7272108236481687, 1.0], 216.2616884426193)
 ([4.727210823648169, 0.0], 33111.0741495393)
 ([4.726003407293028, 1.5014977472972564], 208.10891357748682)
 ([4.85183579870062, 1.9995124490551461], 198.21508829451847)
 ([4.511522151674384, 1.875080484978622], 196.24076491440593)
 ([4.426769665723302, 1.8220049150324438], 196.09184733152927)
 ([4.3500810851653195, 1.7578265572161065], 196.24230483914891)
 ([4.340214009424053, 1.8720860328574915], 194.54773051094907)
 ([4.251003875403487, 1.9172695695498838], 193.23200525943656)
 ([4.052132455437064, 1.938486504308199], 191.43172828960573)
 ([3.661649822146293, 2.0252235384879804], 188.50724146238238)
 ([3.103676226035871, 2.174289986604922], 246.12019299931703)
 ([3.622362433235004, 1.9332643025595915], 188.72622091796856)
 ([3.783332000872945, 1.7633413467337788], 191.00299702303116)
 ([3.576379365323183, 2.0774628837478337], 187.9321149944938)
 ([3.478557229144678, 2.098219320664461], 187.47849269173358)
 ([3.2954279432571782, 2.1786162400726985], 186.7509476816972)
 ([2.8991357663384987, 2.232953248759725], 246.12019299931703)
 ([3.2477013862868622, 2.2664921512188094], 246.12019299931703)
 ⋮
 ([3.278864613581048, 2.193275512161594], 246.12019299931703)
 ([3.278864613587998, 2.1932755121364043], 246.12019299931703)
 ([3.278864613601011, 2.1932755121507164], 246.12019299931703)
 ([3.278864613589542, 2.1932755121491434], 246.12019299931703)
 ([3.2788646135969723, 2.1932755121419714], 246.12019299931703)
 ([3.2788646135961073, 2.1932755121428116], 246.12019299931703)
 ([3.2788646135956885, 2.1932755121411334], 246.12019299931703)
 ([3.2788646135968866, 2.193275512140986], 246.12019299931703)
 ([3.278864613595962, 2.1932755121419363], 246.12019299931703)
 ([3.278864613596395, 2.1932755121417515], 246.12019299931703)
 ([3.2788646135964012, 2.193275512142066], 246.12019299931703)
 ([3.2788646135964847, 2.193275512141855], 246.12019299931703)
 ([3.27886461359629, 2.1932755121418825], 246.12019299931703)
 ([3.278864613596393, 2.1932755121419505], 246.12019299931703)
 ([3.2788646135963804, 2.1932755121418483], 246.12019299931703)
 ([3.2788646135963804, 2.19327551214186], 246.12019299931703)
 ([3.278864613596395, 2.193275512141851], 246.12019299931703)
 ([3.2788646135963866, 2.1932755121418412], 246.12019299931703)
 ([3.2788646135963875, 2.1932755121418555], 246.12019299931703)
 ([3.2788646135963875, 2.193275512141855], 246.12019299931703)
 ([3.2788646135963853, 2.19327551214185], 186.69409940885922)
```

For the time being I will disable that test. The data are very weird and came from an example that is extremely difficult to fit.
I think the version should be bumped to 5.0.0 rather than 4.39.0. That is, I think this change would be a major release, or am I overthinking it? |
This is definitely a major version bump, but don't tag a release just yet -- while we're making breaking changes, I'm going to fix a few things that we have flagged for our next breaking release (including removing a few deprecated kwargs). |
I believe that this was largely superseded by #840 so I'm going to go ahead and close this. |
Switch to unconstrained optimization using `PRIMA.newuoa`, instead of optimizers that allow for bounds, like the various versions of `BOBYQA` that we were using. See Termination at saddle point #705 for an example. This switches the backend from `NLopt.jl` to `PRIMA.jl`. The profiling code depends on the `NLopt.jl` optimizers, so I have temporarily disabled that code.