
Conversation

@dmbates commented Jul 8, 2025

  • We recently realized that the constraints on some elements of the θ parameter were not needed. The objective is defined consistently when one or more of the parameters that appear on the diagonal of the elements of λ are negative.
  • This means that we can use unconstrained optimizers, like PRIMA.newuoa, instead of optimizers that allow for bounds, like the various versions of BOBYQA we have been using. See #705 (Termination at saddle point) for an example.
  • In addition to changing optimizers, we would need either to change the way the estimated standard deviations of the random effects are calculated (because these cannot be negative) or to add a post-optimization step of flipping the signs on any columns of any elements of λ that have a negative value on the diagonal. We opted for the second approach, which we call "rectifying" the parameter vector; a minimal sketch follows this list.
  • Because I have never learned the lesson of changing only one thing at a time, I also decided to switch the default optimizers from those in NLopt.jl to PRIMA.jl.
  • Then I found out that in the code for profiling the objective I had assumed some of the structure of the NLopt.jl optimizers, so I have temporarily disabled that code.
  • I'm sure that new and exciting episodes of yak shaving will come to pass as we persist.
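
A minimal sketch of the rectifying step, assuming the λ blocks are stored as a vector of dense lower-triangular matrices (the names `rectify!` and `lambdas` are illustrative, not the package's actual API):

```julia
# Flip the sign of any column of a λ block whose diagonal element is
# negative.  Because λλ' is a sum of outer products of the columns,
# flipping a column's sign leaves the covariance factor unchanged.
function rectify!(lambdas::Vector{Matrix{Float64}})
    for λ in lambdas
        for j in axes(λ, 2)
            if λ[j, j] < 0
                λ[:, j] .*= -1
            end
        end
    end
    return lambdas
end
```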

@dmbates marked this pull request as draft July 8, 2025 17:32
@dmbates commented Jul 8, 2025

@palday I'm thinking of changing the optsum structure so that the initial parameter vector, the final parameter vector (after rectifying), and the value of the objective for each are always present in the fitlog. For cases where the optional argument fitlog is false, the evaluation number will be in a separate column. (An alternative is to always retain the fitlog. This comes at the expense of a couple of kilobytes of storage, but I spend far too much time optimizing away small amounts of storage in objects.)

Then we could use code to extract the initial parameter vector, the initial objective value, the final parameter vector, etc., instead of storing these quantities as fields in the struct.
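
For instance, the accessors could then be simple functions of the fitlog. A sketch, assuming fitlog is a vector of (θ, objective) tuples as shown later in this thread (the accessor names are hypothetical):

```julia
# Hypothetical accessors replacing stored fields; optsum.fitlog is assumed
# to be a Vector{Tuple{Vector{Float64}, Float64}}.
initial_theta(optsum)     = first(optsum.fitlog)[1]
initial_objective(optsum) = first(optsum.fitlog)[2]
final_theta(optsum)       = last(optsum.fitlog)[1]   # after rectifying
final_objective(optsum)   = last(optsum.fitlog)[2]
```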

@dmbates commented Jul 8, 2025

As I said, there is an enormous amount of yak shaving that still needs to take place: profiling the objective and GLMMs are still broken. I imagine GLMMs will be easier to patch up.

At this point I wanted to have a target out there for people to throw beer bottles at. I am mostly concerned about getting the tests on LMMs to work.

@palday commented Jul 8, 2025

> @palday I'm thinking of changing the optsum structure so that the initial parameter vector, the final parameter vector (after rectifying), and the value of the objective for each are always present in the fitlog. For cases where the optional argument fitlog is false, the evaluation number will be in a separate column. (An alternative is to always retain the fitlog. This comes at the expense of a couple of kilobytes of storage, but I spend far too much time optimizing away small amounts of storage in objects.)
>
> Then we could use code to extract the initial parameter vector, the initial objective value, the final parameter vector, etc., instead of storing these quantities as fields in the struct.

I'm fully on board with this and had thought about something similar a while back.

codecov bot commented Jul 11, 2025

Codecov Report

Attention: Patch coverage is 86.36364% with 12 lines in your changes missing coverage. Please review.

Project coverage is 74.48%. Comparing base (418a65f) to head (bfe82e3).

Files with missing lines   Patch %   Lines
src/prima.jl               82.60%    12 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (418a65f) and HEAD (bfe82e3).

HEAD has 3 uploads less than BASE
Flag      BASE (418a65f)   HEAD (bfe82e3)
minimum   2                0
nightly   1                0
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #838       +/-   ##
===========================================
- Coverage   97.34%   74.48%   -22.87%     
===========================================
  Files          36       35        -1     
  Lines        3504     3515       +11     
===========================================
- Hits         3411     2618      -793     
- Misses         93      897      +804     
Flag      Coverage Δ
current   74.48% <86.36%> (?)
minimum   ?
nightly   ?

Flags with carried forward coverage won't be shown.


@dmbates commented Jul 12, 2025

Those failures are, well, distressing. The simplest imaginable model, for the dyestuff data, cannot produce consistent results to within a reasonable tolerance on macOS with M-series processors and on Linux with Intel processors. I'll look at the fitlogs on M-series under OpenBLAS and AppleAccelerate and on x86_64 Linux with MKL and OpenBLAS.

The differences show up in the intermediate results (pwrss, varest, logdet) used to evaluate the objective, not so much in the objective itself or the value of the parameter. Lots of fun times ahead.

@dmbates commented Jul 12, 2025

There are interesting issues here, even for a very simple model like dyestuff. Having preached this for several years, I should be aware that the appropriate scale on which to assess a variance estimate, or a quantity like pwrss that is proportional to a variance estimate, is the logarithm scale.

I have gotten slightly different answers for the dyestuff ML fit on Apple M-series and x86_64 systems, even though the initial evaluations seem identical. It looks like a fun few weeks tracking all this down.
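
To make that concrete, a cross-platform comparison of a variance-scale quantity could be done on the log scale, where the tolerance acts relatively rather than absolutely (a sketch; the function name and tolerance are placeholders):

```julia
# Compare quantities like pwrss or varest on the log scale, so the same
# tolerance applies regardless of the magnitude of the variance.
logscale_close(v1, v2; atol = 1e-10) = isapprox(log(v1), log(v2); atol = atol)

logscale_close(2.5e6, 2.5e6 * (1 + 1e-12))  # true: tiny relative difference
```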

@andreasnoack commented

These findings make me wonder if it wouldn't be worthwhile to break the changes down into smaller components. Specifically, start by just changing the lower bounds to -Inf but stick with nlopt. Maybe the optimum is just quite sensitive to rounding differences, but it might also be PRIMA itself, or the way the PRIMA_jll binaries are compiled, that makes a difference here.

@andreasnoack commented

I have been testing this a bit and managed to hit what I believe is libprima/PRIMA.jl#25 when running within Quarto on Linux. Hence, I'd be in favor of holding off on the switch to PRIMA until the linked issue has been resolved.

@dmbates commented Jul 14, 2025

@andreasnoack Which unconstrained, derivative-free optimizer implementations in Julia would you and/or @pkofod recommend? I favored PRIMA.jl because libprima is a fairly recent implementation of these algorithms and, apparently, addresses some bugs in Powell's Fortran-77 code, which was translated via f2c to provide the implementation in libnlopt. However, switching to the unconstrained formulation may give us the opportunity to explore other optimizers with less of an implementation burden, simply because we no longer need bounded or constrained optimization.

The biggest payoff, of course, would be if we could use automatic differentiation on the objective, but I have not been successful in doing so. With AD we would be able to use gradient-based optimization that, we hope, would be more stable.

@andreasnoack commented Jul 15, 2025

Short term, I think MixedModels would have to stick with the current solver options, nlopt and prima. As I currently understand it, the tradeoffs are:

  • nlopt seems to have inherited some bugs from the original implementation that might be hard to fix because of the nature of the code. On the other hand, the library is better structured than prima for foreign language interfacing.
  • prima has bug fixes not available in nlopt, but the design is harder for foreign language interfacing. There are several issues that discuss improvements, but they haven't yet been resolved.

I think libprima/PRIMA.jl#25 should be considered a blocker. It means that you get a hard error when running prima in VSCode on Linux. Short term, that leaves us with nlopt. I think it would be worth trying a minimal change where we just set the lower bounds to -Inf and adjust the signs, to see what happens. I actually tried that locally but, as you pointed out, some surprising things seem to happen with the likelihood profiling code.

Longer term, I think it would be worthwhile working towards a gradient-based solution. #705 already includes a version that works with ForwardDiff. It allocates a lot, but that can probably be reduced substantially with some effort. Maybe some or all of the derivatives can also be derived analytically. We might be able to help with this if you think that is worthwhile. With gradients, I think we could just use Optim's BFGS and thereby avoid binary dependencies.
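
A sketch of that gradient-based route, with a stand-in quadratic in place of the LMM objective (`objective` and `θ0` are placeholders, not the package's API):

```julia
using Optim, ForwardDiff

objective(θ) = sum(abs2, θ .- 1)   # stand-in for the profiled LMM objective
θ0 = zeros(3)

# ForwardDiff supplies the gradient that BFGS needs; no binary dependencies.
g!(G, θ) = copyto!(G, ForwardDiff.gradient(objective, θ))
result = optimize(objective, g!, θ0, BFGS())
Optim.minimizer(result)            # ≈ [1.0, 1.0, 1.0]
```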

@dmbates commented Jul 15, 2025

Well, you have convinced me that an incremental approach is superior to my "smash it to bits and hope that you can reconstruct it to some semblance of its former capability" way of going about things. So I will start another branch that adds the rectify! capability and, instead of setting the elements of lb to -Inf, simply removes the lb property entirely. All of the constrained or bounded optimizers have defaults for the bounds vectors such that, if lb and ub are not specified, they perform unconstrained optimization; a sketch follows. I will also allow for additional optimizer specifications like :newuoa and :LN_NEWUOA.
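
A sketch of that default behavior with a toy objective (names are placeholders): NLopt.jl leaves the bounds at ±Inf unless they are set, so even a nominally bounded algorithm like BOBYQA runs unconstrained.

```julia
using NLopt

# Derivative-free objective; NLopt passes a gradient vector that BOBYQA
# never fills, so it is ignored here.
toyobj(θ, grad) = sum(abs2, θ .- 1)

opt = Opt(:LN_BOBYQA, 3)        # a bounded algorithm ...
opt.min_objective = toyobj      # ... but lb and ub are never specified,
minf, minθ, ret = optimize(opt, zeros(3))  # so the search is unconstrained
```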

@dmbates commented Jul 18, 2025

So I tried to follow the advice from @andreasnoack: simply change the lowerbd vector so that all elements are -Inf, keep using the bounded optimizers like BOBYQA and COBYLA even though no bounds are in effect, and rectify the final parameter vector. That is more or less the minimal-change option. Unfortunately, even those minor changes blow up in the profiling. In the profiling you don't want to cross the lower bound of zero, because the profiled objective ends up being symmetric about zero for the previously constrained θ elements. What I will try to do is modify the profileθj! code so that it doesn't check m.optsum.lowerbd[i] but instead looks at whether m.parmap[i][2] == m.parmap[i][3], as in the sketch below.
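
A minimal sketch of that check, assuming (as the comment above implies) that parmap[i] is a tuple whose second and third elements are the row and column indices of the i-th θ element within its λ block (the function name is hypothetical):

```julia
# θ elements with row == col lie on the diagonal of a λ block; those are
# the ones whose profiled objective is symmetric about zero.
on_lambda_diagonal(m, i) = m.parmap[i][2] == m.parmap[i][3]
```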

I will close this PR and open a new one when I have something that passes tests.

@dmbates commented Aug 13, 2025

Interestingly, the switch to unconstrained optimization and NEWUOA resulted in a substantially better fit for the goldstein example:

goldstein: Test Failed at /Users/dmbates/.julia/dev/MixedModels/test/pirls.jl:255
  Expression: ≈(deviance(m1), 191.25588670286234, rtol = 1.0e-5)
   Evaluated: 186.9984948799336 ≈ 191.25588670286234 (rtol=1.0e-5)

@dmbates commented Aug 13, 2025

The good news on the goldstein example may not be such good news. If you create a fitlog and examine it, the deviance on the last evaluation looks suspect.

julia> fit!(m1; progress=false, fitlog=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  y ~ 1 + (1 | group)
  Distribution: Poisson{Float64}
  Link: LogLink()

   logLik    deviance     AIC       AICc        BIC    
  -312.2084   186.6941   628.4169   628.5406   633.6272

Variance components:
          Column   Variance  Std.Dev.
group (Intercept)    4.81046   2.19328

 Number of obs: 100; levels of grouping factors: 10

Fixed-effects parameters:
────────────────────────────────────────────────
               Coef.  Std. Error     z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  3.27886    0.696336  4.71    <1e-05
────────────────────────────────────────────────

julia> m1.optsum.fitlog
98-element Vector{Tuple{Vector{Float64}, Float64}}:
 ([4.727210823648169, 1.0], 246.12019299931703)
 ([5.727210823648169, 1.0], 295.5056699061577)
 ([4.727210823648169, 2.0], 196.8746435802723)
 ([3.7272108236481687, 1.0], 216.2616884426193)
 ([4.727210823648169, 0.0], 33111.0741495393)
 ([4.726003407293028, 1.5014977472972564], 208.10891357748682)
 ([4.85183579870062, 1.9995124490551461], 198.21508829451847)
 ([4.511522151674384, 1.875080484978622], 196.24076491440593)
 ([4.426769665723302, 1.8220049150324438], 196.09184733152927)
 ([4.3500810851653195, 1.7578265572161065], 196.24230483914891)
 ([4.340214009424053, 1.8720860328574915], 194.54773051094907)
 ([4.251003875403487, 1.9172695695498838], 193.23200525943656)
 ([4.052132455437064, 1.938486504308199], 191.43172828960573)
 ([3.661649822146293, 2.0252235384879804], 188.50724146238238)
 ([3.103676226035871, 2.174289986604922], 246.12019299931703)
 ([3.622362433235004, 1.9332643025595915], 188.72622091796856)
 ([3.783332000872945, 1.7633413467337788], 191.00299702303116)
 ([3.576379365323183, 2.0774628837478337], 187.9321149944938)
 ([3.478557229144678, 2.098219320664461], 187.47849269173358)
 ([3.2954279432571782, 2.1786162400726985], 186.7509476816972)
 ([2.8991357663384987, 2.232953248759725], 246.12019299931703)
 ([3.2477013862868622, 2.2664921512188094], 246.12019299931703)
 ⋮
 ([3.278864613581048, 2.193275512161594], 246.12019299931703)
 ([3.278864613587998, 2.1932755121364043], 246.12019299931703)
 ([3.278864613601011, 2.1932755121507164], 246.12019299931703)
 ([3.278864613589542, 2.1932755121491434], 246.12019299931703)
 ([3.2788646135969723, 2.1932755121419714], 246.12019299931703)
 ([3.2788646135961073, 2.1932755121428116], 246.12019299931703)
 ([3.2788646135956885, 2.1932755121411334], 246.12019299931703)
 ([3.2788646135968866, 2.193275512140986], 246.12019299931703)
 ([3.278864613595962, 2.1932755121419363], 246.12019299931703)
 ([3.278864613596395, 2.1932755121417515], 246.12019299931703)
 ([3.2788646135964012, 2.193275512142066], 246.12019299931703)
 ([3.2788646135964847, 2.193275512141855], 246.12019299931703)
 ([3.27886461359629, 2.1932755121418825], 246.12019299931703)
 ([3.278864613596393, 2.1932755121419505], 246.12019299931703)
 ([3.2788646135963804, 2.1932755121418483], 246.12019299931703)
 ([3.2788646135963804, 2.19327551214186], 246.12019299931703)
 ([3.278864613596395, 2.193275512141851], 246.12019299931703)
 ([3.2788646135963866, 2.1932755121418412], 246.12019299931703)
 ([3.2788646135963875, 2.1932755121418555], 246.12019299931703)
 ([3.2788646135963875, 2.193275512141855], 246.12019299931703)
 ([3.2788646135963853, 2.19327551214185], 186.69409940885922)

For the time being I will disable that test. The data are very weird and came from an example that is extremely difficult to fit.

@dmbates commented Aug 13, 2025

I think the version should be bumped to 5.0.0 rather than 4.39.0. That is, I think this change would be a major release, or am I overthinking it?

@palday commented Aug 13, 2025

> I think the version should be bumped to 5.0.0 rather than 4.39.0. That is, I think this change would be a major release, or am I overthinking it?

This is definitely a major version bump, but don't tag a release just yet -- while we're making breaking changes, I'm going to fix a few things that we have flagged for our next breaking release (including removing a few deprecated kwargs).

@palday commented Aug 21, 2025

I believe that this was largely superseded by #840, so I'm going to go ahead and close this.

@palday closed this Aug 21, 2025