
tinkering on gradient-based optimization #871

Draft
palday wants to merge 26 commits into main from db/pa/gradient

Conversation

@palday
Member

@palday palday commented Dec 29, 2025

No description provided.

@dmbates
Collaborator

dmbates commented Dec 29, 2025

Thank you for reorganizing my notes into a reasonable archive. Somehow a good git organization remains a "here be dragons" area for me.

@codecov

codecov bot commented Dec 30, 2025

Codecov Report

❌ Patch coverage is 66.53846% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.75%. Comparing base (4e04cfd) to head (ca8d1a9).

Files with missing lines Patch % Lines
src/gradient.jl 64.04% 87 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #871      +/-   ##
==========================================
- Coverage   95.62%   93.75%   -1.88%     
==========================================
  Files          38       39       +1     
  Lines        3702     3954     +252     
==========================================
+ Hits         3540     3707     +167     
- Misses        162      247      +85     
Flag Coverage Δ
current 93.45% <66.27%> (-1.85%) ⬇️
minimum 93.70% <66.53%> (-1.87%) ⬇️
nightly 93.45% <66.27%> (-1.85%) ⬇️

Flags with carried forward coverage won't be shown.



dmbates commented Jan 14, 2026

I have added some code to evaluate the diagonal block of Omega-dot for each parameter component; it is in src/gradient.jl. Some explanation of what is going on is in gradients/Gradient_evaluation.qmd. The next steps are to fill out the other blocks and to add methods for Omega_dot_diag_block! for diagonal blocks further down the diagonal. In most cases these are dense matrices, rather than Diagonal or UniformBlockDiagonal, but only the diagonal or the diagonal blocks are overwritten at this stage.

I did add tests for the methods I added today.
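To make the derivative structure concrete, here is a minimal toy sketch (not the PR's code; the matrix `A`, the scalar parameterization `Λ(θ) = θI`, and the form `Ω(θ) = Λ'AΛ + I` are illustrative assumptions) of an Omega-dot block for one parameter component, checked against a central finite difference:

```julia
using LinearAlgebra

# Toy illustration (hypothetical, not the PR's code): with a scalar
# component θ, Λ(θ) = θI, and Ω(θ) = Λ'AΛ + I, the derivative is
# Ω̇ = Λ̇'AΛ + Λ'AΛ̇ = 2θA, touching only entries within A's block.
A = [4.0 1.0; 1.0 3.0]             # stand-in for one Z'Z diagonal block
Ω(θ) = θ^2 * A + I                 # Λ(θ)'AΛ(θ) + I with Λ(θ) = θI
Ωdot(θ) = 2θ * A                   # analytic derivative of Ω w.r.t. θ

θ = 0.5
h = 1e-6
fd = (Ω(θ + h) - Ω(θ - h)) / (2h)  # central finite difference
@assert isapprox(fd, Ωdot(θ); atol = 1e-6)
```

The same finite-difference comparison is the natural unit test for each `Omega_dot_diag_block!`-style method: the analytic block must agree with a numerical derivative of the assembled Ω.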


dmbates commented Jan 14, 2026

The failure on nightly is from an old test of the HTML output, which apparently is not formatted the same way as in earlier versions. I'm not sure why Style Enforcer is failing.


dmbates commented Jan 24, 2026

I added some preliminary code to evaluate the gradient using the blocked form of the derivative of $\Omega$ and the blocked L matrix. It is not polished, but it works, sort of. I just updated the gradients/GradientEvaluation.qmd document with some examples using the blocked evaluation of $L^{-1}\dot{\Omega} L^{-T}$ and the full-matrix evaluation; both are compared to finite-difference approximations. Somehow the blocked evaluation of $L^{-1}\dot{\Omega} L^{-T}$ gives a different value for the second component of the gradient in the last example (penicillin), but I can't figure out why: the numbers used to evaluate this quantity appear to be the same for the blocked and the full-matrix evaluations. @palday, if you can shed any light on this, I would appreciate it.
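The identity behind both evaluation paths can be checked on a toy example (illustrative matrices only, not the penicillin model): for $\Omega = LL^{T}$, $\operatorname{tr}(L^{-1}\dot{\Omega} L^{-T}) = \operatorname{tr}(\Omega^{-1}\dot{\Omega}) = \frac{d}{d\theta}\log\det\Omega$, so the blocked and full-matrix evaluations must agree with each other and with a finite difference of the log-determinant:

```julia
using LinearAlgebra

# For Ω = LL', tr(L⁻¹ Ω̇ L⁻ᵀ) = tr(Ω⁻¹ Ω̇) = d(logdet Ω)/dθ.
A = [4.0 1.0; 1.0 3.0]             # illustrative SPD stand-in
Ω(θ) = θ^2 * A + I
Ωdot(θ) = 2θ * A

θ = 0.7
L = cholesky(Symmetric(Ω(θ))).L
blocked = tr((L \ Ωdot(θ)) / L')   # tr(L⁻¹ Ω̇ L⁻ᵀ), the blocked path
full    = tr(Ω(θ) \ Ωdot(θ))       # tr(Ω⁻¹ Ω̇), the full-matrix path
h = 1e-6
fd = (logdet(Ω(θ + h)) - logdet(Ω(θ - h))) / (2h)
@assert isapprox(blocked, full; rtol = 1e-10)
@assert isapprox(blocked, fd; atol = 1e-6)
```

When the blocked path disagrees with the full-matrix path on one gradient component, as reported above, a check of this shape localizes which trace term diverges.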


dmbates commented Jan 25, 2026

I think I know what the problem is: I didn't zero out all the blocks that I should have before the blocked gradient evaluation. It should be fixed today.
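This failure mode is easy to reproduce in miniature: the five-argument `mul!` with β = 1 accumulates into whatever the destination already holds, so reusing block storage across parameter components without zeroing it carries the previous component's values forward. A hypothetical sketch (not the PR's code):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
buf = zeros(2, 2)

mul!(buf, A, A, 1.0, 1.0)   # component 1: buf = A*A (buf started at zero)
mul!(buf, A, A, 1.0, 1.0)   # component 2 without zeroing: buf = 2*A*A (stale)
@assert buf == 2 * (A * A)  # the previous component's result leaked in

fill!(buf, 0.0)             # zero the block before each component
mul!(buf, A, A, 1.0, 1.0)
@assert buf == A * A        # correct
```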


dmbates commented Jan 25, 2026

I think the initialization of the blocked form for evaluation of the gradient is now fixed. I must admit that I am uniquely skilled at confusing myself about the order in which to do such calculations, but I think I have it now.

This is still a WIP for the actual gradient evaluation. @palday, if you have time, I may ask for your advice on how to structure the code, which is kind of all over the place right now.

I will run it through the formatter and commit that version, to try to cut down on error messages from commits.


dmbates commented Jan 26, 2026

The development of the gradient evaluation is currently in src/gradient.jl.

My next tasks will be to fold the initialize_blocks! function into eval_grad_p! and then fold that into a function, perhaps called gradient!, that modifies the elements of a vector passed as the first argument. (Is gradient! too general a term? Should it be lmm_gradient! or something like that?)

I am still keeping the storage used in the gradient evaluation, produced with grad_blocks, outside the LinearMixedModel struct. It can be moved there, but I haven't thought through how to do that, because we don't want to allocate this storage unless we are going to use it. I am thinking of an optional Boolean argument named gradient that controls whether to allow for evaluation of a gradient; if true, it would allocate the grad blocks in the LinearMixedModel struct. This argument would also affect the default choice of optimizer.

After that, a lot of testing and timing. I fear that, when all is said and done, gains in evaluation time will depend strongly on the type(s) and size(s) of the grouping factors in the model. Perhaps we will see a gain in reliability.
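One possible shape for that interface, sketched on a toy struct (the names `ToyModel`, `gradient!`, `gradblocks`, and the `gradient` keyword are all hypothetical placeholders for the design being discussed, not the PR's code): allocate the gradient storage only when requested at construction, and have `gradient!` fail loudly when it is absent.

```julia
# Hypothetical sketch of the interface under discussion; none of these
# names are from the PR itself.
mutable struct ToyModel
    θ::Vector{Float64}
    gradblocks::Union{Nothing, Vector{Matrix{Float64}}}
end

# allocate gradient storage only when `gradient=true` is requested
ToyModel(θ::Vector{Float64}; gradient::Bool = false) =
    ToyModel(θ, gradient ? [zeros(2, 2) for _ in θ] : nothing)

function gradient!(g::AbstractVector, m::ToyModel)
    m.gradblocks === nothing &&
        throw(ArgumentError("model was not constructed with gradient=true"))
    for (j, blk) in pairs(m.gradblocks)
        fill!(blk, 0)     # zero per-component storage before each use
        # ... fill blk with the blocks of L⁻¹Ω̇ⱼL⁻ᵀ and reduce ...
        g[j] = sum(blk)   # placeholder reduction for the sketch
    end
    return g
end

m = ToyModel([0.5, 0.25]; gradient = true)
g = gradient!(zeros(2), m)
```

The same flag can then steer the default optimizer choice: gradient storage present selects a gradient-based method, absent falls back to the current derivative-free one.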
