Conversation

dmbates (Contributor) commented Mar 6, 2024

@palday I have found a substantial difference in the number of iterations required to converge for the simple GLMM in Appendix C when I use the PRIMA.jl versions of BOBYQA and NEWUOA. While experimenting with PRIMA I also decided to abandon the approach of using row tables and fall back on MatrixTable to work with tables of vectors of the same type. (As a person who taught factorial experimental design I should know better than to change multiple factors in an experiment, but it seemed to work out.)

Along the way I also switched to using Lilith Hafner's Chairmarks.jl for benchmarking. It is cleaner and easier to use than BenchmarkTools. I'm not sure about the pun in the name but I guess I can live with it.
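For reference, a minimal sketch of Chairmarks usage (assuming Chairmarks.jl is installed; the expression benchmarked here is only an illustration, not code from this PR):

```julia
using Chairmarks

# `@b` reports the fastest run, roughly analogous to BenchmarkTools' `@btime`
@b sum(rand(1000))

# `@be` collects and returns the full benchmark, analogous to `@benchmark`;
# the two-argument form separates setup (run per sample) from the benchmarked call
@be rand(1000) sum
```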

I realize this is a lot of changes to absorb. To compare, I would check out this branch, render or preview the appendix, then check out the main branch, render, and compare. However, you may have a better workflow.

@dmbates dmbates marked this pull request as draft March 6, 2024 20:24
kliegl (Collaborator) commented Mar 6, 2024

This? "Chairmarks... has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support."

dmbates (Contributor, Author) commented Mar 6, 2024

This? "Chairmarks... has more reliable back support. Back support is a defining feature of chairs while benches are known to sometimes lack back support."

Exactly.

Comment on lines +210 to +215
```julia
# Bernoulli (logit link) case: update the mean μ and unit deviance dev from the linear predictor η
@inbounds for i in eachindex(y, η, μ, dev)
    ηi = η[i]
    expmη = exp(-ηi)
    μ[i] = inv(one(T) + expmη)
    dev[i] = 2 * ((one(T) - y[i]) * ηi + log1p(expmη))
end
```
Member:

Because the table uses concrete element types and equal-length vectors, each row is a constant stride away in memory, so CPU prefetching may still work quite well: adjacent chunks of memory are pulled into the L1/L2 caches, and subsequent elements are then read out of cache.

Contributor Author:

I have a vague recollection that the @inbounds here is redundant. That is, using eachindex essentially shuts down bounds checking on its arguments. Am I misremembering?

Previously I had tried to work with a row-table (vector of NamedTuples of floats) but it just got too complicated and it wasn't as effective as it should be. I keep thinking that I should transpose the underlying matrix so that the memory accesses in this loop are adjacent but I'm not sure if that optimization would be visible to the compiler. In this orientation the evaluation of the deviance is extremely fast so maybe it is best to just leave it alone.
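To make the `@inbounds` question concrete, here is a hedged, standalone version of the loop (the function name `logitdev!` and the `Vector` signature are illustrative, not the code under review): `eachindex(y, η, μ, dev)` throws unless all four vectors share axes and yields only valid indices, so for plain `Vector`s the explicit `@inbounds` is arguably belt-and-braces rather than required.

```julia
# Illustrative Bernoulli (logit link) deviance update; all names are hypothetical.
function logitdev!(dev::Vector{T}, μ::Vector{T}, η::Vector{T}, y::Vector{T}) where {T<:AbstractFloat}
    for i in eachindex(y, η, μ, dev)  # errors unless all axes agree; indices are in-bounds by construction
        ηi = η[i]
        expmη = exp(-ηi)
        μ[i] = inv(one(T) + expmη)                           # μ = logistic(η)
        dev[i] = 2 * ((one(T) - y[i]) * ηi + log1p(expmη))   # unit deviance for Bernoulli y
    end
    return sum(dev)
end
```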

aGHQ.qmd (outdated)
```diff
- The IRLS algorithm has converged in 4 iterations to essentially the same deviance as the general optimizer achieved after around 500 function evaluations.
+ The IRLS algorithm has converged in 4 iterations to essentially the same deviance as the general optimizer achieved after more than 150 function evaluations.
```
Member:

I feel like we should be grabbing this value from the code and rounding to something stable instead of manually updating.

[noblock]

Contributor Author:

The difference here is that PRIMA declares convergence in fewer iterations than does NLopt. Nevertheless, I take your point and will see about squirreling away the number of iterations to convergence.
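One possible way to squirrel the count away (a sketch, assuming the chapter is rendered with Quarto's Julia engine and that the optimizer result exposes an evaluation count; `optres` and its `nf` field are hypothetical names, and the inline `{julia}` syntax requires a recent Quarto):

````
```{julia}
# Hypothetical: stash the optimizer's reported number of function evaluations
n_evals = optres.nf
```

The general optimizer required `{julia} n_evals` function evaluations.
````

With this, re-running the fit updates the prose automatically instead of requiring a manual edit.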

palday (Member) commented May 14, 2025

@dmbates I merged in main and things seem to work. How do you feel about getting this merged? We can always iterate in future PRs, but I'm trying to clean up our perfectionist false-start PRs a little. (I'm just as bad as you on that front, I must admit.)

palday (Member) commented May 14, 2025

Hmmm, maybe not ... I'm getting stack overflows during the optimization steps. Trying to figure out what went wrong...
