Skip to content

Conversation

dmbates
Copy link
Collaborator

@dmbates dmbates commented Mar 30, 2025

  • Allow for blocks on the diagonal of L to be TriangularRFP from https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl
  • In the process of making these changes I found that I needed to be more explicit about when a diagonal block is being treated as Hermitian and stored in the lower triangle (copyscaleinflate! and rankUpdate!) and when it is being lower triangular (after cholUnblocked! and in the rdiv! call).
  • I had previously been playing somewhat fast and loose with these distinctions and just passing around Matrix objects for the dense case but that can trip you up.
  • The timings shown below indicate that there is a considerable speed penalty for TriangularRFP. This is more of a slowdown in the rankUpdate! method, which involves many calls to setindex!, rather than the cholUnblocked! method.
  • It may be possible to speed up some of the calls to setindex! because of the way the loops in the rankUpdate! work.
  • There is a new named argument, RFPthreshold, in the LinearMixedModel constructor (but not yet in MixedModel).
  • Right now this has a [ci skip] designation because I haven't updated all the tests to the more stringent requirements on Hermitian and Triangular types.
  • Tests should now pass (but apparently there are failures with some configurations).
  • Fixes Potential issue with rankUpdate! dosctring #813

@dmbates
Copy link
Collaborator Author

dmbates commented Mar 30, 2025

Some timings on my laptop

(jl_6i3bqi) pkg> st
Status `/private/var/folders/d9/t94kd5m50rl1r3bchncn62sm0000gn/T/jl_6i3bqi/Project.toml`
  [ff71e718] MixedModels v4.34.0

julia> dat = MixedModels.dataset(:insteval)
Arrow.Table with 73421 rows, 7 columns, and schema:
 :s        String
 :d        String
 :dept     String
 :studage  String
 :lectage  String
 :service  String
 :y        Int8

julia> f = @formula(y ~ 1 + service + (1|s) + (1|d) + (1|dept));

julia> m1 = fit(MixedModel, f, dat; progress=false);

julia> @be objective(updateL!($m1))
Benchmark: 20 samples with 1 evaluation
 min    5.143 ms (38 allocs: 1.094 KiB)
 median 5.172 ms (38 allocs: 1.094 KiB)
 mean   5.176 ms (38 allocs: 1.094 KiB)
 max    5.270 ms (38 allocs: 1.094 KiB)

# using the db/RFP branch

julia> m1 = fit(MixedModel, f, dat; progress=false);

julia> BlockDescription(m1)
rows:      s             d            dept         fixed     
2972:   Diagonal   
1128:    Sparse      Diag/TrRFP  
  14:    Dense      Sparse/Dense   Diag/Dense  
   3:    Dense         Dense         Dense         Dense     


julia> @be objective(updateL!($m1))
Benchmark: 4 samples with 1 evaluation
        26.432 ms (31 allocs: 768 bytes)
        26.525 ms (31 allocs: 768 bytes)
        26.669 ms (31 allocs: 768 bytes)
        26.937 ms (31 allocs: 768 bytes)

julia> m2 = LinearMixedModel(f, dat; RFPthreshold=2000);

julia> BlockDescription(m2)
rows:      s             d            dept         fixed     
2972:   Diagonal   
1128:    Sparse      Diag/Dense  
  14:    Dense      Sparse/Dense   Diag/Dense  
   3:    Dense         Dense         Dense         Dense     


julia> @be objective(updateL!($m2))
Benchmark: 20 samples with 1 evaluation
 min    5.144 ms (28 allocs: 656 bytes)
 median 5.173 ms (28 allocs: 656 bytes)
 mean   5.173 ms (28 allocs: 656 bytes)
 max    5.256 ms (28 allocs: 656 bytes)

@dmbates
Copy link
Collaborator Author

dmbates commented Apr 1, 2025

There may be a way of cutting down on the cost of the rankUpdate! operation that evaluates L[2,2] - L[2,1] * L[2,1] , which accounts for the largest amount of time spend in updateL! when TriangularRFP is used.
The bottleneck is not evaluating each update; it is writing it back to the TriangularRFP matrix with setindex!. Because we are storing the lower triangle in the RFP format, we could use linear indexing to the underlying matrix for the first half of the columns of L[2,2]. The [i,j] entry in L[2,2] is exactly the [i,j] entry of L[2,2].parent, if the size of L[2,2] is odd, or the [i + 1, j] entry of L[2,2].parent, if the size is even.

The position of the elements in the lower right triangle of L[2,2] is more complicated to determine but we could either code that up or just fall back on the existing setindex! method. Furthermore, we could sort the rows of L[2,1] and L[2,2] by descending order of A[2,2].diag so that more of the updates are in the left half of L[2,2].

Does this seem worth trying, @palday @ajinkya-k?

I am willing to work on it and feel that it could be done relatively quickly but experience suggests that my "relatively quickly" may not be as quick as I imagine.

@ajinkya-k
Copy link
Contributor

seems worth trying for sure.

@palday
Copy link
Member

palday commented Apr 2, 2025

I agree it's worth trying!

@ajinkya-k
Copy link
Contributor

ajinkya-k commented Apr 2, 2025

Isn’t there an SYRK for rectangular full packed format?

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@ajinkya-k
Copy link
Contributor

ajinkya-k commented Apr 3, 2025

Sorry I totally misunderstood the code (https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl/blob/b159bfabd981c46b8842ff1ad857aa95f6b52cad/src/hermitian.jl#L43)

The setindex slowness is only for the case where the argument A is a SparseMatrixCSC correct?

@dmbates
Copy link
Collaborator Author

dmbates commented Apr 3, 2025

@ajinkya-k I should have been more clear that I was speaking of the case where L[2,1] is sparse. Using RFP for L[2,2] only makes sense if L[2,2] is very large and, by design, L[2,1] is even larger (because L[1,1] is as large as possible). If L[2,1] needs to be stored as a dense matrix you have already got a much bigger problem to contend with.

@ajinkya-k
Copy link
Contributor

No I think that was clear but I twisted myself into knots trying to read the file changes from the PR

Copy link

codecov bot commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 84.76821% with 23 lines in your changes missing coverage. Please review.

Project coverage is 96.73%. Comparing base (35747ac) to head (09ad439).

Files with missing lines Patch % Lines
src/remat.jl 54.54% 20 Missing ⚠️
src/linalg/rankUpdate.jl 91.66% 2 Missing ⚠️
src/pca.jl 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #821      +/-   ##
==========================================
- Coverage   97.33%   96.73%   -0.60%     
==========================================
  Files          36       36              
  Lines        3495     3588      +93     
==========================================
+ Hits         3402     3471      +69     
- Misses         93      117      +24     
Flag Coverage Δ
current 96.37% <84.10%> (-0.62%) ⬇️
minimum 96.73% <84.76%> (-0.60%) ⬇️
nightly 96.33% <84.66%> (-0.60%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ajinkya-k
Copy link
Contributor

@dmbates can you add a "fixes #813" to the PR description so that the issue is automatically closed when this is merged?

dmbates and others added 2 commits April 9, 2025 14:41
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@dmbates
Copy link
Collaborator Author

dmbates commented Apr 10, 2025

So I have a question about strategy. The bottleneck when using TriangularRFP for (usually) the [2,2] block of L which implies that the [2,1] block will be a SparseMatrixCSC is the rankUpdate! operation. This is slow because every assignment like C[i,j] += A[j, k] * A[i, k] has to go through a complex calculation of mapping the virtual indices [i,j] to the Cartesian indices in C.data.

When it is the lower triangle of C being stored in C.data (and C.transa is false, which is always the case for us) the first half of the columns of C are either in their original positions in C.data or shifted down by one position.

I did a calculation of the number of updates in this nonflipped configuration versus the number in the flipped configuration for the insteval model. It was about 3/4 nonflipped and 1/4 flipped, which is roughly the proportion you would expect based on the sizes of the chunks in C.

(980444, 306131)

However, if we reorder the rows of A by decreasing row sums

julia> MixedModels.countinds(L[p, :])
(1221833, 64742)

we get about 95% non-flipped and that certainly seems worth it. So, at what point should the levels of the second grouping factor be reordered?

I think it will be rare to need to use RFP but, if L[2,1] is a SparseMatrixCSC there wouldn't be much harm in reordering the rows (i.e. permute the levels of the second grouping factor).

Is this explanation sufficiently intelligible to be able to offer an opinion?

@palday
Copy link
Member

palday commented Apr 11, 2025

@dmbates I think that change makes sense. I also wonder if this would slightly speed up things for the non RFP case --- I've certainly seen this type of optimization have big impacts in other code.

If I'm thinking about this correctly, ordering by row-sums works out to ordering by something like "strength of crossing", right?

We don't make any guarantees about the internal ordering of levels, so that's not an issue.

@dmbates
Copy link
Collaborator Author

dmbates commented Apr 11, 2025

For the time being I will do the simplified indexing part and not do the reordering of levels part. However, we can recommend reordering the levels before creating the model.

L[diagind] = LowerTriangular(Ldi)
end
end
return identity.(A), identity.(L) # does anyone remember what the `identity` is for?`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmbates
Copy link
Collaborator Author

dmbates commented Apr 12, 2025

I wrote the enclosed to do the reordering of levels in a CategoricalArray by incidence. (I'm not a big fan of CategoricalArrays.jl but it seemed the easiest way to accomplish this.)

using CategoricalArrays, DataFrames, MixedModels

"""
    relevel_by_incidence!(v::CategoricalArray; rev=false)

Return `v` with the levels ordered by increasing (decreasing when `rev=true`) incidence.
"""
function relevel_by_incidence!(v::CategoricalArray; rev=false)
    ra = refarray(droplevels!(v))
    counts = zeros(Int, maximum(ra))
    for i in ra
        counts[i] += 1
    end
    levels!(v, levels(v)[sortperm(counts; rev)])
    return v
end

dat = DataFrame(MixedModels.dataset(:insteval))
dat.d = relevel_by_incidence!(categorical(dat.d); rev=true)
m = LinearMixedModel(@formula(y ~ 1 + service + (1|s) + (1|d) + zerocorr(1+service|dept)), dat)
m.A[3].diag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential issue with rankUpdate! dosctring
4 participants