[wIP] Allow TriangularRFP for diagonal blocks of L [ci skip] #821

dmbates · 2025-03-30T17:16:22Z

Allow for blocks on the diagonal of L to be TriangularRFP from https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl
In the process of making these changes I found that I needed to be more explicit about when a diagonal block is being treated as Hermitian and stored in the lower triangle (copyscaleinflate! and rankUpdate!) and when it is being lower triangular (after cholUnblocked! and in the rdiv! call).
I had previously been playing somewhat fast and loose with these distinctions and just passing around Matrix objects for the dense case but that can trip you up.
The timings shown below indicate that there is a considerable speed penalty for TriangularRFP. This is more of a slowdown in the rankUpdate! method, which involves many calls to setindex!, rather than the cholUnblocked! method.
It may be possible to speed up some of the calls to setindex! because of the way the loops in the rankUpdate! work.
There is a new named argument, RFPthreshold, in the LinearMixedModel constructor (but not yet in MixedModel).
Right now this has a [ci skip] designation because I haven't updated all the tests to the more stringent requirements on Hermitian and Triangular types.
Tests should now pass (but apparently there are failures with some configurations).
Fixes Potential issue with rankUpdate! dosctring #813

dmbates · 2025-03-30T17:18:37Z

Some timings on my laptop

(jl_6i3bqi) pkg> st
Status `/private/var/folders/d9/t94kd5m50rl1r3bchncn62sm0000gn/T/jl_6i3bqi/Project.toml`
  [ff71e718] MixedModels v4.34.0

julia> dat = MixedModels.dataset(:insteval)
Arrow.Table with 73421 rows, 7 columns, and schema:
 :s        String
 :d        String
 :dept     String
 :studage  String
 :lectage  String
 :service  String
 :y        Int8

julia> f = @formula(y ~ 1 + service + (1|s) + (1|d) + (1|dept));

julia> m1 = fit(MixedModel, f, dat; progress=false);

julia> @be objective(updateL!($m1))
Benchmark: 20 samples with 1 evaluation
 min    5.143 ms (38 allocs: 1.094 KiB)
 median 5.172 ms (38 allocs: 1.094 KiB)
 mean   5.176 ms (38 allocs: 1.094 KiB)
 max    5.270 ms (38 allocs: 1.094 KiB)

# using the db/RFP branch

julia> m1 = fit(MixedModel, f, dat; progress=false);

julia> BlockDescription(m1)
rows:      s             d            dept         fixed     
2972:   Diagonal   
1128:    Sparse      Diag/TrRFP  
  14:    Dense      Sparse/Dense   Diag/Dense  
   3:    Dense         Dense         Dense         Dense     


julia> @be objective(updateL!($m1))
Benchmark: 4 samples with 1 evaluation
        26.432 ms (31 allocs: 768 bytes)
        26.525 ms (31 allocs: 768 bytes)
        26.669 ms (31 allocs: 768 bytes)
        26.937 ms (31 allocs: 768 bytes)

julia> m2 = LinearMixedModel(f, dat; RFPthreshold=2000);

julia> BlockDescription(m2)
rows:      s             d            dept         fixed     
2972:   Diagonal   
1128:    Sparse      Diag/Dense  
  14:    Dense      Sparse/Dense   Diag/Dense  
   3:    Dense         Dense         Dense         Dense     


julia> @be objective(updateL!($m2))
Benchmark: 20 samples with 1 evaluation
 min    5.144 ms (28 allocs: 656 bytes)
 median 5.173 ms (28 allocs: 656 bytes)
 mean   5.173 ms (28 allocs: 656 bytes)
 max    5.256 ms (28 allocs: 656 bytes)

dmbates · 2025-04-01T15:19:17Z

There may be a way of cutting down on the cost of the rankUpdate! operation that evaluates L[2,2] - L[2,1] * L[2,1] , which accounts for the largest amount of time spend in updateL! when TriangularRFP is used.
The bottleneck is not evaluating each update; it is writing it back to the TriangularRFP matrix with setindex!. Because we are storing the lower triangle in the RFP format, we could use linear indexing to the underlying matrix for the first half of the columns of L[2,2]. The [i,j] entry in L[2,2] is exactly the [i,j] entry of L[2,2].parent, if the size of L[2,2] is odd, or the [i + 1, j] entry of L[2,2].parent, if the size is even.

The position of the elements in the lower right triangle of L[2,2] is more complicated to determine but we could either code that up or just fall back on the existing setindex! method. Furthermore, we could sort the rows of L[2,1] and L[2,2] by descending order of A[2,2].diag so that more of the updates are in the left half of L[2,2].

Does this seem worth trying, @palday @ajinkya-k?

I am willing to work on it and feel that it could be done relatively quickly but experience suggests that my "relatively quickly" may not be as quick as I imagine.

ajinkya-k · 2025-04-01T15:27:29Z

seems worth trying for sure.

palday · 2025-04-02T03:15:37Z

I agree it's worth trying!

ajinkya-k · 2025-04-02T03:16:44Z

~~Isn’t there an SYRK for rectangular full packed format?~~

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

ajinkya-k · 2025-04-03T12:30:12Z

Sorry I totally misunderstood the code (https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl/blob/b159bfabd981c46b8842ff1ad857aa95f6b52cad/src/hermitian.jl#L43)

The setindex slowness is only for the case where the argument A is a SparseMatrixCSC correct?

dmbates · 2025-04-03T14:40:58Z

@ajinkya-k I should have been more clear that I was speaking of the case where L[2,1] is sparse. Using RFP for L[2,2] only makes sense if L[2,2] is very large and, by design, L[2,1] is even larger (because L[1,1] is as large as possible). If L[2,1] needs to be stored as a dense matrix you have already got a much bigger problem to contend with.

ajinkya-k · 2025-04-03T15:24:35Z

No I think that was clear but I twisted myself into knots trying to read the file changes from the PR

codecov · 2025-04-09T18:49:23Z

Codecov Report

Attention: Patch coverage is 84.76821% with 23 lines in your changes missing coverage. Please review.

Project coverage is 96.73%. Comparing base (35747ac) to head (09ad439).

Files with missing lines	Patch %	Lines
src/remat.jl	54.54%	20 Missing ⚠️
src/linalg/rankUpdate.jl	91.66%	2 Missing ⚠️
src/pca.jl	83.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #821      +/-   ##
==========================================
- Coverage   97.33%   96.73%   -0.60%     
==========================================
  Files          36       36              
  Lines        3495     3588      +93     
==========================================
+ Hits         3402     3471      +69     
- Misses         93      117      +24

Flag	Coverage Δ
current	`96.37% <84.10%> (-0.62%)`	⬇️
minimum	`96.73% <84.76%> (-0.60%)`	⬇️
nightly	`96.33% <84.66%> (-0.60%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ajinkya-k · 2025-04-09T18:52:13Z

@dmbates can you add a "fixes #813" to the PR description so that the issue is automatically closed when this is merged?

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

dmbates · 2025-04-10T17:58:06Z

So I have a question about strategy. The bottleneck when using TriangularRFP for (usually) the [2,2] block of L which implies that the [2,1] block will be a SparseMatrixCSC is the rankUpdate! operation. This is slow because every assignment like C[i,j] += A[j, k] * A[i, k] has to go through a complex calculation of mapping the virtual indices [i,j] to the Cartesian indices in C.data.

When it is the lower triangle of C being stored in C.data (and C.transa is false, which is always the case for us) the first half of the columns of C are either in their original positions in C.data or shifted down by one position.

I did a calculation of the number of updates in this nonflipped configuration versus the number in the flipped configuration for the insteval model. It was about 3/4 nonflipped and 1/4 flipped, which is roughly the proportion you would expect based on the sizes of the chunks in C.

(980444, 306131)

However, if we reorder the rows of A by decreasing row sums

julia> MixedModels.countinds(L[p, :])
(1221833, 64742)

we get about 95% non-flipped and that certainly seems worth it. So, at what point should the levels of the second grouping factor be reordered?

I think it will be rare to need to use RFP but, if L[2,1] is a SparseMatrixCSC there wouldn't be much harm in reordering the rows (i.e. permute the levels of the second grouping factor).

Is this explanation sufficiently intelligible to be able to offer an opinion?

palday · 2025-04-11T02:54:29Z

@dmbates I think that change makes sense. I also wonder if this would slightly speed up things for the non RFP case --- I've certainly seen this type of optimization have big impacts in other code.

If I'm thinking about this correctly, ordering by row-sums works out to ordering by something like "strength of crossing", right?

We don't make any guarantees about the internal ordering of levels, so that's not an issue.

dmbates · 2025-04-11T15:34:48Z

For the time being I will do the simplified indexing part and not do the reordering of levels part. However, we can recommend reordering the levels before creating the model.

ararslan · 2025-04-11T22:41:54Z

src/linearmixedmodel.jl

+            L[diagind] = LowerTriangular(Ldi)
+        end
+    end
+    return identity.(A), identity.(L)  # does anyone remember what the `identity` is for?`


#772 (comment)

dmbates · 2025-04-12T21:00:15Z

I wrote the enclosed to do the reordering of levels in a CategoricalArray by incidence. (I'm not a big fan of CategoricalArrays.jl but it seemed the easiest way to accomplish this.)

using CategoricalArrays, DataFrames, MixedModels

"""
    relevel_by_incidence!(v::CategoricalArray; rev=false)

Return `v` with the levels ordered by increasing (decreasing when `rev=true`) incidence.
"""
function relevel_by_incidence!(v::CategoricalArray; rev=false)
    ra = refarray(droplevels!(v))
    counts = zeros(Int, maximum(ra))
    for i in ra
        counts[i] += 1
    end
    levels!(v, levels(v)[sortperm(counts; rev)])
    return v
end

dat = DataFrame(MixedModels.dataset(:insteval))
dat.d = relevel_by_incidence!(categorical(dat.d); rev=true)
m = LinearMixedModel(@formula(y ~ 1 + service + (1|s) + (1|d) + zerocorr(1+service|dept)), dat)
m.A[3].diag

dmbates added 2 commits March 22, 2025 10:46

Preliminary (non-working) version with RFP.

0b23679

[WIP] allow TriangularRFP for diagonal blocks of L [ci skip]

03015a7

dmbates added 2 commits March 30, 2025 13:31

Documentation correction per #813

4516cc3

Merge branch 'main' into db/RFP

c04c01d

format

ae1e292

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

dmbates added 2 commits April 8, 2025 09:49

Merge branch 'main' into db/RFP

a7c2f56

Adjustments for tests; reformat

2dedb3e

dmbates and others added 2 commits April 9, 2025 14:41

Enhance coverage

09ad439

BlueStyle

06ffbb3

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

ararslan reviewed Apr 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[wIP] Allow TriangularRFP for diagonal blocks of L [ci skip] #821

[wIP] Allow TriangularRFP for diagonal blocks of L [ci skip] #821

Uh oh!

dmbates commented Mar 30, 2025 •

edited

Loading

Uh oh!

dmbates commented Mar 30, 2025

Uh oh!

dmbates commented Apr 1, 2025

Uh oh!

ajinkya-k commented Apr 1, 2025

Uh oh!

palday commented Apr 2, 2025

Uh oh!

ajinkya-k commented Apr 2, 2025 •

edited

Loading

Uh oh!

ajinkya-k commented Apr 3, 2025 •

edited

Loading

Uh oh!

dmbates commented Apr 3, 2025

Uh oh!

ajinkya-k commented Apr 3, 2025

Uh oh!

codecov bot commented Apr 9, 2025 •

edited

Loading

Uh oh!

ajinkya-k commented Apr 9, 2025

Uh oh!

dmbates commented Apr 10, 2025

Uh oh!

palday commented Apr 11, 2025

Uh oh!

dmbates commented Apr 11, 2025

Uh oh!

ararslan Apr 11, 2025

Uh oh!

dmbates commented Apr 12, 2025

Uh oh!

Uh oh!

[wIP] Allow TriangularRFP for diagonal blocks of L [ci skip] #821

Are you sure you want to change the base?

[wIP] Allow TriangularRFP for diagonal blocks of L [ci skip] #821

Uh oh!

Conversation

dmbates commented Mar 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmbates commented Mar 30, 2025

Uh oh!

dmbates commented Apr 1, 2025

Uh oh!

ajinkya-k commented Apr 1, 2025

Uh oh!

palday commented Apr 2, 2025

Uh oh!

ajinkya-k commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajinkya-k commented Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmbates commented Apr 3, 2025

Uh oh!

ajinkya-k commented Apr 3, 2025

Uh oh!

codecov bot commented Apr 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ajinkya-k commented Apr 9, 2025

Uh oh!

dmbates commented Apr 10, 2025

Uh oh!

palday commented Apr 11, 2025

Uh oh!

dmbates commented Apr 11, 2025

Uh oh!

ararslan Apr 11, 2025

Choose a reason for hiding this comment

Uh oh!

dmbates commented Apr 12, 2025

Uh oh!

Uh oh!

dmbates commented Mar 30, 2025 •

edited

Loading

ajinkya-k commented Apr 2, 2025 •

edited

Loading

ajinkya-k commented Apr 3, 2025 •

edited

Loading

codecov bot commented Apr 9, 2025 •

edited

Loading