Commit 237debc

Update BLIS extension to use libflame for factorization and BLIS for solve
- Changed extension to use libflame for getrf (factorization) operations
- Uses BLIS for getrs (solve) operations, maintaining the BLIS/FLAME integration goal
- Updated Project.toml to include libflame_jll as dependency
- Updated documentation to reflect libflame usage
- Extension now uses: libflame factorization + BLIS solve operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent ee8c3a2 commit 237debc
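
The intended end-to-end usage after this commit is sketched below; it mirrors the docstring example in src/extension_algs.jl further down and assumes that importing both JLLs is what activates the extension.

```julia
# Sketch of the intended usage after this commit (mirrors the docstring example
# in src/extension_algs.jl below); assumes loading both JLLs activates the extension.
using LinearSolve, blis_jll, libflame_jll

A = rand(100, 100)
b = rand(100)
prob = LinearProblem(A, b)

# BLISLUFactorization now routes getrf through libflame and getrs through BLIS
sol = solve(prob, BLISLUFactorization())
sol.u  # solution vector
```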

File tree: 5 files changed (+39, -33 lines)


Project.toml

Lines changed: 10 additions & 7 deletions
@@ -26,11 +26,12 @@ SciMLOperators = "c0aeaf25-5076-4817-a8d5-81caf7dfa961"
 Setfield = "efcf1570-3423-57d1-acb7-fd33fddbac46"
 StaticArraysCore = "1e83bf80-4336-4d27-bf5d-d5a4f845583c"
 UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
+blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"
+libflame_jll = "8e9d65e3-b2b8-5a9c-baa2-617b4576f0b9"

 [weakdeps]
 BandedMatrices = "aae01518-5342-5314-be14-df237901396f"
 BlockDiagonals = "0a1fb500-61f7-11e9-3c65-f5ef3456f9f0"
-blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"
 CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 CUDSS = "45b445bb-4962-46a0-9369-b4df9d0f772e"
 EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
@@ -48,7 +49,7 @@ SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 Sparspak = "e56a9233-b9d6-4f03-8d0f-1825330902ac"

 [extensions]
-LinearSolveBLISExt = "blis_jll"
+LinearSolveBLISExt = ["blis_jll", "libflame_jll"]
 LinearSolveBandedMatricesExt = "BandedMatrices"
 LinearSolveBlockDiagonalsExt = "BlockDiagonals"
 LinearSolveCUDAExt = "CUDA"
@@ -72,16 +73,15 @@ AllocCheck = "0.2"
 Aqua = "0.8"
 ArrayInterface = "7.7"
 BandedMatrices = "1.5"
-blis_jll = "0.9.0"
 BlockDiagonals = "0.1.42, 0.2"
 CUDA = "5"
 CUDSS = "0.1, 0.2, 0.3, 0.4"
 ChainRulesCore = "1.22"
 ConcreteStructs = "0.2.3"
 DocStringExtensions = "0.9.3"
 EnumX = "1.0.4"
-ExplicitImports = "1"
 EnzymeCore = "0.8.1"
+ExplicitImports = "1"
 FastAlmostBandedMatrices = "0.1"
 FastLapackInterface = "2"
 FiniteDiff = "2.22"
@@ -121,15 +121,16 @@ StaticArraysCore = "1.4.2"
 Test = "1"
 UnPack = "1"
 Zygote = "0.7"
+blis_jll = "0.9.0"
 julia = "1.10"
+libflame_jll = "5.2.0"

 [extras]
 AllocCheck = "9b6a8646-10ed-4001-bbdc-1d2f46dfbb1a"
 Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
-ExplicitImports = "7d51a73a-1435-4ff3-83d9-f097790105c7"
 BandedMatrices = "aae01518-5342-5314-be14-df237901396f"
 BlockDiagonals = "0a1fb500-61f7-11e9-3c65-f5ef3456f9f0"
-blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"
+ExplicitImports = "7d51a73a-1435-4ff3-83d9-f097790105c7"
 FastAlmostBandedMatrices = "9d29842c-ecb8-4973-b1e9-a27b1157504e"
 FastLapackInterface = "29a986be-02c6-4525-aec4-84b980013641"
 FiniteDiff = "6a86dc24-6348-571c-b903-95158fe2bd41"
@@ -154,6 +155,8 @@ StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
 StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
+blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"
+libflame_jll = "8e9d65e3-b2b8-5a9c-baa2-617b4576f0b9"

 [targets]
-test = ["Aqua", "Test", "IterativeSolvers", "InteractiveUtils", "KrylovKit", "KrylovPreconditioners", "Pkg", "Random", "SafeTestsets", "MultiFloats", "ForwardDiff", "HYPRE", "MPI", "BlockDiagonals", "FiniteDiff", "BandedMatrices", "blis_jll", "FastAlmostBandedMatrices", "StaticArrays", "AllocCheck", "StableRNGs", "Zygote", "RecursiveFactorization", "Sparspak", "FastLapackInterface", "SparseArrays", "ExplicitImports"]
+test = ["Aqua", "Test", "IterativeSolvers", "InteractiveUtils", "KrylovKit", "KrylovPreconditioners", "Pkg", "Random", "SafeTestsets", "MultiFloats", "ForwardDiff", "HYPRE", "MPI", "BlockDiagonals", "FiniteDiff", "BandedMatrices", "blis_jll", "libflame_jll", "FastAlmostBandedMatrices", "StaticArrays", "AllocCheck", "StableRNGs", "Zygote", "RecursiveFactorization", "Sparspak", "FastLapackInterface", "SparseArrays", "ExplicitImports"]

docs/src/solvers/solvers.md

Lines changed: 4 additions & 3 deletions
@@ -17,7 +17,7 @@ the best choices, with SVD being the slowest but most precise.
 For efficiency, `RFLUFactorization` is the fastest for dense LU-factorizations until around
 150x150 matrices, though this can be dependent on the exact details of the hardware. After this
 point, `MKLLUFactorization` is usually faster on most hardware. `BLISLUFactorization` provides
-another high-performance option that combines optimized BLAS operations with stable LAPACK routines.
+another high-performance option that combines optimized BLAS operations from BLIS with optimized LAPACK routines from libflame.
 Note that on Mac computers that `AppleAccelerateLUFactorization` is generally always the fastest.
 `LUFactorization` will use your base system BLAS which can be fast or slow depending on the hardware
 configuration. `SimpleLUFactorization` will be fast only on very small matrices but can cut down on
@@ -191,8 +191,9 @@ MKLLUFactorization

 !!! note

-    Using this solver requires that the package blis_jll is available. The solver will
-    be automatically available when blis_jll is loaded, i.e., `using blis_jll`.
+    Using this solver requires that both blis_jll and libflame_jll packages are available.
+    The solver will be automatically available when both packages are loaded, i.e.,
+    `using blis_jll, libflame_jll`.

 ```@docs
 BLISLUFactorization
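
To make the size guidance in the docs concrete, here is a rough timing sketch comparing the algorithms named above. It is only illustrative: the crossover points are hardware-dependent, and it assumes RecursiveFactorization.jl (for `RFLUFactorization`) and both JLLs (for `BLISLUFactorization`) are loaded; MKL availability is also assumed.

```julia
# Rough comparison of the dense LU options discussed in the docs; numbers vary by hardware.
# First call to each algorithm includes compilation time; rerun (or use BenchmarkTools)
# for meaningful measurements.
using LinearSolve, RecursiveFactorization, blis_jll, libflame_jll

for n in (100, 300, 1000)
    A = rand(n, n); b = rand(n)
    prob = LinearProblem(A, b)
    for alg in (RFLUFactorization(), MKLLUFactorization(), BLISLUFactorization())
        t = @elapsed solve(prob, alg)
        println("n = $n  $(nameof(typeof(alg))): $(round(t, digits = 4)) s")
    end
end
```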

ext/LinearSolveBLISExt.jl

Lines changed: 14 additions & 14 deletions
@@ -3,30 +3,30 @@ LinearSolveBLISExt

 Extension module that provides BLIS (BLAS-like Library Instantiation Software) integration
 for LinearSolve.jl. This extension combines BLIS for optimized BLAS operations with
-reference LAPACK for LAPACK operations, providing a high-performance yet stable linear
-algebra backend.
+libflame for optimized LAPACK operations, providing a fully optimized linear algebra
+backend.

 Key features:
 - Uses BLIS for BLAS operations (matrix multiplication, etc.)
-- Uses reference LAPACK for LAPACK operations (LU factorization, solve, etc.)
+- Uses libflame for LAPACK operations (LU factorization, solve, etc.)
 - Supports all standard numeric types (Float32/64, ComplexF32/64)
 - Follows MKL-style ccall patterns for consistency
 """
 module LinearSolveBLISExt

 using Libdl
 using blis_jll
-using LAPACK_jll
+using libflame_jll
 using LinearAlgebra
 using LinearSolve

-using LinearAlgebra: BlasInt, LU
+using LinearAlgebra: BlasInt, LU, libblastrampoline
 using LinearAlgebra.LAPACK: require_one_based_indexing, chkfinite, chkstride1,
     @blasfunc, chkargsok
 using LinearSolve: ArrayInterface, BLISLUFactorization, @get_cacheval, LinearCache, SciMLBase, do_factorization

 const global libblis = blis_jll.blis
-const global liblapack = LAPACK_jll.liblapack_path
+const global libflame = libflame_jll.libflame

 """
     LinearSolve.do_factorization(alg::BLISLUFactorization, A, b, u)
@@ -54,7 +54,7 @@ function getrf!(A::AbstractMatrix{<:ComplexF64};
     if isempty(ipiv)
         ipiv = similar(A, BlasInt, min(size(A, 1), size(A, 2)))
     end
-    ccall((@blasfunc(zgetrf_), liblapack), Cvoid,
+    ccall(("zgetrf_", libflame), Cvoid,
         (Ref{BlasInt}, Ref{BlasInt}, Ptr{ComplexF64},
             Ref{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}),
         m, n, A, lda, ipiv, info)
@@ -74,7 +74,7 @@ function getrf!(A::AbstractMatrix{<:ComplexF32};
     if isempty(ipiv)
         ipiv = similar(A, BlasInt, min(size(A, 1), size(A, 2)))
     end
-    ccall((@blasfunc(cgetrf_), liblapack), Cvoid,
+    ccall(("cgetrf_", libflame), Cvoid,
         (Ref{BlasInt}, Ref{BlasInt}, Ptr{ComplexF32},
             Ref{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}),
         m, n, A, lda, ipiv, info)
@@ -94,7 +94,7 @@ function getrf!(A::AbstractMatrix{<:Float64};
     if isempty(ipiv)
         ipiv = similar(A, BlasInt, min(size(A, 1), size(A, 2)))
     end
-    ccall((@blasfunc(dgetrf_), liblapack), Cvoid,
+    ccall(("dgetrf_", libflame), Cvoid,
         (Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64},
             Ref{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}),
         m, n, A, lda, ipiv, info)
@@ -114,7 +114,7 @@ function getrf!(A::AbstractMatrix{<:Float32};
     if isempty(ipiv)
         ipiv = similar(A, BlasInt, min(size(A, 1), size(A, 2)))
     end
-    ccall((@blasfunc(sgetrf_), liblapack), Cvoid,
+    ccall(("sgetrf_", libflame), Cvoid,
         (Ref{BlasInt}, Ref{BlasInt}, Ptr{Float32},
             Ref{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}),
         m, n, A, lda, ipiv, info)
@@ -138,7 +138,7 @@ function getrs!(trans::AbstractChar,
         throw(DimensionMismatch("ipiv has length $(length(ipiv)), but needs to be $n"))
     end
     nrhs = size(B, 2)
-    ccall(("zgetrs_", liblapack), Cvoid,
+    ccall((@blasfunc(zgetrs_), libblis), Cvoid,
         (Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt}, Ptr{ComplexF64}, Ref{BlasInt},
             Ptr{BlasInt}, Ptr{ComplexF64}, Ref{BlasInt}, Ptr{BlasInt}, Clong),
         trans, n, size(B, 2), A, max(1, stride(A, 2)), ipiv, B, max(1, stride(B, 2)), info,
@@ -163,7 +163,7 @@ function getrs!(trans::AbstractChar,
         throw(DimensionMismatch("ipiv has length $(length(ipiv)), but needs to be $n"))
     end
     nrhs = size(B, 2)
-    ccall(("cgetrs_", liblapack), Cvoid,
+    ccall((@blasfunc(cgetrs_), libblis), Cvoid,
         (Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt}, Ptr{ComplexF32}, Ref{BlasInt},
             Ptr{BlasInt}, Ptr{ComplexF32}, Ref{BlasInt}, Ptr{BlasInt}, Clong),
         trans, n, size(B, 2), A, max(1, stride(A, 2)), ipiv, B, max(1, stride(B, 2)), info,
@@ -188,7 +188,7 @@ function getrs!(trans::AbstractChar,
         throw(DimensionMismatch("ipiv has length $(length(ipiv)), but needs to be $n"))
     end
     nrhs = size(B, 2)
-    ccall(("dgetrs_", liblapack), Cvoid,
+    ccall((@blasfunc(dgetrs_), libblis), Cvoid,
         (Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt},
             Ptr{BlasInt}, Ptr{Float64}, Ref{BlasInt}, Ptr{BlasInt}, Clong),
         trans, n, size(B, 2), A, max(1, stride(A, 2)), ipiv, B, max(1, stride(B, 2)), info,
@@ -213,7 +213,7 @@ function getrs!(trans::AbstractChar,
         throw(DimensionMismatch("ipiv has length $(length(ipiv)), but needs to be $n"))
     end
     nrhs = size(B, 2)
-    ccall(("sgetrs_", liblapack), Cvoid,
+    ccall((@blasfunc(sgetrs_), libblis), Cvoid,
         (Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt}, Ptr{Float32}, Ref{BlasInt},
             Ptr{BlasInt}, Ptr{Float32}, Ref{BlasInt}, Ptr{BlasInt}, Clong),
         trans, n, size(B, 2), A, max(1, stride(A, 2)), ipiv, B, max(1, stride(B, 2)), info,
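
Putting the two halves of the diff together, the factor/solve split looks roughly like the standalone sketch below. It simply mirrors the ccall signatures shown above (`dgetrf_` resolved from `libflame_jll.libflame`, `dgetrs_` from `blis_jll.blis`); that these symbols resolve this way is an assumption carried over from the extension, not something verified here.

```julia
# Standalone sketch of the extension's split: LU factorization (dgetrf_) via libflame,
# triangular solves (dgetrs_) via BLIS. Signatures mirror the ccalls in the diff above;
# the symbol names and library handles are assumptions taken from the extension.
using blis_jll, libflame_jll
using LinearAlgebra: BlasInt

n = 4
A = rand(n, n)
b = rand(n)
ipiv = Vector{BlasInt}(undef, n)
info = Ref{BlasInt}(0)

# Factorize A in place with libflame's getrf
ccall(("dgetrf_", libflame_jll.libflame), Cvoid,
    (Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt}, Ptr{BlasInt}, Ptr{BlasInt}),
    n, n, A, n, ipiv, info)

# Solve with the factors using BLIS's getrs (trailing Clong is the hidden char-length argument)
ccall(("dgetrs_", blis_jll.blis), Cvoid,
    (Ref{UInt8}, Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt},
     Ptr{BlasInt}, Ptr{Float64}, Ref{BlasInt}, Ptr{BlasInt}, Clong),
    'N', n, 1, A, n, ipiv, b, n, info, 1)

b  # now holds the solution x, provided info[] == 0
```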

src/extension_algs.jl

Lines changed: 10 additions & 8 deletions
@@ -446,18 +446,20 @@ BLISLUFactorization()
 ```

 A wrapper over BLIS (BLAS-like Library Instantiation Software) for high-performance
-BLAS operations combined with reference LAPACK for stability. This provides optimized
-linear algebra operations while maintaining numerical accuracy and broad compatibility.
+BLAS operations combined with libflame for optimized LAPACK operations. This provides
+a fully optimized linear algebra stack with both high-performance BLAS and LAPACK routines.

 BLIS provides highly optimized BLAS routines that can outperform reference BLAS
-implementations, especially for certain matrix sizes and operations. The integration
-uses BLIS for BLAS operations (like matrix multiplication) and falls back to reference
-LAPACK for LAPACK operations (like LU factorization and solve).
+implementations, especially for certain matrix sizes and operations. libflame provides
+optimized LAPACK operations that complement BLIS. The integration uses BLIS for BLAS
+operations (like matrix multiplication) and libflame for LAPACK operations (like LU
+factorization and solve).

 !!! note

-    Using this solver requires that the package blis_jll is available. The solver will
-    be automatically available when blis_jll is loaded, i.e., `using blis_jll`.
+    Using this solver requires that both blis_jll and libflame_jll packages are available.
+    The solver will be automatically available when both packages are loaded, i.e.,
+    `using blis_jll, libflame_jll`.

 ## Performance Characteristics

@@ -468,7 +470,7 @@ LAPACK for LAPACK operations (like LU factorization and solve).
 ## Example

 ```julia
-using LinearSolve, blis_jll
+using LinearSolve, blis_jll, libflame_jll
 A = rand(100, 100)
 b = rand(100)
 prob = LinearProblem(A, b)

test/basictests.jl

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ using IterativeSolvers, KrylovKit, MKL_jll, KrylovPreconditioners
 using Test

 # Import JLL packages for extensions
-using blis_jll
+using blis_jll, libflame_jll
 import Random

 const Dual64 = ForwardDiff.Dual{Nothing, Float64, 1}
