
Commit ee8c3a2

Add comprehensive documentation and finalize BLIS integration
- Add blis_jll as test dependency in Project.toml
- Remove LAPACK_jll from test imports (not needed for user tests)
- Add comprehensive docstring for BLISLUFactorization
- Add module-level documentation for LinearSolveBLISExt
- Add BLIS section to solver documentation
- Include BLIS in recommended methods section
- Add docstring for do_factorization method

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 95d5c40 commit ee8c3a2

File tree

5 files changed: +75 −7 lines


Project.toml

Lines changed: 2 additions & 1 deletion

```diff
@@ -129,6 +129,7 @@ Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
 ExplicitImports = "7d51a73a-1435-4ff3-83d9-f097790105c7"
 BandedMatrices = "aae01518-5342-5314-be14-df237901396f"
 BlockDiagonals = "0a1fb500-61f7-11e9-3c65-f5ef3456f9f0"
+blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"
 FastAlmostBandedMatrices = "9d29842c-ecb8-4973-b1e9-a27b1157504e"
 FastLapackInterface = "29a986be-02c6-4525-aec4-84b980013641"
 FiniteDiff = "6a86dc24-6348-571c-b903-95158fe2bd41"
@@ -155,4 +156,4 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
 
 [targets]
-test = ["Aqua", "Test", "IterativeSolvers", "InteractiveUtils", "KrylovKit", "KrylovPreconditioners", "Pkg", "Random", "SafeTestsets", "MultiFloats", "ForwardDiff", "HYPRE", "MPI", "BlockDiagonals", "FiniteDiff", "BandedMatrices", "FastAlmostBandedMatrices", "StaticArrays", "AllocCheck", "StableRNGs", "Zygote", "RecursiveFactorization", "Sparspak", "FastLapackInterface", "SparseArrays", "ExplicitImports"]
+test = ["Aqua", "Test", "IterativeSolvers", "InteractiveUtils", "KrylovKit", "KrylovPreconditioners", "Pkg", "Random", "SafeTestsets", "MultiFloats", "ForwardDiff", "HYPRE", "MPI", "BlockDiagonals", "FiniteDiff", "BandedMatrices", "blis_jll", "FastAlmostBandedMatrices", "StaticArrays", "AllocCheck", "StableRNGs", "Zygote", "RecursiveFactorization", "Sparspak", "FastLapackInterface", "SparseArrays", "ExplicitImports"]
```
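The diff above only registers blis_jll as a test dependency. For the `LinearSolveBLISExt` extension to load at all, Project.toml also needs the package declared under `[weakdeps]` and mapped in `[extensions]`. A hedged sketch of that wiring (the blis_jll UUID is the one from the diff above; whether LAPACK_jll is also a trigger is an assumption based on the extension's imports, and the real Project.toml may already contain equivalent sections):

```toml
# Sketch only: weak-dependency wiring for a Julia package extension.
# The extension module is loaded once all listed trigger packages are.
[weakdeps]
blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"

[extensions]
LinearSolveBLISExt = ["blis_jll", "LAPACK_jll"]
```

This is the standard Pkg.jl extension mechanism (Julia 1.9+): the extension compiles lazily when a user does `using blis_jll`, which is what the documentation note below relies on.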

docs/src/solvers/solvers.md

Lines changed: 17 additions & 4 deletions

````diff
@@ -16,10 +16,12 @@ the best choices, with SVD being the slowest but most precise.
 
 For efficiency, `RFLUFactorization` is the fastest for dense LU-factorizations until around
 150x150 matrices, though this can be dependent on the exact details of the hardware. After this
-point, `MKLLUFactorization` is usually faster on most hardware. Note that on Mac computers
-that `AppleAccelerateLUFactorization` is generally always the fastest. `LUFactorization` will
-use your base system BLAS which can be fast or slow depending on the hardware configuration.
-`SimpleLUFactorization` will be fast only on very small matrices but can cut down on compile times.
+point, `MKLLUFactorization` is usually faster on most hardware. `BLISLUFactorization` provides
+another high-performance option that combines optimized BLAS operations with stable LAPACK routines.
+Note that on Mac computers that `AppleAccelerateLUFactorization` is generally always the fastest.
+`LUFactorization` will use your base system BLAS which can be fast or slow depending on the hardware
+configuration. `SimpleLUFactorization` will be fast only on very small matrices but can cut down on
+compile times.
 
 For very large dense factorizations, offloading to the GPU can be preferred. Metal.jl can be used
 on Mac hardware to offload, and has a cutoff point of being faster at around size 20,000 x 20,000
@@ -185,6 +187,17 @@ KrylovJL
 MKLLUFactorization
 ```
 
+### BLIS.jl
+
+!!! note
+
+    Using this solver requires that the package blis_jll is available. The solver will
+    be automatically available when blis_jll is loaded, i.e., `using blis_jll`.
+
+```@docs
+BLISLUFactorization
+```
+
 ### AppleAccelerate.jl
 
 !!! note
````

ext/LinearSolveBLISExt.jl

Lines changed: 20 additions & 1 deletion

```diff
@@ -1,3 +1,17 @@
+"""
+    LinearSolveBLISExt
+
+Extension module that provides BLIS (BLAS-like Library Instantiation Software) integration
+for LinearSolve.jl. This extension combines BLIS for optimized BLAS operations with
+reference LAPACK for LAPACK operations, providing a high-performance yet stable linear
+algebra backend.
+
+Key features:
+- Uses BLIS for BLAS operations (matrix multiplication, etc.)
+- Uses reference LAPACK for LAPACK operations (LU factorization, solve, etc.)
+- Supports all standard numeric types (Float32/64, ComplexF32/64)
+- Follows MKL-style ccall patterns for consistency
+"""
 module LinearSolveBLISExt
 
 using Libdl
@@ -14,7 +28,12 @@ using LinearSolve: ArrayInterface, BLISLUFactorization, @get_cacheval, LinearCac
 const global libblis = blis_jll.blis
 const global liblapack = LAPACK_jll.liblapack_path
 
-# Define the factorization method for BLISLUFactorization
+"""
+    LinearSolve.do_factorization(alg::BLISLUFactorization, A, b, u)
+
+Perform LU factorization using BLIS for the underlying BLAS operations.
+This method converts the matrix to a standard format and calls the BLIS-backed getrf! routine.
+"""
 function LinearSolve.do_factorization(alg::BLISLUFactorization, A, b, u)
     A = convert(AbstractMatrix, A)
     ipiv = similar(A, BlasInt, min(size(A, 1), size(A, 2)))
```
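The "MKL-style ccall patterns" the docstring mentions refer to dispatching LAPACK calls to an explicitly chosen library handle rather than Julia's default BLAS. A minimal hedged sketch of what such a `getrf!` routine looks like, assuming a `liblapack` path as in the diff above (the extension's actual implementation, which is not shown in this commit, may differ in details such as error handling and type coverage):

```julia
# Sketch only: double-precision LU via an explicit LAPACK library.
# `liblapack` stands in for LAPACK_jll.liblapack_path from the extension.
using LinearAlgebra
using LinearAlgebra.BLAS: BlasInt

function getrf!(A::Matrix{Float64}, liblapack::AbstractString)
    m, n = size(A)
    lda = max(1, stride(A, 2))
    ipiv = Vector{BlasInt}(undef, min(m, n))  # pivot indices
    info = Ref{BlasInt}(0)
    # dgetrf_ factors A = P*L*U in place; info > 0 flags a singular U
    ccall((:dgetrf_, liblapack), Cvoid,
        (Ref{BlasInt}, Ref{BlasInt}, Ptr{Float64}, Ref{BlasInt},
         Ptr{BlasInt}, Ref{BlasInt}),
        m, n, A, lda, ipiv, info)
    return A, ipiv, info[]
end
```

Because the factorization and the subsequent `getrs` solve both go through the same explicit handle, BLIS can back the level-3 BLAS kernels inside LAPACK while the LAPACK driver logic stays the reference implementation.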

src/extension_algs.jl

Lines changed: 35 additions & 0 deletions

````diff
@@ -440,4 +440,39 @@ to avoid allocations and automatically offloads to the GPU.
 """
 struct MetalLUFactorization <: AbstractFactorization end
 
+"""
+```julia
+BLISLUFactorization()
+```
+
+A wrapper over BLIS (BLAS-like Library Instantiation Software) for high-performance
+BLAS operations combined with reference LAPACK for stability. This provides optimized
+linear algebra operations while maintaining numerical accuracy and broad compatibility.
+
+BLIS provides highly optimized BLAS routines that can outperform reference BLAS
+implementations, especially for certain matrix sizes and operations. The integration
+uses BLIS for BLAS operations (like matrix multiplication) and falls back to reference
+LAPACK for LAPACK operations (like LU factorization and solve).
+
+!!! note
+
+    Using this solver requires that the package blis_jll is available. The solver will
+    be automatically available when blis_jll is loaded, i.e., `using blis_jll`.
+
+## Performance Characteristics
+
+- **Strengths**: Optimized BLAS operations, good performance on modern hardware
+- **Use cases**: General dense linear systems where BLAS optimization matters
+- **Compatibility**: Works with all numeric types (Float32/64, ComplexF32/64)
+
+## Example
+
+```julia
+using LinearSolve, blis_jll
+A = rand(100, 100)
+b = rand(100)
+prob = LinearProblem(A, b)
+sol = solve(prob, BLISLUFactorization())
+```
+"""
 struct BLISLUFactorization <: AbstractFactorization end
````

test/basictests.jl

Lines changed: 1 addition & 1 deletion

```diff
@@ -4,7 +4,7 @@ using IterativeSolvers, KrylovKit, MKL_jll, KrylovPreconditioners
 using Test
 
 # Import JLL packages for extensions
-using blis_jll, LAPACK_jll
+using blis_jll
 import Random
 
 const Dual64 = ForwardDiff.Dual{Nothing, Float64, 1}
```
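With `using blis_jll` in the test preamble, the extension activates and `BLISLUFactorization` becomes solvable like any other method. A hedged sketch of the kind of correctness check the test suite can then run (this specific test is not part of the commit; it assumes LinearSolve.jl and blis_jll are installed):

```julia
# Sketch only: verify the BLIS-backed LU solver on a well-conditioned system.
using LinearSolve, blis_jll, Test, LinearAlgebra

A = rand(100, 100) + 10I   # shift the diagonal to keep A well-conditioned
b = rand(100)
prob = LinearProblem(A, b)
sol = solve(prob, BLISLUFactorization())
@test norm(A * sol.u - b) < 1e-8   # residual of the computed solution
```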
