
Commit 676c27f

Add comprehensive documentation for OpenBLASLUFactorization
- Add detailed docstring with performance characteristics and usage examples
- Include OpenBLASLUFactorization in solver documentation alongside MKL and AppleAccelerate
- Update algorithm selection guide to mention OpenBLAS as an option for large dense matrices
- Document when to use OpenBLAS vs other BLAS implementations
1 parent 4b7ef0a commit 676c27f

File tree

3 files changed: +41 -7 lines


docs/src/basics/algorithm_selection.md

Lines changed: 3 additions & 2 deletions
@@ -71,9 +71,10 @@ sol = solve(LinearProblem(A_small, rand(50)), SimpleLUFactorization())
 A_medium = rand(200, 200)
 sol = solve(LinearProblem(A_medium, rand(200)), RFLUFactorization())
 
-# Large matrices (> 500×500): MKLLUFactorization or AppleAccelerate
+# Large matrices (> 500×500): MKLLUFactorization, OpenBLASLUFactorization, or AppleAccelerate
 A_large = rand(1000, 1000)
 sol = solve(LinearProblem(A_large, rand(1000)), MKLLUFactorization())
+# Alternative: OpenBLASLUFactorization() for direct OpenBLAS calls
 ```
 
 ### Sparse Matrices
@@ -141,7 +142,7 @@ Is A symmetric positive definite? → CholeskyFactorization
 Is A symmetric indefinite? → BunchKaufmanFactorization
 Is A sparse? → UMFPACKFactorization or KLUFactorization
 Is A small dense? → RFLUFactorization or SimpleLUFactorization
-Is A large dense? → MKLLUFactorization or AppleAccelerateLUFactorization
+Is A large dense? → MKLLUFactorization, OpenBLASLUFactorization, or AppleAccelerateLUFactorization
 Is A GPU array? → QRFactorization or LUFactorization
 Is A an operator/function? → KrylovJL_GMRES
 Is the system overdetermined? → QRFactorization or KrylovJL_LSMR

docs/src/solvers/solvers.md

Lines changed: 11 additions & 3 deletions
@@ -17,9 +17,11 @@ the best choices, with SVD being the slowest but most precise.
 For efficiency, `RFLUFactorization` is the fastest for dense LU-factorizations until around
 150x150 matrices, though this can be dependent on the exact details of the hardware. After this
 point, `MKLLUFactorization` is usually faster on most hardware. Note that on Mac computers
-that `AppleAccelerateLUFactorization` is generally always the fastest. `LUFactorization` will
-use your base system BLAS which can be fast or slow depending on the hardware configuration.
-`SimpleLUFactorization` will be fast only on very small matrices but can cut down on compile times.
+that `AppleAccelerateLUFactorization` is generally always the fastest. `OpenBLASLUFactorization`
+provides direct OpenBLAS calls without going through libblastrampoline and can be faster than
+`LUFactorization` in some configurations. `LUFactorization` will use your base system BLAS which
+can be fast or slow depending on the hardware configuration. `SimpleLUFactorization` will be fast
+only on very small matrices but can cut down on compile times.
 
 For very large dense factorizations, offloading to the GPU can be preferred. Metal.jl can be used
 on Mac hardware to offload, and has a cutoff point of being faster at around size 20,000 x 20,000
@@ -207,6 +209,12 @@ KrylovJL
 MKLLUFactorization
 ```
 
+### OpenBLAS
+
+```@docs
+OpenBLASLUFactorization
+```
+
 ### AppleAccelerate.jl
 
 !!! note

src/openblas.jl

Lines changed: 27 additions & 2 deletions
@@ -3,8 +3,33 @@
 OpenBLASLUFactorization()
 ```
 
-A wrapper over OpenBLAS. Direct calls to OpenBLAS in a way that pre-allocates workspace
-to avoid allocations and does not require libblastrampoline.
+A direct wrapper over OpenBLAS's LU factorization (`getrf!` and `getrs!`).
+This solver makes direct calls to OpenBLAS_jll without going through Julia's
+libblastrampoline, which can provide performance benefits in certain configurations.
+
+## Performance Characteristics
+
+- Pre-allocates workspace to avoid allocations during solving
+- Makes direct `ccall`s to OpenBLAS routines
+- Can be faster than `LUFactorization` when OpenBLAS is well-optimized for the hardware
+- Supports `Float32`, `Float64`, `ComplexF32`, and `ComplexF64` element types
+
+## When to Use
+
+- When you want to ensure OpenBLAS is used regardless of the system BLAS configuration
+- When benchmarking shows better performance than `LUFactorization` on your specific hardware
+- When you need consistent behavior across different systems (always uses OpenBLAS)
+
+## Example
+
+```julia
+using LinearSolve, LinearAlgebra
+
+A = rand(100, 100)
+b = rand(100)
+prob = LinearProblem(A, b)
+sol = solve(prob, OpenBLASLUFactorization())
+```
 """
 struct OpenBLASLUFactorization <: AbstractFactorization end
 