
Commit 6a69113

Resolve merge conflict in MWE script - keep interactive version
- Keep enhanced version with interactive token input from REPL
- Include multi-attempt authentication with GitHub API warmup
- Full reproduction of package's token handling workflow
- Provides comprehensive debugging of MethodError issue

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
2 parents 2ef6f28 + 3683343 commit 6a69113

File tree: 13 files changed, +282 −8 lines

Project.toml

Lines changed: 4 additions & 1 deletion

@@ -1,7 +1,7 @@
 name = "LinearSolve"
 uuid = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
 authors = ["SciML"]
-version = "3.24.0"
+version = "3.25.0"

 [deps]
 ArrayInterface = "4fba245c-0d91-5ea0-9b3e-6abc04ee57a9"
@@ -33,6 +33,7 @@ BlockDiagonals = "0a1fb500-61f7-11e9-3c65-f5ef3456f9f0"
 blis_jll = "6136c539-28a5-5bf0-87cc-b183200dce32"
 CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 CUDSS = "45b445bb-4962-46a0-9369-b4df9d0f772e"
+CUSOLVERRF = "a8cc9031-bad2-4722-94f5-40deabb4245c"
 EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
 FastAlmostBandedMatrices = "9d29842c-ecb8-4973-b1e9-a27b1157504e"
 FastLapackInterface = "29a986be-02c6-4525-aec4-84b980013641"
@@ -54,6 +55,7 @@ LinearSolveBandedMatricesExt = "BandedMatrices"
 LinearSolveBlockDiagonalsExt = "BlockDiagonals"
 LinearSolveCUDAExt = "CUDA"
 LinearSolveCUDSSExt = "CUDSS"
+LinearSolveCUSOLVERRFExt = ["CUSOLVERRF", "SparseArrays"]
 LinearSolveEnzymeExt = "EnzymeCore"
 LinearSolveFastAlmostBandedMatricesExt = "FastAlmostBandedMatrices"
 LinearSolveFastLapackInterfaceExt = "FastLapackInterface"
@@ -77,6 +79,7 @@ BlockDiagonals = "0.1.42, 0.2"
 blis_jll = "0.9.0"
 CUDA = "5"
 CUDSS = "0.1, 0.2, 0.3, 0.4"
+CUSOLVERRF = "0.2.6"
 ChainRulesCore = "1.22"
 ConcreteStructs = "0.2.3"
 DocStringExtensions = "0.9.3"

docs/src/solvers/solvers.md

Lines changed: 19 additions & 1 deletion

@@ -43,6 +43,14 @@ For sparse LU-factorizations, `KLUFactorization` if there is less structure
 to the sparsity pattern and `UMFPACKFactorization` if there is more structure.
 Pardiso.jl's methods are also known to be very efficient sparse linear solvers.

+For GPU-accelerated sparse LU-factorizations, there are two high-performance options.
+When using CuSparseMatrixCSR arrays with CUDSS.jl loaded, `LUFactorization()` will
+automatically use NVIDIA's cuDSS library. Alternatively, `CUSOLVERRFFactorization`
+provides access to NVIDIA's cusolverRF library. Both offer significant performance
+improvements for sparse systems on CUDA-capable GPUs and are particularly effective
+for large sparse matrices that benefit from GPU parallelization. `CUDSS` is oriented
+more toward `Float32`, while `CUSOLVERRFFactorization` targets `Float64`.
+
 While these sparse factorizations are based on implementations in other languages,
 and therefore constrained to standard number types (`Float64`, `Float32` and
 their complex counterparts), `SparspakFactorization` is able to handle general
@@ -219,7 +227,7 @@ LinearSolve.PardisoJL

 ### CUDA.jl

-Note that `CuArrays` are supported by `GenericFactorization` in the normal way.
+Note that `CuArrays` are supported by `GenericFactorization` in the "normal" way.
 The following are non-standard GPU factorization routines.

 !!! note
@@ -230,6 +238,16 @@ The following are non-standard GPU factorization routines.
 CudaOffloadFactorization
 ```

+### CUSOLVERRF.jl
+
+!!! note
+
+    Using this solver requires adding the package CUSOLVERRF.jl, i.e. `using CUSOLVERRF`
+
+```@docs
+CUSOLVERRFFactorization
+```
+
 ### IterativeSolvers.jl

 !!! note
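As a quick orientation for the two options this passage introduces, here is a minimal sketch of selecting each one, assuming a CUDA-capable GPU and that CUDSS.jl and CUSOLVERRF.jl are installed (not code from this commit):

```julia
using LinearSolve, LinearAlgebra, SparseArrays
using CUDA, CUDA.CUSPARSE
using CUDSS          # activates the cuDSS path for LUFactorization
using CUSOLVERRF     # activates CUSOLVERRFFactorization

A = sprand(Float64, 1000, 1000, 0.01) + 10.0I   # well-conditioned sparse system
b = rand(1000)

# cuDSS path: GPU CSR matrix + the standard LUFactorization
prob_gpu = LinearProblem(CuSparseMatrixCSR(A), CuArray(b))
sol_cudss = solve(prob_gpu, LUFactorization())

# cusolverRF path: the extension also accepts CPU sparse input
prob = LinearProblem(A, b)
sol_rf = solve(prob, CUSOLVERRFFactorization())
```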

docs/src/tutorials/gpu.md

Lines changed: 24 additions & 0 deletions

@@ -121,6 +121,30 @@ sol = LS.solve(prob, LS.LUFactorization())

 For now, CUDSS only supports CuSparseMatrixCSR type matrices.

+For high-performance sparse LU factorization on GPUs, you can also use CUSOLVERRF.jl:
+
+```julia
+using CUSOLVERRF
+sol = LS.solve(prob, LS.CUSOLVERRFFactorization())
+```
+
+CUSOLVERRF provides access to NVIDIA's cusolverRF library, which offers significant
+performance improvements for sparse LU factorization on GPUs. It supports both
+`:RF` (default) and `:KLU` symbolic factorization methods, and can reuse symbolic
+factorization for matrices with the same sparsity pattern:
+
+```julia
+# Use KLU for symbolic factorization
+sol = LS.solve(prob, LS.CUSOLVERRFFactorization(symbolic = :KLU))
+
+# Reuse symbolic factorization for better performance
+sol = LS.solve(prob, LS.CUSOLVERRFFactorization(reuse_symbolic = true))
+```
+
+!!! note
+
+    CUSOLVERRF only supports `Float64` element types with `Int32` indices.
+
 Note that `KrylovJL` methods also work with sparse GPU arrays:

 ```julia
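The tutorial snippets above assume `prob` and the `LS` alias were defined earlier on the page; a self-contained setup might look like the following (the matrix and sizes here are illustrative, not from the diff):

```julia
import LinearSolve as LS
using LinearAlgebra, SparseArrays
using CUDA, CUDA.CUSPARSE

# Illustrative diagonally dominant sparse system
n = 100
A = spdiagm(-1 => ones(n - 1), 0 => 4.0 * ones(n), 1 => ones(n - 1))
b = rand(n)

# CuSparseMatrixCSR storage so the GPU sparse solvers apply
prob = LS.LinearProblem(CuSparseMatrixCSR(A), CuArray(b))

using CUSOLVERRF
sol = LS.solve(prob, LS.CUSOLVERRFFactorization())
```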

ext/LinearSolveCUSOLVERRFExt.jl

Lines changed: 89 additions & 0 deletions (new file)

@@ -0,0 +1,89 @@
+module LinearSolveCUSOLVERRFExt
+
+using LinearSolve: LinearSolve, @get_cacheval, pattern_changed, OperatorAssumptions
+using CUSOLVERRF: CUSOLVERRF, RFLU, CUDA
+using SparseArrays: SparseArrays, SparseMatrixCSC, nnz
+using CUSOLVERRF.CUDA.CUSPARSE: CuSparseMatrixCSR
+using LinearAlgebra: LinearAlgebra, Adjoint, ldiv!, lu!
+using SciMLBase: SciMLBase, LinearProblem, ReturnCode
+
+function LinearSolve.init_cacheval(alg::LinearSolve.CUSOLVERRFFactorization,
+        A, b, u, Pl, Pr,
+        maxiters::Int, abstol, reltol,
+        verbose::Bool, assumptions::OperatorAssumptions)
+    nothing
+end
+
+function LinearSolve.init_cacheval(alg::LinearSolve.CUSOLVERRFFactorization,
+        A::Union{CuSparseMatrixCSR{Float64, Int32}, SparseMatrixCSC{Float64, <:Integer}},
+        b, u, Pl, Pr,
+        maxiters::Int, abstol, reltol,
+        verbose::Bool, assumptions::OperatorAssumptions)
+    # Create initial factorization with appropriate options
+    nrhs = b isa AbstractMatrix ? size(b, 2) : 1
+    symbolic = alg.symbolic
+    # Convert to CuSparseMatrixCSR if needed
+    A_gpu = A isa CuSparseMatrixCSR ? A : CuSparseMatrixCSR(A)
+    RFLU(A_gpu; nrhs = nrhs, symbolic = symbolic)
+end
+
+function SciMLBase.solve!(cache::LinearSolve.LinearCache,
+        alg::LinearSolve.CUSOLVERRFFactorization; kwargs...)
+    A = cache.A
+
+    # Convert to appropriate GPU format if needed
+    if A isa SparseMatrixCSC
+        A_gpu = CuSparseMatrixCSR(A)
+    elseif A isa CuSparseMatrixCSR
+        A_gpu = A
+    else
+        error("CUSOLVERRFFactorization only supports SparseMatrixCSC or CuSparseMatrixCSR matrices")
+    end
+
+    if cache.isfresh
+        cacheval = @get_cacheval(cache, :CUSOLVERRFFactorization)
+        if cacheval === nothing
+            # Create new factorization
+            nrhs = cache.b isa AbstractMatrix ? size(cache.b, 2) : 1
+            fact = RFLU(A_gpu; nrhs = nrhs, symbolic = alg.symbolic)
+        else
+            # Reuse symbolic factorization if pattern hasn't changed
+            if alg.reuse_symbolic && !pattern_changed(cacheval, A_gpu)
+                fact = cacheval
+                lu!(fact, A_gpu)
+            else
+                # Create new factorization if pattern changed
+                nrhs = cache.b isa AbstractMatrix ? size(cache.b, 2) : 1
+                fact = RFLU(A_gpu; nrhs = nrhs, symbolic = alg.symbolic)
+            end
+        end
+        cache.cacheval = fact
+        cache.isfresh = false
+    end
+
+    F = @get_cacheval(cache, :CUSOLVERRFFactorization)
+
+    # Ensure b and u are on GPU
+    b_gpu = cache.b isa CUDA.CuArray ? cache.b : CUDA.CuArray(cache.b)
+    u_gpu = cache.u isa CUDA.CuArray ? cache.u : CUDA.CuArray(cache.u)
+
+    # Solve
+    copyto!(u_gpu, b_gpu)
+    ldiv!(F, u_gpu)
+
+    # Copy back to CPU if needed
+    if !(cache.u isa CUDA.CuArray)
+        copyto!(cache.u, u_gpu)
+    end
+
+    SciMLBase.build_linear_solution(alg, cache.u, nothing, cache; retcode = ReturnCode.Success)
+end
+
+# Helper function for pattern checking
+function LinearSolve.pattern_changed(rf::RFLU, A::CuSparseMatrixCSR)
+    # For CUSOLVERRF, we need to check if the sparsity pattern has changed.
+    # This is a simplified check - you might need a more sophisticated approach
+    size(rf) != size(A) || nnz(rf.M) != nnz(A)
+end
+
+end
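To exercise the `reuse_symbolic` branch implemented above, one cache is reused across solves whose matrices share a sparsity pattern. A hedged sketch using the standard SciML `init`/`solve!` workflow (values illustrative, not from this commit):

```julia
using LinearSolve, LinearAlgebra, SparseArrays
using CUSOLVERRF

A = sprand(Float64, 500, 500, 0.02) + 5.0I
b = rand(500)

cache = init(LinearProblem(A, b), CUSOLVERRFFactorization(reuse_symbolic = true))
sol1 = solve!(cache)   # first solve: symbolic + numeric factorization

# Same sparsity pattern, new values: assigning A marks the cache fresh,
# so the branch above refactorizes numerically via lu! without redoing
# the symbolic analysis.
cache.A = 2.0 .* A
sol2 = solve!(cache)
```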

mwe_api_call.jl

Lines changed: 0 additions & 1 deletion

@@ -58,7 +58,6 @@ function test_github_authentication(token::AbstractString)

     return nothing
 end
-
 function mwe_github_api_call()
     println("🧪 MWE: LinearSolveAutotune GitHub API Call")
     println("="^50)

src/LinearSolve.jl

Lines changed: 3 additions & 2 deletions

@@ -211,7 +211,7 @@ for alg in (:LUFactorization, :FastLUFactorization, :SVDFactorization,
     :RFLUFactorization, :UMFPACKFactorization, :KLUFactorization, :SparspakFactorization,
     :DiagonalFactorization, :CholeskyFactorization, :BunchKaufmanFactorization,
     :CHOLMODFactorization, :LDLtFactorization, :AppleAccelerateLUFactorization,
-    :MKLLUFactorization, :MetalLUFactorization)
+    :MKLLUFactorization, :MetalLUFactorization, :CUSOLVERRFFactorization)
     @eval needs_square_A(::$(alg)) = true
 end

@@ -240,7 +240,8 @@ export LUFactorization, SVDFactorization, QRFactorization, GenericFactorization,
     NormalCholeskyFactorization, NormalBunchKaufmanFactorization,
     UMFPACKFactorization, KLUFactorization, FastLUFactorization, FastQRFactorization,
     SparspakFactorization, DiagonalFactorization, CholeskyFactorization,
-    BunchKaufmanFactorization, CHOLMODFactorization, LDLtFactorization
+    BunchKaufmanFactorization, CHOLMODFactorization, LDLtFactorization,
+    CUSOLVERRFFactorization

 export LinearSolveFunction, DirectLdiv!
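The first hunk extends a metaprogramming loop that stamps out one `needs_square_A` trait method per algorithm symbol. A toy sketch of the same idiom (illustrative types, not the package's):

```julia
# Toy reproduction of the trait-stamping pattern extended above.
abstract type AbstractAlg end
struct AlgA <: AbstractAlg end
struct AlgB <: AbstractAlg end

needs_square_A(::AbstractAlg) = false   # generic fallback

for alg in (:AlgA, :AlgB)
    # Interpolating the symbol generates `needs_square_A(::AlgA) = true`, etc.
    @eval needs_square_A(::$(alg)) = true
end

@assert needs_square_A(AlgA()) && needs_square_A(AlgB())
```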

src/appleaccelerate.jl

Lines changed: 18 additions & 0 deletions

@@ -34,6 +34,8 @@ function aa_getrf!(A::AbstractMatrix{<:ComplexF64};
         ipiv = similar(A, Cint, min(size(A, 1), size(A, 2))),
         info = Ref{Cint}(),
         check = false)
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A)
     check && chkfinite(A)
     chkstride1(A)
@@ -54,6 +56,8 @@ function aa_getrf!(A::AbstractMatrix{<:ComplexF32};
         ipiv = similar(A, Cint, min(size(A, 1), size(A, 2))),
         info = Ref{Cint}(),
         check = false)
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A)
     check && chkfinite(A)
     chkstride1(A)
@@ -74,6 +78,8 @@ function aa_getrf!(A::AbstractMatrix{<:Float64};
         ipiv = similar(A, Cint, min(size(A, 1), size(A, 2))),
         info = Ref{Cint}(),
         check = false)
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A)
     check && chkfinite(A)
     chkstride1(A)
@@ -94,6 +100,8 @@ function aa_getrf!(A::AbstractMatrix{<:Float32};
         ipiv = similar(A, Cint, min(size(A, 1), size(A, 2))),
         info = Ref{Cint}(),
         check = false)
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A)
     check && chkfinite(A)
     chkstride1(A)
@@ -116,6 +124,8 @@ function aa_getrs!(trans::AbstractChar,
         ipiv::AbstractVector{Cint},
         B::AbstractVecOrMat{<:ComplexF64};
         info = Ref{Cint}())
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A, ipiv, B)
     LinearAlgebra.LAPACK.chktrans(trans)
     chkstride1(A, B, ipiv)
@@ -140,6 +150,8 @@ function aa_getrs!(trans::AbstractChar,
         ipiv::AbstractVector{Cint},
         B::AbstractVecOrMat{<:ComplexF32};
         info = Ref{Cint}())
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A, ipiv, B)
     LinearAlgebra.LAPACK.chktrans(trans)
     chkstride1(A, B, ipiv)
@@ -165,6 +177,8 @@ function aa_getrs!(trans::AbstractChar,
         ipiv::AbstractVector{Cint},
         B::AbstractVecOrMat{<:Float64};
         info = Ref{Cint}())
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A, ipiv, B)
     LinearAlgebra.LAPACK.chktrans(trans)
     chkstride1(A, B, ipiv)
@@ -190,6 +204,8 @@ function aa_getrs!(trans::AbstractChar,
         ipiv::AbstractVector{Cint},
         B::AbstractVecOrMat{<:Float32};
         info = Ref{Cint}())
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     require_one_based_indexing(A, ipiv, B)
     LinearAlgebra.LAPACK.chktrans(trans)
     chkstride1(A, B, ipiv)
@@ -236,6 +252,8 @@ end

 function SciMLBase.solve!(cache::LinearCache, alg::AppleAccelerateLUFactorization;
         kwargs...)
+    __appleaccelerate_isavailable() ||
+        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
     A = cache.A
     A = convert(AbstractMatrix, A)
     if cache.isfresh
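All nine hunks insert the same two-line availability guard. Purely as an illustration (a hypothetical helper, not part of the commit), it could be consolidated as:

```julia
# Hypothetical consolidation of the repeated guard; behavior identical.
@inline function __assert_appleaccelerate_available()
    __appleaccelerate_isavailable() ||
        error("Error, AppleAccelerate binary is missing but solve is being called. Report this issue")
    nothing
end

# Each aa_getrf!/aa_getrs!/solve! body would then begin with:
# __assert_appleaccelerate_available()
```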

src/default.jl

Lines changed: 15 additions & 3 deletions

@@ -71,17 +71,29 @@ end

 function defaultalg(A::Tridiagonal, b, assump::OperatorAssumptions{Bool})
     if assump.issq
-        DefaultLinearSolver(DefaultAlgorithmChoice.LUFactorization)
+        @static if VERSION >= v"1.11"
+            DirectLdiv!()
+        else
+            DefaultLinearSolver(DefaultAlgorithmChoice.LUFactorization)
+        end
     else
         DefaultLinearSolver(DefaultAlgorithmChoice.QRFactorization)
     end
 end

 function defaultalg(A::SymTridiagonal, b, ::OperatorAssumptions{Bool})
-    DefaultLinearSolver(DefaultAlgorithmChoice.LDLtFactorization)
+    @static if VERSION >= v"1.11"
+        DirectLdiv!()
+    else
+        DefaultLinearSolver(DefaultAlgorithmChoice.LUFactorization)
+    end
 end
 function defaultalg(A::Bidiagonal, b, ::OperatorAssumptions{Bool})
-    DefaultLinearSolver(DefaultAlgorithmChoice.DirectLdiv!)
+    @static if VERSION >= v"1.11"
+        DirectLdiv!()
+    else
+        DefaultLinearSolver(DefaultAlgorithmChoice.LUFactorization)
+    end
 end
 function defaultalg(A::Factorization, b, ::OperatorAssumptions{Bool})
     DefaultLinearSolver(DefaultAlgorithmChoice.DirectLdiv!)
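For context on what this change buys: on Julia 1.11+, returning `DirectLdiv!()` leans on the fast specialized `\` methods that the structured matrix types provide. A minimal sketch (not from the diff):

```julia
using LinearAlgebra

# Tridiagonal (likewise SymTridiagonal/Bidiagonal) has a specialized
# O(n) solve, so no generic dense factorization is needed.
T = Tridiagonal(rand(4), rand(5) .+ 2.0, rand(4))
b = rand(5)
x = T \ b
@assert isapprox(T * x, b; rtol = 1e-8)
```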

src/extension_algs.jl

Lines changed: 33 additions & 0 deletions

@@ -441,3 +441,36 @@ to avoid allocations and automatically offloads to the GPU.
 struct MetalLUFactorization <: AbstractFactorization end

 struct BLISLUFactorization <: AbstractFactorization end
+
+"""
+`CUSOLVERRFFactorization(; symbolic = :RF, reuse_symbolic = true)`
+
+A GPU-accelerated sparse LU factorization using NVIDIA's cusolverRF library.
+This solver is specifically designed for sparse matrices on CUDA GPUs and
+provides high-performance factorization and solve capabilities.
+
+## Keyword Arguments
+
+  - `symbolic`: The symbolic factorization method to use. Options are:
+      - `:RF` (default): Use cusolverRF's built-in symbolic analysis
+      - `:KLU`: Use KLU for symbolic analysis
+  - `reuse_symbolic`: Whether to reuse the symbolic factorization when the
+    sparsity pattern doesn't change (default: `true`)
+
+!!! note
+    This solver requires CUSOLVERRF.jl to be loaded and only supports
+    `Float64` element types with `Int32` indices.
+"""
+struct CUSOLVERRFFactorization <: AbstractSparseFactorization
+    symbolic::Symbol
+    reuse_symbolic::Bool
+
+    function CUSOLVERRFFactorization(; symbolic::Symbol = :RF, reuse_symbolic::Bool = true)
+        ext = Base.get_extension(@__MODULE__, :LinearSolveCUSOLVERRFExt)
+        if ext === nothing
+            error("CUSOLVERRFFactorization requires that CUSOLVERRF.jl is loaded, i.e. `using CUSOLVERRF`")
+        else
+            return new{}(symbolic, reuse_symbolic)
+        end
+    end
+end
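A brief usage sketch of the guarded constructor defined above (the error text is the one from the diff):

```julia
using LinearSolve

# Without the extension loaded, construction itself throws:
#   julia> CUSOLVERRFFactorization()
#   ERROR: CUSOLVERRFFactorization requires that CUSOLVERRF.jl is loaded, ...

using CUSOLVERRF   # triggers loading of LinearSolveCUSOLVERRFExt
alg = CUSOLVERRFFactorization(symbolic = :KLU, reuse_symbolic = true)
```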

test/gpu/Project.toml

Lines changed: 2 additions & 0 deletions

@@ -2,7 +2,9 @@
 BlockDiagonals = "0a1fb500-61f7-11e9-3c65-f5ef3456f9f0"
 CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 CUDSS = "45b445bb-4962-46a0-9369-b4df9d0f772e"
+CUSOLVERRF = "a8cc9031-bad2-4722-94f5-40deabb4245c"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
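These two new test dependencies suggest a GPU test along the following lines; this is a hypothetical sketch, not a file from this commit, and assumes a CUDA device is present:

```julia
using Test, LinearAlgebra, SparseArrays, StableRNGs
using LinearSolve, CUDA, CUSOLVERRF

@testset "CUSOLVERRFFactorization" begin
    rng = StableRNG(1)
    A = sprand(rng, Float64, 200, 200, 0.05) + 10.0I
    b = rand(rng, 200)
    sol = solve(LinearProblem(A, b), CUSOLVERRFFactorization())
    @test norm(A * sol.u - b) / norm(b) < 1e-8
end
```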
