
Commit e775de6

Update documentation for CudaOffload factorization changes
- Updated GPU tutorial to show the new CudaOffloadLUFactorization/QRFactorization
- Updated solver documentation to explain both algorithms
- Added deprecation warning in documentation
- Updated release notes with upcoming changes
- Created example demonstrating usage of both new algorithms
- Explained when to use each algorithm (LU for performance, QR for stability)
1 parent 57fee72 commit e775de6

File tree

4 files changed: +121 -6 lines changed

docs/src/release_notes.md

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 # Release Notes
 
+## Upcoming Changes
+
+- `CudaOffloadFactorization` has been split into two algorithms:
+  - `CudaOffloadLUFactorization` - Uses LU factorization for better performance
+  - `CudaOffloadQRFactorization` - Uses QR factorization for better numerical stability
+- `CudaOffloadFactorization` is now deprecated and will show a warning suggesting to use one of the new algorithms
+
 ## v2.0
 
 - `LinearCache` changed from immutable to mutable. With this, the out of place interfaces like
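The deprecation path described in the release note (the old constructor keeps working but emits a warning pointing at the replacements) can be sketched in plain Julia. The names below are hypothetical stand-ins, not LinearSolve's actual types or implementation:

```julia
# Hypothetical stand-in for one of the new algorithm types
# (not LinearSolve's real struct).
struct NewLUAlg end

# Hypothetical deprecated constructor: emits a deprecation warning,
# then forwards to the replacement algorithm.
function OldAlg()
    Base.depwarn("`OldAlg` is deprecated; use `NewLUAlg` (or a QR variant) instead.", :OldAlg)
    return NewLUAlg()
end

alg = OldAlg()  # still usable, but flagged for migration
println(typeof(alg))
```

Because the old name forwards to a concrete replacement, existing scripts keep running unchanged while the warning nudges users toward an explicit choice.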

docs/src/solvers/solvers.md

Lines changed: 9 additions & 5 deletions
@@ -23,12 +23,14 @@ use your base system BLAS which can be fast or slow depending on the hardware co
 
 For very large dense factorizations, offloading to the GPU can be preferred. Metal.jl can be used
 on Mac hardware to offload, and has a cutoff point of being faster at around size 20,000 x 20,000
-matrices (and only supports Float32). `CudaOffloadFactorization` can be more efficient at a
-much smaller cutoff, possibly around size 1,000 x 1,000 matrices, though this is highly dependent
-on the chosen GPU hardware. `CudaOffloadFactorization` requires a CUDA-compatible NVIDIA GPU.
+matrices (and only supports Float32). `CudaOffloadLUFactorization` and `CudaOffloadQRFactorization`
+can be more efficient at a much smaller cutoff, possibly around size 1,000 x 1,000 matrices, though
+this is highly dependent on the chosen GPU hardware. These algorithms require a CUDA-compatible NVIDIA GPU.
 CUDA offload supports Float64 but most consumer GPU hardware will be much faster on Float32
 (many are >32x faster for Float32 operations than Float64 operations) and thus for most hardware
-this is only recommended for Float32 matrices.
+this is only recommended for Float32 matrices. Choose `CudaOffloadLUFactorization` for better
+performance on well-conditioned problems, or `CudaOffloadQRFactorization` for better numerical
+stability on ill-conditioned problems.
 
 !!! note
 
@@ -232,9 +234,11 @@ The following are non-standard GPU factorization routines.
 
 !!! note
 
-    Using this solver requires adding the package CUDA.jl, i.e. `using CUDA`
+    Using these solvers requires adding the package CUDA.jl, i.e. `using CUDA`
 
 ```@docs
+CudaOffloadLUFactorization
+CudaOffloadQRFactorization
 CudaOffloadFactorization
 ```
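The LU-versus-QR trade-off recommended in this doc change mirrors the behavior of the corresponding CPU factorizations, so it can be illustrated without a GPU. A minimal CPU-only sketch using the standard library's `lu` and `qr` (the matrix construction is illustrative, not taken from the docs):

```julia
using LinearAlgebra, Random

Random.seed!(42)

# Build a deliberately ill-conditioned 50x50 matrix by prescribing its
# singular values to span eight orders of magnitude.
U, S, V = svd(rand(50, 50))
A = U * Diagonal(exp10.(range(0, -8; length = 50))) * V'
b = rand(50)

x_lu = lu(A) \ b   # LU with partial pivoting: usually the fastest dense option
x_qr = qr(A) \ b   # Householder QR: the more robust choice as cond(A) grows

println("cond(A)     = ", cond(A))
println("LU residual = ", norm(A * x_lu - b))
println("QR residual = ", norm(A * x_qr - b))
```

For well-conditioned systems the two give essentially the same answer and LU wins on speed; as the condition number grows, QR's orthogonal transformations make it the safer default.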

docs/src/tutorials/gpu.md

Lines changed: 8 additions & 1 deletion
@@ -40,10 +40,17 @@ This computation can be moved to the GPU by the following:
 
 ```julia
 using CUDA # Add the GPU library
-sol = LS.solve(prob, LS.CudaOffloadFactorization())
+sol = LS.solve(prob, LS.CudaOffloadLUFactorization())
 sol.u
 ```
 
+LinearSolve.jl provides two GPU offloading algorithms:
+- `CudaOffloadLUFactorization()` - Uses LU factorization (generally faster for well-conditioned problems)
+- `CudaOffloadQRFactorization()` - Uses QR factorization (more stable for ill-conditioned problems)
+
+!!! warning
+    The old `CudaOffloadFactorization()` is deprecated. Use `CudaOffloadLUFactorization()` or `CudaOffloadQRFactorization()` instead.
+
 ## GPUArray Interface
 
 For more manual control over the factorization setup, you can use the

examples/cuda_offload_example.jl

Lines changed: 97 additions & 0 deletions
```julia
"""
Example demonstrating the new CudaOffloadLUFactorization and CudaOffloadQRFactorization algorithms.

This example shows how to use the new GPU offloading algorithms for solving linear systems
with different numerical properties.
"""

using LinearSolve
using LinearAlgebra
using Random

# Set random seed for reproducibility
Random.seed!(123)

println("CUDA Offload Factorization Examples")
println("=" ^ 40)

# Create a well-conditioned problem
println("\n1. Well-conditioned problem (condition number ≈ 10)")
A_good = rand(100, 100)
A_good = A_good + 10I # Make it well-conditioned
b_good = rand(100)
prob_good = LinearProblem(A_good, b_good)

println("   Matrix size: $(size(A_good))")
println("   Condition number: $(cond(A_good))")

# Try to use CUDA if available
try
    using CUDA

    # Solve with LU (faster for well-conditioned)
    println("\n   Solving with CudaOffloadLUFactorization...")
    sol_lu = solve(prob_good, CudaOffloadLUFactorization())
    println("   Solution norm: $(norm(sol_lu.u))")
    println("   Residual norm: $(norm(A_good * sol_lu.u - b_good))")

    # Solve with QR (more stable)
    println("\n   Solving with CudaOffloadQRFactorization...")
    sol_qr = solve(prob_good, CudaOffloadQRFactorization())
    println("   Solution norm: $(norm(sol_qr.u))")
    println("   Residual norm: $(norm(A_good * sol_qr.u - b_good))")
catch e
    println("\n   Note: CUDA.jl is not loaded. To use GPU offloading:")
    println("   1. Install CUDA.jl: using Pkg; Pkg.add(\"CUDA\")")
    println("   2. Add 'using CUDA' before running this example")
    println("   3. Ensure you have a CUDA-compatible NVIDIA GPU")
end

# Create an ill-conditioned problem
println("\n2. Ill-conditioned problem (condition number ≈ 1e6)")
A_bad = rand(50, 50)
# Make it ill-conditioned
U, S, V = svd(A_bad)
S[end] = S[1] / 1e6 # Create large condition number
A_bad = U * Diagonal(S) * V'
b_bad = rand(50)
prob_bad = LinearProblem(A_bad, b_bad)

println("   Matrix size: $(size(A_bad))")
println("   Condition number: $(cond(A_bad))")

try
    using CUDA

    # For ill-conditioned problems, QR is typically more stable
    println("\n   Solving with CudaOffloadQRFactorization (recommended for ill-conditioned)...")
    sol_qr_bad = solve(prob_bad, CudaOffloadQRFactorization())
    println("   Solution norm: $(norm(sol_qr_bad.u))")
    println("   Residual norm: $(norm(A_bad * sol_qr_bad.u - b_bad))")

    println("\n   Solving with CudaOffloadLUFactorization (may be less stable)...")
    sol_lu_bad = solve(prob_bad, CudaOffloadLUFactorization())
    println("   Solution norm: $(norm(sol_lu_bad.u))")
    println("   Residual norm: $(norm(A_bad * sol_lu_bad.u - b_bad))")
catch e
    println("\n   Skipping GPU tests (CUDA not available)")
end

# Demonstrate the deprecation warning
println("\n3. Testing deprecated CudaOffloadFactorization")
try
    using CUDA
    println("   Creating deprecated CudaOffloadFactorization...")
    alg = CudaOffloadFactorization() # This will show a deprecation warning
    println("   The deprecated algorithm still works but shows a warning above")
catch e
    println("   Skipping deprecation test (CUDA not available)")
end

println("\n" * "=" ^ 40)
println("Summary:")
println("- Use CudaOffloadLUFactorization for well-conditioned problems (faster)")
println("- Use CudaOffloadQRFactorization for ill-conditioned problems (more stable)")
println("- The old CudaOffloadFactorization is deprecated")
```
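As a side note, the singular-value trick the example uses to manufacture an ill-conditioned matrix (overwriting the smallest singular value with `S[1] / 1e6`) pins the condition number almost exactly, since `cond` of a square matrix is the ratio of its largest to smallest singular value. A quick standalone check, using only the standard library:

```julia
using LinearAlgebra, Random

Random.seed!(123)
U, S, V = svd(rand(50, 50))
S[end] = S[1] / 1e6          # smallest singular value := largest / 1e6
A = U * Diagonal(S) * V'

# cond(A) = largest / smallest singular value, so it lands at ≈ 1e6 here.
println("cond(A) ≈ ", cond(A))
```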
