Commit 76eac8a
Add LinearSolveAutotune sublibrary for algorithm benchmarking and optimization
This PR implements the `autotune_setup` function as requested in the design document, providing comprehensive benchmarking of all available LU factorization algorithms with automatic optimization and preference setting.

## Features

- **Comprehensive Benchmarking**: Tests all available LU algorithms (CPU + GPU)
- **Intelligent Categorization**: Finds optimal algorithms for size ranges 0-128, 128-256, 256-512, 512+
- **Preferences Integration**: Automatically sets LinearSolve preferences based on results
- **Hardware Detection**: Auto-detects CUDA, Metal, MKL, Apple Accelerate availability
- **Visualization**: Creates performance plots using Plots.jl
- **Telemetry**: Optional GitHub sharing to issue #669 for community data collection
- **Configurable**: Supports large matrix sizes and custom sampling parameters

## Usage

```julia
using LinearSolve
include("lib/LinearSolveAutotune/src/LinearSolveAutotune.jl")
using .LinearSolveAutotune

# Basic autotune
results = autotune_setup()

# Custom configuration
results = autotune_setup(
    large_matrices = true,
    samples = 10,
    telemetry = false,
    make_plot = true
)
```

## Implementation Details

- Built as a sublibrary in `/lib/LinearSolveAutotune/`
- Modular design with separate files for algorithms, benchmarking, GPU detection, etc.
- Uses existing LinearSolve benchmarking patterns and `luflop` calculations
- Integrates with Preferences.jl for persistent algorithm selection
- Follows SciML formatting standards

## Future Integration

This sets up the foundation for the planned enhancement in `default.jl:176-193`, where preferences will influence default algorithm selection, making LinearSolve.jl automatically optimize itself based on system-specific performance characteristics.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 7fd84cf commit 76eac8a
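To make the "Future Integration" note concrete, here is a hypothetical sketch of how size-banded preferences could feed default algorithm selection. The helper name `preferred_lu_name` is invented for illustration, and this is not the actual `default.jl` hook, which is not part of this commit:

```julia
using Preferences, UUIDs

# Hypothetical sketch only; not code from this commit.
# UUID taken from lib/LinearSolveAutotune/Project.toml in this diff.
const AUTOTUNE_UUID = UUID("67398393-80e8-4254-b7e4-1b9a36a3c5b6")

# The size bands mirror the commit's 0-128 / 128-256 / 256-512 / 512+ ranges;
# the key name for the 512+ band is a guess, since the stored preferences
# shown below only include the first three.
function preferred_lu_name(n::Integer)
    key = n <= 128 ? "best_algorithm_0_128" :
          n <= 256 ? "best_algorithm_128_256" :
          n <= 512 ? "best_algorithm_256_512" :
          "best_algorithm_512_plus"
    load_preference(AUTOTUNE_UUID, key, nothing)  # e.g. "LUFactorization", or nothing if unset
end
```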

9 files changed, +842 −0 lines changed
Lines changed: 5 additions & 0 deletions

```toml
[LinearSolveAutotune]
autotune_timestamp = "2025-08-03T19:50:58.753"
best_algorithm_0_128 = "LUFactorization"
best_algorithm_128_256 = "LUFactorization"
best_algorithm_256_512 = "LUFactorization"
```
lib/LinearSolveAutotune/Project.toml

Lines changed: 41 additions & 0 deletions
```toml
name = "LinearSolveAutotune"
uuid = "67398393-80e8-4254-b7e4-1b9a36a3c5b6"
authors = ["SciML"]
version = "0.1.0"

[deps]
LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
GitHub = "bc5e4493-9b4d-5f90-b8aa-2b2bcaad7a26"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
Preferences = "21216c6a-2e73-6563-6e65-726566657250"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
Metal = "dde4c033-4e86-420c-a63e-0dd931031962"

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
Metal = "dde4c033-4e86-420c-a63e-0dd931031962"

[compat]
LinearSolve = "3"
BenchmarkTools = "1"
DataFrames = "1"
GitHub = "5"
Plots = "1"
PrettyTables = "2"
Preferences = "1"
Statistics = "1"
Random = "1"
LinearAlgebra = "1"
Printf = "1"
Dates = "1"
CUDA = "5"
Metal = "1"
julia = "1.10"
```
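Because the sublibrary carries its own Project.toml, an alternative to the `include`-based loading shown in the commit message is to `develop` it as a local package. A sketch, assuming the working directory is the LinearSolve.jl repository root:

```julia
using Pkg
# Add the sublibrary to the active environment as a local dev dependency.
Pkg.develop(path = "lib/LinearSolveAutotune")

using LinearSolveAutotune
results = autotune_setup()
```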
lib/LinearSolveAutotune/src/LinearSolveAutotune.jl

Lines changed: 174 additions & 0 deletions
````julia
module LinearSolveAutotune

using LinearSolve
using BenchmarkTools
using DataFrames
using PrettyTables
using Preferences
using Statistics
using Random
using LinearAlgebra
using Printf
using Dates

# Optional dependencies for telemetry and plotting
using GitHub
using Plots

export autotune_setup

include("algorithms.jl")
include("gpu_detection.jl")
include("benchmarking.jl")
include("plotting.jl")
include("telemetry.jl")
include("preferences.jl")

"""
    autotune_setup(;
        large_matrices::Bool = false,
        telemetry::Bool = true,
        make_plot::Bool = true,
        set_preferences::Bool = true,
        samples::Int = 5,
        seconds::Float64 = 0.5)

Run a comprehensive benchmark of all available LU factorization methods and optionally:

- Create performance plots
- Upload results to GitHub telemetry
- Set Preferences for optimal algorithm selection
- Support both CPU and GPU algorithms based on hardware detection

# Arguments

- `large_matrices::Bool = false`: Include larger matrix sizes for GPU benchmarking
- `telemetry::Bool = true`: Share results to GitHub issue for community data
- `make_plot::Bool = true`: Generate performance plots
- `set_preferences::Bool = true`: Update LinearSolve preferences with optimal algorithms
- `samples::Int = 5`: Number of benchmark samples per algorithm/size
- `seconds::Float64 = 0.5`: Maximum time per benchmark

# Returns

- `DataFrame`: Detailed benchmark results with performance data
- `Plot`: Performance visualization (if `make_plot = true`)

# Examples

```julia
using LinearSolve
using LinearSolveAutotune

# Basic autotune with default settings
results = autotune_setup()

# Custom autotune for GPU systems with larger matrices
results = autotune_setup(large_matrices = true, samples = 10, seconds = 1.0)

# Autotune without telemetry sharing
results = autotune_setup(telemetry = false)
```
"""
function autotune_setup(;
        large_matrices::Bool = false,
        telemetry::Bool = true,
        make_plot::Bool = true,
        set_preferences::Bool = true,
        samples::Int = 5,
        seconds::Float64 = 0.5)
    @info "Starting LinearSolve.jl autotune setup..."
    @info "Configuration: large_matrices=$large_matrices, telemetry=$telemetry, make_plot=$make_plot, set_preferences=$set_preferences"

    # Get system information
    system_info = get_system_info()
    @info "System detected: $(system_info["os"]) $(system_info["arch"]) with $(system_info["num_cores"]) cores"

    # Get available algorithms
    cpu_algs, cpu_names = get_available_algorithms()
    @info "Found $(length(cpu_algs)) CPU algorithms: $(join(cpu_names, ", "))"

    # Add GPU algorithms if available
    gpu_algs, gpu_names = get_gpu_algorithms()
    if !isempty(gpu_algs)
        @info "Found $(length(gpu_algs)) GPU algorithms: $(join(gpu_names, ", "))"
    end

    # Combine all algorithms
    all_algs = vcat(cpu_algs, gpu_algs)
    all_names = vcat(cpu_names, gpu_names)

    if isempty(all_algs)
        error("No algorithms found! This shouldn't happen.")
    end

    # Get benchmark sizes
    sizes = collect(get_benchmark_sizes(large_matrices))
    @info "Benchmarking $(length(sizes)) matrix sizes from $(minimum(sizes)) to $(maximum(sizes))"

    # Run benchmarks
    @info "Running benchmarks (this may take several minutes)..."
    results_df = benchmark_algorithms(sizes, all_algs, all_names;
        samples = samples, seconds = seconds, large_matrices = large_matrices)

    # Display results table
    successful_results = filter(row -> row.success, results_df)
    if nrow(successful_results) > 0
        @info "Benchmark completed successfully!"

        # Create summary table for display
        summary = combine(groupby(successful_results, :algorithm),
            :gflops => mean => :avg_gflops,
            :gflops => maximum => :max_gflops,
            nrow => :num_tests)
        sort!(summary, :avg_gflops, rev = true)

        println("\n" * "="^60)
        println("BENCHMARK RESULTS SUMMARY")
        println("="^60)
        pretty_table(summary,
            header = ["Algorithm", "Avg GFLOPs", "Max GFLOPs", "Tests"],
            formatters = ft_printf("%.2f", [2, 3]),
            crop = :none)
    else
        @warn "No successful benchmark results!"
        return results_df, nothing
    end

    # Categorize results and find best algorithms per size range
    categories = categorize_results(results_df)

    # Set preferences if requested
    if set_preferences && !isempty(categories)
        set_algorithm_preferences(categories)
    end

    # Create plot if requested
    plot_obj = nothing
    plot_files = nothing
    if make_plot
        @info "Creating performance plots..."
        plot_obj = create_benchmark_plot(results_df)
        if plot_obj !== nothing
            plot_files = save_benchmark_plot(plot_obj)
        end
    end

    # Upload telemetry if requested
    if telemetry && nrow(successful_results) > 0
        @info "Preparing telemetry data for GitHub..."
        markdown_content = format_results_for_github(results_df, system_info, categories)
        upload_to_github(markdown_content, plot_files)
    end

    @info "Autotune setup completed!"

    # Return results and plot
    if make_plot && plot_obj !== nothing
        return results_df, plot_obj
    else
        return results_df
    end
end

end
````
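One caveat visible in the code above: the return shape depends on `make_plot` (and on whether any benchmark succeeded), so a caller may receive either a `DataFrame` or a tuple. A defensive usage sketch, not part of the commit:

```julia
# Handle autotune_setup's varying return shape.
out = autotune_setup(make_plot = true)
results_df = out isa Tuple ? first(out) : out
plot_obj = out isa Tuple ? last(out) : nothing
```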
lib/LinearSolveAutotune/src/algorithms.jl

Lines changed: 98 additions & 0 deletions
```julia
# Algorithm detection and creation functions

"""
    get_available_algorithms()

Returns a list of available LU factorization algorithms based on the system and loaded packages.
"""
function get_available_algorithms()
    algs = []
    alg_names = String[]

    # Core algorithms always available
    push!(algs, LUFactorization())
    push!(alg_names, "LUFactorization")

    push!(algs, GenericLUFactorization())
    push!(alg_names, "GenericLUFactorization")

    # MKL if available
    if LinearSolve.usemkl
        push!(algs, MKLLUFactorization())
        push!(alg_names, "MKLLUFactorization")
    end

    # Apple Accelerate if available
    if LinearSolve.appleaccelerate_isavailable()
        push!(algs, AppleAccelerateLUFactorization())
        push!(alg_names, "AppleAccelerateLUFactorization")
    end

    # RecursiveFactorization if loaded
    try
        if LinearSolve.userecursivefactorization(nothing)
            push!(algs, RFLUFactorization())
            push!(alg_names, "RFLUFactorization")
        end
    catch
        # RFLUFactorization not available
    end

    # SimpleLU always available
    push!(algs, SimpleLUFactorization())
    push!(alg_names, "SimpleLUFactorization")

    return algs, alg_names
end

"""
    get_gpu_algorithms()

Returns GPU-specific algorithms if GPU hardware and packages are available.
"""
function get_gpu_algorithms()
    gpu_algs = []
    gpu_names = String[]

    # CUDA algorithms
    if is_cuda_available()
        try
            push!(gpu_algs, CudaOffloadFactorization())
            push!(gpu_names, "CudaOffloadFactorization")
        catch
            # CUDA extension not loaded
        end
    end

    # Metal algorithms for Apple Silicon
    if is_metal_available()
        try
            push!(gpu_algs, MetalLUFactorization())
            push!(gpu_names, "MetalLUFactorization")
        catch
            # Metal extension not loaded
        end
    end

    return gpu_algs, gpu_names
end

"""
    luflop(m, n = m; innerflop = 2)

Calculate the number of floating point operations for LU factorization.
From the existing LinearSolve benchmarks.
"""
function luflop(m, n = m; innerflop = 2)
    sum(1:min(m, n)) do k
        invflop = 1
        scaleflop = isempty((k + 1):m) ? 0 : sum((k + 1):m)
        updateflop = isempty((k + 1):n) ? 0 :
                     sum((k + 1):n) do j
            isempty((k + 1):m) ? 0 : sum((k + 1):m) do i
                innerflop
            end
        end
        invflop + scaleflop + updateflop
    end
end
```
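A sketch of how an operation count from `luflop` converts into a GFLOPs rate, as in the `:gflops` column used by the module. The single `@elapsed` sample here is illustrative only; the package's actual benchmarking (in `benchmarking.jl`, not shown in this excerpt) uses BenchmarkTools:

```julia
using LinearAlgebra

# Illustrative only: one crude timing sample for an n×n LU solve.
n = 256
A = rand(n, n)
b = rand(n)
t = @elapsed(lu(A) \ b)

# luflop returns an O(n^3) operation count for dense LU;
# dividing by time in seconds and 1e9 yields GFLOPs.
gflops = luflop(n) / t / 1e9
```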
