From 59677d234e7a3d9951faca6bd5eed25d56293824 Mon Sep 17 00:00:00 2001
From: ChrisRackauckas
Date: Thu, 7 Aug 2025 21:36:36 -0400
Subject: [PATCH] Update autotuning documentation to reflect current interface
 and gh-based workflow

- Updated to reflect current AutotuneResults object structure
- Documented the new simplified interface (autotune_setup returns AutotuneResults)
- Added comprehensive examples for the plot() and share_results() functions
- Clarified the current size categories (:tiny, :small, :medium, :large, :big)
- Updated GitHub authentication instructions for both gh CLI and token methods
- Added detailed examples for customizing benchmarks
- Included troubleshooting section for common issues
- Documented the community sharing workflow via GitHub issue #669
- Clarified that preference setting is still under development
- Added examples for advanced usage patterns
---
 docs/src/tutorials/autotune.md | 392 ++++++++++++++++++++++++---------
 1 file changed, 290 insertions(+), 102 deletions(-)

diff --git a/docs/src/tutorials/autotune.md b/docs/src/tutorials/autotune.md
index 8653bd57d..cbc41eb8f 100644
--- a/docs/src/tutorials/autotune.md
+++ b/docs/src/tutorials/autotune.md
@@ -3,10 +3,7 @@
 
 LinearSolve.jl includes an automatic tuning system that benchmarks all available linear algebra algorithms on your specific hardware and automatically selects optimal algorithms for different problem sizes and data types. This tutorial will show you how to use the `LinearSolveAutotune` sublibrary to optimize your linear solve performance.
 
 !!! warn
-
-    This is still in development. At this point the tuning will not result in different settings
-    but it will run the benchmarking and create plots of the performance of the algorithms. A
-    future version will use the results to set preferences for the algorithms.
+    The autotuning system is under active development. While benchmarking and result sharing are fully functional, automatic preference setting for algorithm selection is still being refined.
 
 ## Quick Start
 
@@ -17,60 +14,99 @@ using LinearSolve
 using LinearSolveAutotune
 
 # Run autotune with default settings
-results, sysinfo, plots = autotune_setup()
+results = autotune_setup()
+
+# View the results
+display(results)
+
+# Generate performance plots
+plot(results)
+
+# Share results with the community (optional, requires GitHub authentication)
+share_results(results)
 ```
 
 This will:
-- Benchmark 4 element types: `Float32`, `Float64`, `ComplexF32`, `ComplexF64`
-- Test matrix sizes from small (4×4), medium (500×500), to large (10,000×10,000)
-- Create performance plots for each element type
-- Set preferences for optimal algorithm selection
-- Share results with the community (if desired)
+- Benchmark algorithms for `Float64` matrices by default
+- Test matrix sizes from tiny (5×5) through large (1000×1000)
+- Display a summary of algorithm performance
+- Return an `AutotuneResults` object containing all benchmark data
 
 ## Understanding the Results
 
-The autotune process returns benchmark results and creates several outputs:
+The `autotune_setup()` function returns an `AutotuneResults` object containing:
+
+- `results_df`: A DataFrame with detailed benchmark results
+- `sysinfo`: System information dictionary
+
+You can explore the results in several ways:
 
 ```julia
-# Basic usage returns just the DataFrame of results and system information
-results, sysinfo, _ = autotune_setup(make_plot=false)
-
-# With plotting enabled, returns (DataFrame, System Info, Plots)
-results, sysinfo, plots = autotune_setup(make_plot=true)
-
-# Examine the results
-println("Algorithms tested: ", unique(results.algorithm))
-println("Element types: ", unique(results.eltype))
-println("Size range: ", minimum(results.size), " to ", maximum(results.size))
+# Get the results
+results = autotune_setup()
+
+# Display a formatted summary
+display(results)
+
+# Access the raw benchmark data
+df = results.results_df
+
+# View system information
+sysinfo = results.sysinfo
+
+# Generate performance plots
+plot(results)
+
+# Filter to see successful benchmarks only
+using DataFrames
+successful = filter(row -> row.success, df)
 ```
 
 ## Customizing the Autotune Process
 
-### Element Types
+### Size Categories
 
-You can specify which element types to benchmark:
+Control which matrix size ranges to test:
 
 ```julia
-# Test only Float64 and ComplexF64
-results, sysinfo, _ = autotune_setup(eltypes = (Float64, ComplexF64))
-
-# Test arbitrary precision types (excludes BLAS algorithms)
-results, sysinfo, _ = autotune_setup(eltypes = (BigFloat,), telemetry = false)
-
-# Test high precision float
-results, sysinfo, _ = autotune_setup(eltypes = (Float64, BigFloat))
+# Available size categories:
+# :tiny   - 5×5 to 20×20 (very small problems)
+# :small  - 20×20 to 100×100 (small problems)
+# :medium - 100×100 to 300×300 (typical problems)
+# :large  - 300×300 to 1000×1000 (larger problems)
+# :big    - 10000×10000 to 100000×100000 (GPU/HPC scale)
+
+# Default: test tiny through large
+results = autotune_setup()  # uses [:tiny, :small, :medium, :large]
+
+# Test only medium and large sizes
+results = autotune_setup(sizes = [:medium, :large])
+
+# Include huge matrices (for GPU systems)
+results = autotune_setup(sizes = [:large, :big])
+
+# Test all size categories
+results = autotune_setup(sizes = [:tiny, :small, :medium, :large, :big])
 ```
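+
+If you already know the typical dimension of your own problems, you can select the matching categories programmatically. Below is a minimal sketch; the helper `categories_for_n` is illustrative and not part of the package, with its ranges copied from the list above:
+
+```julia
+# Hypothetical helper: pick the size categories whose range brackets n
+function categories_for_n(n::Integer)
+    ranges = [(:tiny, 5, 20), (:small, 20, 100), (:medium, 100, 300),
+              (:large, 300, 1000), (:big, 10_000, 100_000)]
+    return [name for (name, lo, hi) in ranges if lo <= n <= hi]
+end
+
+results = autotune_setup(sizes = categories_for_n(250))  # selects [:medium]
+```
+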
-### Matrix Sizes
+### Element Types
 
-Control the range of matrix sizes tested:
+Specify which numeric types to benchmark:
 
 ```julia
-# Default: small to medium matrices (4×4 to 500×500)
-results, sysinfo, _ = autotune_setup(large_matrices = false)
-
-# Large matrices: includes sizes up to 10,000×10,000 (good for GPU systems)
-results, sysinfo, _ = autotune_setup(large_matrices = true)
+# Default: Float64 only
+results = autotune_setup()  # equivalent to eltypes = (Float64,)
+
+# Test standard floating point types
+results = autotune_setup(eltypes = (Float32, Float64))
+
+# Include complex numbers
+results = autotune_setup(eltypes = (Float64, ComplexF64))
+
+# Test all standard BLAS types
+results = autotune_setup(eltypes = (Float32, Float64, ComplexF32, ComplexF64))
+
+# Test arbitrary precision (excludes some BLAS algorithms)
+results = autotune_setup(eltypes = (BigFloat,), skip_missing_algs = true)
 ```
 
 ### Benchmark Quality vs Speed
 
@@ -79,56 +115,45 @@ Adjust the thoroughness of benchmarking:
 ```julia
 # Quick benchmark (fewer samples, less time per test)
-results, sysinfo, _ = autotune_setup(samples = 1, seconds = 0.1)
-
-# Thorough benchmark (more samples, more time per test)
-results, sysinfo, _ = autotune_setup(samples = 10, seconds = 2.0)
-```
-
-### Privacy and Telemetry
-
-!!! warn
-
-    Telemetry implementation is still in development.
-
-The telemetry feature of LinearSolveAutotune allows sharing performance results
-with the community to improve algorithm selection. Minimal data is collected, including:
-
-- System information (OS, CPU, core count)
-- Algorithm performance results
-
-and is shared via public GitHub. This helps the community understand performance across
-different hardware configurations and further improve the default algorithm selection
-and research in improved algorithms.
-
-However, if your system has privacy concerns or you prefer not to share data, you can disable telemetry:
-
-```julia
-# Disable telemetry (no data shared)
-results, sysinfo, _ = autotune_setup(telemetry = false)
-
-# Disable preference setting (just benchmark, don't change defaults)
-results, sysinfo, _ = autotune_setup(set_preferences = false)
-
-# Disable plotting (faster, less output)
-results, sysinfo, _ = autotune_setup(make_plot = false)
-```
-
-### Missing Algorithm Handling
-
-By default, autotune is assertive about finding all expected algorithms. This is because
-we want to ensure that all possible algorithms on a given system are tested in order for
-the autotuning history/telemetry to be as complete as possible. However, in some cases
-you may want to allow missing algorithms, such as when running on a system where the
-hardware may not have support due to driver versions or other issues. If that's the case,
-you can set `skip_missing_algs = true` to allow missing algorithms without failing the autotune setup:
-
-```julia
-# Default behavior: error if expected algorithms are missing
-results, sysinfo, _ = autotune_setup()  # Will error if RFLUFactorization missing
-
-# Allow missing algorithms (useful for incomplete setups)
-results, sysinfo, _ = autotune_setup(skip_missing_algs = true)  # Will warn instead of error
-```
+results = autotune_setup(samples = 1, seconds = 0.1)
+
+# Default benchmark (balanced)
+results = autotune_setup(samples = 5, seconds = 0.5)
+
+# Thorough benchmark (more samples, more time per test)
+results = autotune_setup(samples = 10, seconds = 2.0)
+
+# Production-quality benchmark for final tuning
+results = autotune_setup(
+    samples = 20,
+    seconds = 5.0,
+    sizes = [:small, :medium, :large],
+    eltypes = (Float32, Float64, ComplexF32, ComplexF64)
+)
+```
+
+### Missing Algorithm Handling
+
+By default, autotune expects all algorithms to be available to ensure complete benchmarking. You can relax this requirement:
+
+```julia
+# Default: error if expected algorithms are missing
+results = autotune_setup()  # Will error if RFLUFactorization is missing
+
+# Allow missing algorithms (useful for incomplete setups)
+results = autotune_setup(skip_missing_algs = true)  # Will warn instead of error
+```
+
+### Preferences Setting
+
+Control whether the autotuner updates LinearSolve preferences:
+
+```julia
+# Default: set preferences based on benchmark results
+results = autotune_setup(set_preferences = true)
+
+# Benchmark only, don't change preferences
+results = autotune_setup(set_preferences = false)
+```
 
 ## GPU Systems
 
@@ -137,13 +162,79 @@ On systems with CUDA or Metal GPU support, the autotuner will automatically dete
 ```julia
 # Enable large matrix testing for GPUs
-results, sysinfo, _ = autotune_setup(large_matrices = true, samples = 3, seconds = 1.0)
+results = autotune_setup(
+    sizes = [:large, :big],
+    samples = 3,
+    seconds = 1.0
+)
 ```
 
 GPU algorithms tested (when available):
 
 - **CudaOffloadFactorization**: CUDA GPU acceleration
 - **MetalLUFactorization**: Apple Metal GPU acceleration
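+
+If the GPU algorithms do not appear in your results, a likely cause is that the corresponding GPU package was never loaded; LinearSolve.jl exposes these methods through package extensions. A sketch for a CUDA machine (assumes CUDA.jl is installed; Metal users would load Metal.jl instead):
+
+```julia
+using LinearSolve, LinearSolveAutotune
+using CUDA  # load the GPU package first so its algorithms are detected
+
+results = autotune_setup(sizes = [:large, :big], samples = 3, seconds = 1.0)
+```
+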
+## Sharing Results with the Community
+
+The autotuner includes a telemetry feature that allows you to share your benchmark results with the LinearSolve.jl community. This helps improve algorithm selection across different hardware configurations.
+
+### Setting Up GitHub Authentication
+
+To share results, you need to authenticate with GitHub. There are two methods:
+
+#### Method 1: GitHub CLI (Recommended)
+
+1. **Install GitHub CLI**
+   - macOS: `brew install gh`
+   - Windows: `winget install --id GitHub.cli`
+   - Linux: See [cli.github.com](https://cli.github.com/manual/installation)
+
+2. **Authenticate**
+   ```bash
+   gh auth login
+   ```
+   Follow the prompts to authenticate with your GitHub account.
+
+3. **Verify authentication**
+   ```bash
+   gh auth status
+   ```
+
+#### Method 2: GitHub Personal Access Token
+
+1. Go to [GitHub Settings > Tokens](https://github.com/settings/tokens/new)
+2. Add a description: "LinearSolve.jl Telemetry"
+3. Select scope: `public_repo` (for commenting on issues)
+4. Click "Generate token" and copy it
+5. In Julia:
+   ```julia
+   ENV["GITHUB_TOKEN"] = "your_token_here"
+   ```
+
+### Sharing Your Results
+
+Once authenticated, sharing is simple:
+
+```julia
+# Run benchmarks
+results = autotune_setup()
+
+# Share with the community
+share_results(results)
+```
+
+This will:
+1. Format your benchmark results as a markdown report
+2. Create performance plots if enabled
+3. Post the results as a comment to the [community benchmark collection issue](https://github.com/SciML/LinearSolve.jl/issues/669)
+4. Upload plots as GitHub Gists for easy viewing
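+
+Putting the pieces together, a complete token-based session can be as short as this (the token string is a placeholder; substitute your own):
+
+```julia
+using LinearSolve, LinearSolveAutotune
+
+ENV["GITHUB_TOKEN"] = "your_token_here"  # placeholder token, use your own
+
+results = autotune_setup()
+share_results(results)
+```
+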
info "Privacy Note" + - Sharing is completely optional + - Only benchmark performance data and system specifications are shared + - No personal information is collected + - All shared data is publicly visible on GitHub + - If authentication fails, results are saved locally for manual sharing + ## Working with Results ### Examining Performance Data @@ -152,10 +243,13 @@ GPU algorithms tested (when available): using DataFrames using Statistics -results, sysinfo, _ = autotune_setup(make_plot = false) +results = autotune_setup() + +# Access the raw DataFrame +df = results.results_df # Filter successful results -successful = filter(row -> row.success, results) +successful = filter(row -> row.success, df) # Summary by algorithm summary = combine(groupby(successful, [:algorithm, :eltype]), @@ -163,37 +257,115 @@ summary = combine(groupby(successful, [:algorithm, :eltype]), :gflops => maximum => :max_gflops) sort!(summary, :avg_gflops, rev=true) println(summary) + +# Best algorithm for each size category +by_size = combine(groupby(successful, [:size_category, :eltype])) do group + best_row = argmax(group.gflops) + return (algorithm = group.algorithm[best_row], + gflops = group.gflops[best_row]) +end +println(by_size) ``` -### Performance Plots +### Performance Visualization -When `make_plot=true`, you get separate plots for each element type: +Generate and save performance plots: ```julia -results, sysinfo, plots = autotune_setup() +results = autotune_setup() + +# Generate plots (returns a combined plot) +p = plot(results) +display(p) + +# Save the plot +using Plots +savefig(p, "benchmark_results.png") +``` + +### Accessing System Information + +```julia +results = autotune_setup() + +# System information is stored in the results +sysinfo = results.sysinfo +println("CPU: ", sysinfo["cpu_name"]) +println("Cores: ", sysinfo["num_cores"]) +println("Julia: ", sysinfo["julia_version"]) +println("OS: ", sysinfo["os"]) +``` + +## Advanced Usage + +### Custom Benchmark Pipeline + +For complete control over the benchmarking process: -# plots is a dictionary keyed by element type -for (eltype, plot) in plots - println("Plot for $eltype available") - # Plots are automatically saved as PNG and PDF files - display(plot) +```julia +# Step 1: Run benchmarks without plotting or sharing +results = autotune_setup( + sizes = [:medium, :large], + eltypes = (Float64, ComplexF64), + set_preferences = false, # Don't change preferences yet + samples = 10, + seconds = 1.0 +) + +# Step 2: Analyze results +df = results.results_df +# ... perform custom analysis ... + +# Step 3: Generate plots +p = plot(results) +savefig(p, "my_benchmarks.png") + +# Step 4: Optionally share results +share_results(results) +``` + +### Batch Testing Multiple Configurations + +```julia +# Test different element types separately +configs = [ + (eltypes = (Float32,), name = "float32"), + (eltypes = (Float64,), name = "float64"), + (eltypes = (ComplexF64,), name = "complex64") +] + +all_results = Dict() +for config in configs + println("Testing $(config.name)...") + results = autotune_setup( + eltypes = config.eltypes, + sizes = [:small, :medium], + samples = 3 + ) + all_results[config.name] = results end ``` -### Preferences Integration +## Preferences Integration -The autotuner sets preferences that LinearSolve.jl uses for automatic algorithm selection: +!!! warn + Automatic preference setting is still under development and may not affect algorithm selection in the current version. 
-### Preferences Integration
-
-The autotuner sets preferences that LinearSolve.jl uses for automatic algorithm selection:
+## Preferences Integration
+
+!!! warn
+    Automatic preference setting is still under development and may not affect algorithm selection in the current version.
+
+The autotuner can set preferences that LinearSolve.jl will use for automatic algorithm selection:
 
 ```julia
 using LinearSolveAutotune
 
-# View current preferences
+# View current preferences (if any)
 LinearSolveAutotune.show_current_preferences()
 
+# Run autotune and set preferences
+results = autotune_setup(set_preferences = true)
+
 # Clear all autotune preferences
 LinearSolveAutotune.clear_algorithm_preferences()
 
-# Set custom preferences
+# Manually set custom preferences
 custom_categories = Dict(
     "Float64_0-128" => "RFLUFactorization",
     "Float64_128-256" => "LUFactorization"
@@ -201,26 +373,42 @@ custom_categories = Dict(
 )
 LinearSolveAutotune.set_algorithm_preferences(custom_categories)
 ```
 
-## How Preferences Affect LinearSolve.jl
-
-!!! warn
-
-    Usage of autotune preferences is still in development.
-
-After running autotune, LinearSolve.jl will automatically use the optimal algorithms:
-
-```julia
-using LinearSolve
-
-# This will now use the algorithm determined by autotune
-A = rand(100, 100)  # Float64 matrix in 0-128 size range
-b = rand(100)
-prob = LinearProblem(A, b)
-sol = solve(prob)  # Uses auto-selected optimal algorithm
-
-# For different sizes, different optimal algorithms may be used
-A_large = rand(300, 300)  # Different size range
-b_large = rand(300)
-prob_large = LinearProblem(A_large, b_large)
-sol_large = solve(prob_large)  # May use different algorithm
-```
\ No newline at end of file
+## Troubleshooting
+
+### Common Issues
+
+1. **Missing algorithms error**
+   ```julia
+   # If you get errors about missing algorithms:
+   results = autotune_setup(skip_missing_algs = true)
+   ```
+
+2. **GitHub authentication fails**
+   - Ensure the gh CLI is installed and authenticated: `gh auth status`
+   - Or set a valid GitHub token: `ENV["GITHUB_TOKEN"] = "your_token"`
+   - Results will be saved locally if authentication fails
+
+3. **Out of memory on large matrices**
+   ```julia
+   # Use smaller size categories
+   results = autotune_setup(sizes = [:tiny, :small, :medium])
+   ```
+
+4. **Benchmarks taking too long**
+   ```julia
+   # Reduce samples and time per benchmark
+   results = autotune_setup(samples = 1, seconds = 0.1)
+   ```
+
+## Summary
+
+LinearSolveAutotune provides a comprehensive system for benchmarking and optimizing LinearSolve.jl performance on your specific hardware. Key features include:
+
+- Flexible size categories from tiny to GPU-scale matrices
+- Support for all standard numeric types
+- Automatic GPU algorithm detection
+- Community result sharing via GitHub
+- Performance visualization
+- Preference setting for automatic algorithm selection (in development)
+
+By running autotune and optionally sharing your results, you help improve LinearSolve.jl's performance for everyone in the Julia community.
\ No newline at end of file