Implement BLISFlameLUFactorization with fallback to reference LAPACK

Adds BLISFlameLUFactorization based on ideas from PR SciML#660, with a fallback approach due to libflame/ILP64 compatibility limitations:

- Created the LinearSolveBLISFlameExt extension module
- Uses BLIS for BLAS operations and reference LAPACK for LAPACK operations
- Provides a placeholder for true libflame integration once the packages are compatible
- Added to the benchmark script for performance comparison
- Includes comprehensive tests integrated with the existing test framework

Technical details:

- libflame_jll uses 32-bit integers, which is incompatible with Julia's ILP64 BLAS
- The extension uses the same approach as BLISLUFactorization but with different naming
- Serves as a foundation for future libflame integration once the packages are compatible
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
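
For context, a minimal usage sketch of the new algorithm through LinearSolve.jl's standard `LinearProblem`/`solve` interface. The trigger packages are an assumption, mirroring the `blis_jll` and `LAPACK_jll` requirements of the existing BLIS extension:

```julia
using LinearSolve
using blis_jll, LAPACK_jll  # assumed extension triggers, as with the BLIS extension

A = rand(256, 256)
b = rand(256)
prob = LinearProblem(A, b)

# Factors with BLIS-backed BLAS and reference LAPACK under the hood,
# per the fallback approach described above.
sol = solve(prob, BLISFlameLUFactorization())
sol.u  # the solution vector x with A*x ≈ b
```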
This directory contains a comprehensive benchmark script for testing the performance of various LU factorization algorithms in LinearSolve.jl, including the new BLIS integration.

## Quick Start

```bash
julia --project benchmark_blis.jl
```

This will:

1. Automatically detect available implementations (BLIS, MKL, Apple Accelerate, etc.)
2. Run benchmarks on matrix sizes from 4×4 to 256×256
3. Generate a performance plot saved as `lu_factorization_benchmark.png`
4. Display results in both console output and a summary table

**Note**: The PNG plot file cannot be included in this gist due to GitHub's binary file restrictions, but it will be generated locally when you run the benchmark.
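
To show the shape of steps 2–4, here is a minimal sketch of the sweep for a single algorithm. It is illustrative only; the actual loop, algorithm list, and plotting options in `benchmark_blis.jl` may differ:

```julia
using LinearSolve, Plots

sizes = [2^k for k in 2:8]        # 4×4 up to 256×256
alg = LUFactorization()           # the default OpenBLAS-backed LU
gflops = map(sizes) do n
    A, b = rand(n, n), rand(n)
    prob = LinearProblem(A, b)
    solve(prob, alg)              # warm-up run to exclude compilation time
    t = @elapsed solve(prob, alg)
    (2 / 3) * n^3 / t / 1e9       # dense LU costs roughly (2/3)n³ flops
end
plot(sizes, gflops; xscale = :log10, xlabel = "matrix size n",
     ylabel = "GFLOPs", label = "LU (OpenBLAS)")
savefig("lu_factorization_benchmark.png")
```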

## What Gets Benchmarked

The script automatically detects and includes algorithms based on what's available, following LinearSolve.jl's detection patterns:

- **LU (OpenBLAS)**: Default BLAS-based LU factorization
- **RecursiveFactorization**: High-performance pure Julia implementation
- **BLIS**: New BLIS-based implementation (requires `blis_jll` and `LAPACK_jll`)
- **Intel MKL**: Intel's optimized library (automatically detected on x86_64/i686; excludes EPYC CPUs by default)
- **Apple Accelerate**: Apple's framework (macOS only; checks for Accelerate.framework availability)

## Customization

- **Algorithms**: Add/remove algorithms in `build_algorithm_list()`; a sketch of what such a function might look like follows.
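
A hedged sketch of a detection-driven `build_algorithm_list()`. The structure and conditions below are illustrative assumptions, not the script's actual code:

```julia
using LinearSolve
import MKL_jll

function build_algorithm_list()
    # Always-available baseline: the default OpenBLAS-backed LU.
    algs = Pair{String,Any}["LU (OpenBLAS)" => LUFactorization()]
    # MKL binaries only ship for supported platforms (x86_64/i686).
    MKL_jll.is_available() && push!(algs, "Intel MKL" => MKLLUFactorization())
    # Accelerate is a macOS-only framework.
    Sys.isapple() && push!(algs, "Apple Accelerate" => AppleAccelerateLUFactorization())
    return algs
end
```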

## Understanding the Results

- **GFLOPs**: Billions of floating-point operations per second (higher is better); the sketch below shows the standard estimate for LU
- **Performance scaling**: Look for algorithms that maintain high GFLOPs as matrix size increases
- **Platform differences**: Results vary significantly between systems depending on hardware and BLAS libraries

## Integration with SciMLBenchmarks

This benchmark follows the same structure as the [official SciMLBenchmarks LU factorization benchmark](https://docs.sciml.ai/SciMLBenchmarksOutput/stable/LinearSolve/LUFactorization/), making it easy to compare results and contribute to the broader benchmark suite.