Fix Float16 segfault with Metal algorithms (#743) #764
Merged: ChrisRackauckas merged 6 commits into SciML:main from ChrisRackauckas-Claude:fix-float16-metal-segfault on Sep 2, 2025
+126 −57
Conversation
Add compatibility check to prevent MetalLUFactorization and MetalOffload32MixedLUFactorization from being used with Float16 element types. Metal Performance Shaders only supports Float32, and attempting to use Float16 causes a segfault in MPSMatrixDecompositionLU.

The fix adds an early check in test_algorithm_compatibility() to filter out Metal algorithms for Float16 before they're attempted, allowing LinearSolveAutotune to gracefully skip them rather than crash.

Fixes: SciML#743

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
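For illustration, the early check this commit describes amounts to a name-and-eltype filter along these lines; `metal_compatible` is a hypothetical helper for this sketch, not the actual code inside test_algorithm_compatibility().

```julia
# Minimal sketch (hypothetical helper, not the real implementation):
# reject Metal algorithms up front when the element type is Float16,
# since MPS only provides Float32 kernels and MPSMatrixDecompositionLU
# segfaults on anything else.
function metal_compatible(alg_name::AbstractString, ::Type{T}) where {T}
    if T === Float16 && startswith(alg_name, "Metal")
        return false
    end
    return true
end

metal_compatible("MetalLUFactorization", Float16)  # false -> autotune skips it
metal_compatible("MetalLUFactorization", Float32)  # true
```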
Add comprehensive compatibility checks to prevent Float16 usage with GPU algorithms that don't support it:

- CUDA algorithms: CudaOffloadLUFactorization, CudaOffloadQRFactorization, and CudaOffloadFactorization don't support Float16, as cuSOLVER factorization routines require Float32/Float64
- AMD GPU algorithms: AMDGPUOffloadLUFactorization and AMDGPUOffloadQRFactorization have limited/unclear Float16 support in rocSOLVER
- Metal algorithms: Keep the existing MetalLUFactorization rule but allow the mixed-precision MetalOffload32MixedLUFactorization as it converts inputs to Float32

Mixed-precision algorithms (*32Mixed*) are allowed as they internally convert inputs to Float32, making them compatible with Float16 inputs.

This prevents potential segfaults, errors, or undefined behavior when attempting to use Float16 with GPU libraries that don't support it.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
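A sketch of the mixed-precision exception described above, assuming name-based rules; `gpu_compatible` and the name prefixes are illustrative and not the actual rule set in the source.

```julia
# Hypothetical sketch: algorithms whose names contain "32Mixed" convert
# their inputs to Float32 internally, so they stay usable with Float16,
# while plain CUDA/AMD/Metal offload factorizations are filtered out.
function gpu_compatible(alg_name::AbstractString, ::Type{T}) where {T}
    if T === Float16
        occursin("32Mixed", alg_name) && return true  # mixed precision is exempt
        for prefix in ("CudaOffload", "AMDGPUOffload", "Metal")
            startswith(alg_name, prefix) && return false
        end
    end
    return true
end

gpu_compatible("MetalOffload32MixedLUFactorization", Float16)  # true
gpu_compatible("CudaOffloadQRFactorization", Float16)          # false
```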
…zed solvers

Add compatibility checks for additional solver categories that don't support Float16:

- Sparse factorization: UMFPACKFactorization and KLUFactorization from SuiteSparse don't support Float16 (currently limited to double precision, with single precision in development)
- PARDISO solvers: All PARDISO variants (MKL/Panua) only support single/double precision
- CUSOLVERRF: Specifically requires Float64/Int32 types for sparse LU refactorization

This comprehensive set of compatibility rules prevents attempting to use Float16 with:

- All major GPU algorithms (CUDA, Metal, AMD)
- Sparse direct solvers (UMFPACK, KLU, PARDISO, CUSOLVERRF)
- BLAS-dependent dense algorithms (already covered by the existing BlasFloat check)

Iterative/Krylov methods are allowed as they're type-generic and only need matrix-vector products, which should work with Float16.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
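One way to picture this rule set is a table from algorithm name to allowed element types. The table below is a sketch based on the commit text; the solver names (e.g. MKLPardisoFactorize, PanuaPardisoFactorize, CUSOLVERRFFactorization) are assumptions and may not match the exact identifiers used in benchmarking.jl.

```julia
# Illustrative rule table, not the actual data structure in the package.
const ELTYPE_RULES = Dict(
    # SuiteSparse direct solvers: double precision only for now.
    "UMFPACKFactorization"    => (Float64, ComplexF64),
    "KLUFactorization"        => (Float64, ComplexF64),
    # PARDISO variants: single/double precision.
    "MKLPardisoFactorize"     => (Float32, Float64, ComplexF32, ComplexF64),
    "PanuaPardisoFactorize"   => (Float32, Float64, ComplexF32, ComplexF64),
    # CUSOLVERRF sparse LU refactorization: Float64 (with Int32 indices).
    "CUSOLVERRFFactorization" => (Float64,),
)

# Unknown algorithms (e.g. type-generic Krylov methods) pass through unchanged.
supports_eltype(alg_name, ::Type{T}) where {T} = T in get(ELTYPE_RULES, alg_name, (T,))

supports_eltype("UMFPACKFactorization", Float16)  # false
supports_eltype("KLUFactorization", Float64)      # true
```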
Corrects a critical issue with the manual BLAS wrapper algorithms that Chris mentioned: BLISLUFactorization, MKLLUFactorization, and AppleAccelerateLUFactorization have explicit method signatures for only [Float32, Float64, ComplexF32, ComplexF64].

Key fixes:

1. Fixed algorithm names in the compatibility rules (was "BLISFactorization", now "BLISLUFactorization")
2. Added a separate check for manual BLAS wrappers that bypass Julia's BLAS interface
3. These algorithms use direct ccall() with hardcoded type signatures, so unsupported types like Float16 fail with a MethodError rather than going through BlasFloat conversion
4. Updated the check to catch all non-BLAS types, not just Float16

This prevents MethodError crashes when autotune attempts to use these algorithms with unsupported numeric types, addressing the "manual BLAS wrappers" issue Chris identified in the original issue comment.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
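The MethodError behavior called out here can be reproduced with a toy wrapper: a function defined only for the BLAS element types simply has no method for Float16, so there is no BlasFloat promotion to fall back on. `manual_getrf!` and `blas_wrapper_compatible` below are hypothetical names used for illustration.

```julia
using LinearAlgebra

# Toy stand-in for a manual BLAS wrapper: methods exist only for the types
# the underlying ccall signature accepts, mirroring dgetrf_/sgetrf_ paths.
manual_getrf!(A::Matrix{Float64}) = lu!(A)
manual_getrf!(A::Matrix{Float32}) = lu!(A)

A16 = rand(Float16, 4, 4)
# manual_getrf!(A16)  # MethodError: no method matching manual_getrf!(::Matrix{Float16})

# Hence the compatibility rule gates these wrappers on the four BLAS types:
blas_wrapper_compatible(::Type{T}) where {T} =
    T <: Union{Float32, Float64, ComplexF32, ComplexF64}

blas_wrapper_compatible(Float16)  # false
blas_wrapper_compatible(Float64)  # true
```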
Two important fixes for the OpenBLAS direct wrapper:

1. **Added to autotune algorithm detection**: OpenBLASLUFactorization was missing from get_available_algorithms() despite being a manual BLAS wrapper like MKL, BLIS, and AppleAccelerate. Now included when OpenBLAS_jll.is_available().
2. **Added to manual BLAS wrapper compatibility rules**: OpenBLASLUFactorization has the same explicit method signatures as other manual BLAS wrappers (Float32/64, ComplexF32/64 only) and would fail with a MethodError for Float16.

This ensures the OpenBLAS direct wrapper is:

- Benchmarked alongside other manual BLAS wrappers for performance comparison
- Protected from crashes when used with unsupported types like Float16
- Consistent with the treatment of other manual BLAS wrapper algorithms

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
Fixes the missing OpenBLAS_jll dependency that was preventing OpenBLASLUFactorization from being properly detected and included in autotune benchmarks.

Changes:

- Added OpenBLAS_jll to the LinearSolveAutotune/Project.toml dependencies
- Added the OpenBLAS_jll import to LinearSolveAutotune.jl
- Set the compat entry for OpenBLAS_jll = "0.3"

This resolves the undefined variable error when checking OpenBLAS_jll.is_available() in get_available_algorithms(), ensuring the OpenBLAS direct wrapper is properly included in autotune benchmarks alongside other manual BLAS wrappers.

Fixes: SciML#764 (comment)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
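A sketch of the guarded detection this commit enables, assuming OpenBLAS_jll is now declared in lib/LinearSolveAutotune/Project.toml with compat "0.3"; `detect_openblas!` is a hypothetical stand-in for the relevant branch of get_available_algorithms().

```julia
using OpenBLAS_jll  # requires the Project.toml entry added in this commit

# Hypothetical stand-in for the detection branch in get_available_algorithms():
# without the dependency above, referencing OpenBLAS_jll here is an undefined
# variable error, which is exactly what this commit fixes.
function detect_openblas!(algs::Vector{String})
    if OpenBLAS_jll.is_available()
        push!(algs, "OpenBLASLUFactorization")
    end
    return algs
end

detect_openblas!(String[])  # ["OpenBLASLUFactorization"] when OpenBLAS is available
```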
Summary
Fixes segfault when using LinearSolveAutotune with Float16 element types on Apple Silicon systems with Metal.jl loaded.
Problem
The issue was that MetalLUFactorization and MetalOffload32MixedLUFactorization were being tested with Float16 element types during autotune setup, but Metal Performance Shaders only supports Float32. This caused a segfault in MPSMatrixDecompositionLU.mm with the error "Only MPSDataTypeFloat32 is supported."
Solution
Added compatibility checking in test_algorithm_compatibility() to filter out Metal algorithms when the element type is Float16. This allows LinearSolveAutotune to gracefully skip these algorithms rather than attempting to use them and crashing.
Changes
lib/LinearSolveAutotune/src/benchmarking.jl: add Metal-specific compatibility rules
Test Plan
Related Issues
Closes #743
🤖 Generated with Claude Code