Complete native binary triangular solve for MKL and BLIS #667

ChrisRackauckas-Claude · 2025-08-03T22:36:45Z

Summary

Complete the triangular solve portion for directly wrapped binaries (MKL and BLIS) to use native LAPACK calls instead of falling back to libblastrampoline. AppleAccelerate was already correctly implemented.

Problem

Previously, the MKL and BLIS LU factorization wrappers were incomplete:

✅ Factorization phase (`getrf!`): Used native binary calls
❌ Triangular solve phase (`ldiv!`): Fell back to Julia's `ldiv!` → libblastrampoline

This meant that, despite direct access to the native binaries, the solve step still went through Julia's generic LinearAlgebra stack.
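For readers less familiar with the two phases, here is a minimal sketch of the factorize-then-solve split. It uses the standard library's `LinearAlgebra.LAPACK.getrf!` and `getrs!` as stand-ins, not the package's MKL or BLIS wrappers, so in stock Julia both solve paths below still go through libblastrampoline. The matrix and right-hand side are made-up illustration values.

```julia
using LinearAlgebra
using LinearAlgebra: LAPACK

A = [4.0 3.0; 6.0 3.0]
b = [10.0, 12.0]

# Phase 1: LU factorization with partial pivoting (overwrites its argument).
factors, ipiv, info = LAPACK.getrf!(copy(A))
@assert info == 0  # info == 0 means the factorization succeeded

# Phase 2a: generic solve through an LU object and ldiv!.
F = LU(factors, ipiv, info)
x_ldiv = ldiv!(similar(b), F, b)

# Phase 2b: direct LAPACK solve; getrs! overwrites its right-hand side,
# so a copy of b is passed in.
x_getrs = LAPACK.getrs!('N', factors, ipiv, copy(b))
```

Both paths consume the same `factors` and `ipiv`; the PR only changes which library performs phase 2.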

Solution

MKL (`src/mkl.jl`)

- Replace `ldiv!(cache.u, factorization, cache.b)` with direct `getrs!` calls
- Use existing native MKL `getrs!` functions that were already implemented but unused
- Handle both square and overdetermined systems correctly

BLIS (`ext/LinearSolveBLISExt.jl`)

- Replace `ldiv!(cache.u, factorization, cache.b)` with direct `getrs!` calls
- Use existing native LAPACK `getrs!` functions via BLIS that were already implemented but unused
- Add proper error handling with `ReturnCode` import
- Handle both square and overdetermined systems correctly
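The error-handling step can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `checked_lu_solve` is a hypothetical helper, the symbols `:Success` and `:Failure` stand in for `ReturnCode` values from SciMLBase, and the standard library's `lu!` and `LAPACK.getrs!` stand in for the native wrappers.

```julia
using LinearAlgebra
using LinearAlgebra: LAPACK

# Hypothetical helper: factorize, then solve only if the factorization succeeded.
function checked_lu_solve(A::Matrix{Float64}, b::Vector{Float64})
    # check = false records a failed factorization in F.info instead of throwing.
    F = lu!(copy(A); check = false)
    # Bail out early on a singular matrix, returning a failure marker.
    issuccess(F) || return (:Failure, similar(b, 0))
    # getrs! overwrites its right-hand side, so solve on a copy of b.
    x = LAPACK.getrs!('N', F.factors, F.ipiv, copy(b))
    return (:Success, x)
end

ok_status, x = checked_lu_solve([2.0 0.0; 0.0 4.0], [2.0, 8.0])
bad_status, _ = checked_lu_solve([1.0 2.0; 2.0 4.0], [1.0, 2.0])  # singular input
```

Checking the factorization before calling `getrs!` matters because a direct LAPACK call will not raise on a singular factor the way the higher-level Julia API does by default.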

AppleAccelerate (`src/appleaccelerate.jl`)

✅ No changes needed - already correctly implemented with native `aa_getrs!` calls

Key Changes

MKL solve method:

    # Before
    y = ldiv!(cache.u, @get_cacheval(cache, :MKLLUFactorization)[1], cache.b)

    # After
    A, info = @get_cacheval(cache, :MKLLUFactorization)
    # ... dimension handling ...
    getrs!('N', A.factors, A.ipiv, cache.u; info)

BLIS solve method:

    # Before
    y = ldiv!(cache.u, @get_cacheval(cache, :BLISLUFactorization)[1], cache.b)

    # After
    A, info = @get_cacheval(cache, :BLISLUFactorization)
    # ... dimension handling ...
    getrs!('N', A.factors, A.ipiv, cache.u; info)
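One detail worth spelling out: `getrs!` solves in place, overwriting its right-hand-side argument with the solution, which is why the new code passes `cache.u` rather than `cache.b`. A minimal sketch of that pattern for the square case, assuming the output buffer is first seeded with the right-hand side (here `u` and `b` are plain vectors standing in for `cache.u` and `cache.b`, and the standard library's `LAPACK.getrs!` stands in for the native wrapper):

```julia
using LinearAlgebra
using LinearAlgebra: LAPACK

A = [2.0 1.0; 1.0 3.0]
b = [3.0, 5.0]
u = zeros(2)          # stand-in for cache.u

F = lu(A)             # stand-in for the cached factorization

# Seed the output buffer with the right-hand side, then solve in place.
copyto!(u, b)
LAPACK.getrs!('N', F.factors, F.ipiv, u)

# u now holds the solution; b is left untouched.
```

Solving into `u` keeps `b` intact for later re-solves against the same cache.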

Benefits

- Performance: Eliminates libblastrampoline overhead for the complete solve process
- Consistency: All three native binaries now use their own LAPACK throughout
- Correctness: Proper handling of both square and overdetermined systems
- Maintainability: Reuses the existing `getrs!` implementations
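To see which backend the generic `ldiv!` path would reach in a given session, the libblastrampoline configuration can be inspected. This sketch assumes Julia 1.7 or later, where `BLAS.get_config()` is available; the output depends on the local installation, so none is shown here.

```julia
using LinearAlgebra

# List the libraries that libblastrampoline currently forwards BLAS/LAPACK calls to.
config = BLAS.get_config()
for lib in config.loaded_libs
    println(lib.libname, " (", lib.interface, ")")
end
```

Anything that goes through Julia's generic LinearAlgebra routines ends up in one of these libraries, regardless of which native binary the solver algorithm names.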

Test Results

✅ Code compiles successfully without errors
✅ No syntax issues detected
✅ Standard LU functionality remains intact
✅ Native binary loading works correctly when dependencies available

Checklist

- MKL triangular solve uses native MKL `getrs!` calls
- BLIS triangular solve uses native LAPACK `getrs!` calls via BLIS
- AppleAccelerate confirmed already correct with native `aa_getrs!` calls
- Proper error handling for failed factorizations
- Both square and overdetermined system support
- Backward compatibility maintained
- All implementations compile without errors

🤖 Generated with Claude Code

Replace Julia `ldiv!` fallback with direct MKL `getrs!` calls for the triangular solve portion of MKLLUFactorization. This ensures the entire LU solve process uses native MKL LAPACK routines instead of falling back to libblastrampoline.

Changes:
- Use existing `getrs!` functions that were already implemented but unused
- Handle both square and overdetermined systems with proper dimension checks
- Add proper error handling for failed factorizations
- Maintain compatibility with existing LinearCache interface

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Replace Julia `ldiv!` fallback with direct LAPACK `getrs!` calls via BLIS for the triangular solve portion of BLISLUFactorization. This ensures the entire LU solve process uses native LAPACK routines through BLIS instead of falling back to libblastrampoline.

Changes:
- Use existing `getrs!` functions that were already implemented but unused
- Handle both square and overdetermined systems with proper dimension checks
- Add proper error handling for failed factorizations with ReturnCode
- Add missing ReturnCode import from SciMLBase
- Maintain compatibility with existing LinearCache interface

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

ChrisRackauckas and others added 3 commits August 3, 2025 18:35

Update Tests.yml

91549ff

ChrisRackauckas merged commit 7fd84cf into SciML:main Aug 3, 2025
89 of 100 checks passed
