
Conversation

ChrisRackauckas-Claude
Contributor

Summary

This PR introduces mixed precision LU factorization methods that perform the factorization in Float32 while keeping Float64 interfaces, providing significant performance improvements for memory-bandwidth-limited problems.

New Factorization Methods

  • CUDAOffload32MixedLUFactorization: GPU-accelerated mixed precision for NVIDIA GPUs
  • MetalOffload32MixedLUFactorization: GPU-accelerated mixed precision for Apple Metal
  • MKL32MixedLUFactorization: CPU-based mixed precision using Intel MKL
  • AppleAccelerate32MixedLUFactorization: CPU-based mixed precision using Apple Accelerate

Key Features

  • Transparent precision conversion: Automatically converts Float64/ComplexF64 to Float32/ComplexF32 for factorization
  • Performance benefits: Up to 2x speedup for large, well-conditioned matrices
  • Hardware acceleration: Leverages GPU offloading and optimized CPU libraries
  • Complex number support: Handles both real and complex matrices

Usage Example

using LinearSolve, LinearAlgebra  # LinearAlgebra provides the identity scaling I

A = rand(1000, 1000) + 5.0I  # Well-conditioned matrix
b = rand(1000)
prob = LinearProblem(A, b)

# Solve with mixed precision (pick the method matching your hardware;
# the GPU methods also require the corresponding GPU package to be loaded)
sol = solve(prob, MKL32MixedLUFactorization())              # Intel CPUs
sol = solve(prob, CUDAOffload32MixedLUFactorization())      # NVIDIA GPUs
sol = solve(prob, MetalOffload32MixedLUFactorization())     # Apple Silicon GPUs
sol = solve(prob, AppleAccelerate32MixedLUFactorization())  # Apple CPUs
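Complex systems use the same interface; a minimal sketch of the complex case (per the feature list, ComplexF64 data is converted to ComplexF32 internally for the factorization):

# Complex example (same API; data stays ComplexF64 at the interface)
Ac = rand(ComplexF64, 1000, 1000) + 5.0I
bc = rand(ComplexF64, 1000)
solc = solve(LinearProblem(Ac, bc), MKL32MixedLUFactorization())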

Implementation Details

  • Factorization is performed in 32-bit precision to reduce memory bandwidth requirements
  • Solution is converted back to the original precision (Float64/ComplexF64); the overall pattern is sketched after this list
  • Particularly effective for problems where memory bandwidth is the bottleneck
  • Maintains reasonable accuracy for well-conditioned problems
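For intuition, the pattern these methods follow is roughly the sketch below. This is a minimal illustration in plain Julia, not the library's internals; the actual methods dispatch to MKL, Accelerate, or GPU kernels rather than this generic code:

using LinearAlgebra

# Minimal sketch of the mixed precision idea (illustrative only)
function mixed_precision_lu_solve(A::AbstractMatrix{Float64}, b::AbstractVector{Float64})
    A32 = Float32.(A)        # one-time downcast; the O(n^3) factorization then moves half the bytes
    F   = lu!(A32)           # LU with partial pivoting in Float32
    x32 = F \ Float32.(b)    # forward/backward substitution in Float32
    return Float64.(x32)     # hand the result back in the original precision
end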

Test Plan

  • Added test file test/test_mixed_precision.jl (an illustrative accuracy check is sketched after this list)
  • Tests pass for the MKL mixed precision implementation
  • Tests handle complex matrices correctly
  • GPU implementations defined (require hardware/packages for full testing)
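For reference, a hedged sketch of the kind of accuracy check such a test might contain (the actual contents of test/test_mixed_precision.jl are not reproduced here; the matrix size and tolerance are illustrative):

using Test, LinearSolve, LinearAlgebra

@testset "Mixed precision LU sanity check (sketch)" begin
    A = rand(200, 200) + 5.0I
    b = rand(200)
    prob = LinearProblem(A, b)
    x_ref   = solve(prob, LUFactorization()).u             # full Float64 reference
    x_mixed = solve(prob, MKL32MixedLUFactorization()).u   # Float32 factorization internally
    @test eltype(x_mixed) == Float64                       # interface stays Float64
    @test norm(x_mixed - x_ref) / norm(x_ref) < 1e-3       # loose, single-precision-level tolerance
end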

🤖 Generated with Claude Code

claude added 2 commits August 20, 2025 09:37
This commit introduces four new mixed precision LU factorization algorithms
that perform computations in Float32 while maintaining Float64 interfaces,
providing significant performance improvements for memory-bandwidth-limited
problems.

New factorization methods:
- CUDAOffload32MixedLUFactorization: GPU-accelerated mixed precision for NVIDIA GPUs
- MetalOffload32MixedLUFactorization: GPU-accelerated mixed precision for Apple Metal
- MKL32MixedLUFactorization: CPU-based mixed precision using Intel MKL
- AppleAccelerate32MixedLUFactorization: CPU-based mixed precision using Apple Accelerate

Key features:
- Transparent Float64 to Float32 conversion for factorization
- Support for both real and complex matrices
- Up to 2x speedup for large, well-conditioned matrices
- Maintains reasonable accuracy while reducing memory bandwidth requirements

The implementations handle precision conversion internally, making them
easy to use as drop-in replacements for standard LU factorization when
reduced precision is acceptable.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Added mixed precision tests to the Core test group in runtests.jl
- Added documentation for all four mixed precision methods in docs
- Added section explaining when to use mixed precision methods
- Documentation includes performance characteristics and use cases

The tests now run as part of the standard test suite, and the
documentation provides clear guidance on when these methods are
beneficial (large well-conditioned problems with memory bandwidth
bottlenecks).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@ChrisRackauckas ChrisRackauckas merged commit 42ef6f2 into SciML:main Aug 20, 2025
133 of 136 checks passed