
Commit 34b06a2

committed
Add tests and documentation for mixed precision methods
- Added mixed precision tests to the Core test group in runtests.jl
- Added documentation for all four mixed precision methods in docs
- Added section explaining when to use mixed precision methods
- Documentation includes performance characteristics and use cases

The tests now run as part of the standard test suite, and the documentation provides clear guidance on when these methods are beneficial (large well-conditioned problems with memory bandwidth bottlenecks).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 0e70f68 commit 34b06a2

File tree

- docs/src/solvers/solvers.md
- test/runtests.jl

2 files changed: 23 additions, 0 deletions


docs/src/solvers/solvers.md

Lines changed: 22 additions & 0 deletions
@@ -32,6 +32,24 @@ this is only recommended for Float32 matrices. Choose `CudaOffloadLUFactorizatio
 performance on well-conditioned problems, or `CudaOffloadQRFactorization` for better numerical
 stability on ill-conditioned problems.
 
+#### Mixed Precision Methods
+
+For large well-conditioned problems where memory bandwidth is the bottleneck, mixed precision
+methods can provide significant speedups (up to 2x) by performing the factorization in Float32
+while maintaining Float64 interfaces. These methods are particularly effective for:
+- Large dense matrices (> 1000x1000)
+- Well-conditioned problems (condition number < 10^4)
+- Hardware with good Float32 performance
+
+Available mixed precision solvers:
+- `MKL32MixedLUFactorization` - Intel CPUs with MKL
+- `AppleAccelerate32MixedLUFactorization` - Apple CPUs with Accelerate
+- `CUDAOffload32MixedLUFactorization` - NVIDIA GPUs with CUDA
+- `MetalOffload32MixedLUFactorization` - Apple GPUs with Metal
+
+These methods automatically handle the precision conversion, making them easy drop-in replacements
+when reduced precision is acceptable for the factorization step.
+
 !!! note
 
     Performance details for dense LU-factorizations can be highly dependent on the hardware configuration.
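
The added documentation calls these solvers drop-in replacements, so a usage sketch may help. This follows the standard LinearSolve.jl `LinearProblem`/`solve` API; the matrix construction, problem size, and residual check are illustrative assumptions, and `MKL32MixedLUFactorization` would need an MKL-capable CPU:

```julia
using LinearSolve, LinearAlgebra

# Illustrative large, well-conditioned dense Float64 system;
# the diagonal shift keeps the condition number small.
n = 2000
A = rand(n, n) + n * I
b = rand(n)

prob = LinearProblem(A, b)

# Drop-in replacement: inputs and outputs stay Float64, while the
# LU factorization itself runs internally in Float32.
sol = solve(prob, MKL32MixedLUFactorization())

# Expect roughly single-precision accuracy in the relative residual.
@show norm(A * sol.u - b) / norm(b)
```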
@@ -205,6 +223,7 @@ KrylovJL
 
 ```@docs
 MKLLUFactorization
+MKL32MixedLUFactorization
 ```
 
 ### AppleAccelerate.jl
@@ -215,6 +234,7 @@ MKLLUFactorization
 
 ```@docs
 AppleAccelerateLUFactorization
+AppleAccelerate32MixedLUFactorization
 ```
 
 ### Metal.jl
@@ -225,6 +245,7 @@ AppleAccelerateLUFactorization
 
 ```@docs
 MetalLUFactorization
+MetalOffload32MixedLUFactorization
 ```
 
 ### Pardiso.jl
@@ -251,6 +272,7 @@ The following are non-standard GPU factorization routines.
 ```@docs
 CudaOffloadLUFactorization
 CudaOffloadQRFactorization
+CUDAOffload32MixedLUFactorization
 ```
 
 ### AMDGPU.jl
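
For the GPU offload variant, a hedged sketch: like the other CUDA offload routines listed here, the solver is assumed to be provided through LinearSolve's CUDA extension, so the CUDA package must be loaded first; sizes are again illustrative:

```julia
using LinearSolve, LinearAlgebra
using CUDA  # assumed to activate LinearSolve's CUDA extension

n = 4000
A = rand(n, n) + n * I  # illustrative well-conditioned system
b = rand(n)

prob = LinearProblem(A, b)

# The factorization is offloaded to the GPU and performed in Float32;
# the solution comes back in Float64 on the host.
sol = solve(prob, CUDAOffload32MixedLUFactorization())
```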

test/runtests.jl

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ if GROUP == "All" || GROUP == "Core"
     @time @safetestset "Traits" include("traits.jl")
     @time @safetestset "Verbosity" include("verbosity.jl")
     @time @safetestset "BandedMatrices" include("banded.jl")
+    @time @safetestset "Mixed Precision" include("test_mixed_precision.jl")
 end
 
 # Don't run Enzyme tests on prerelease
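
The contents of `test_mixed_precision.jl` are not part of this commit view. A hypothetical sketch of what such a test could check; the solver choice, tolerance, and matrix construction are all assumptions, and the MKL method would only run on supported CPUs:

```julia
# Hypothetical sketch only: the actual test/test_mixed_precision.jl is not
# shown in this diff.
using Test, LinearSolve, LinearAlgebra

@testset "Mixed precision LU" begin
    n = 200
    A = rand(n, n) + n * I  # well-conditioned by construction
    b = rand(n)
    prob = LinearProblem(A, b)

    sol64 = solve(prob, LUFactorization())
    sol32 = solve(prob, MKL32MixedLUFactorization())

    # The mixed precision solve should agree with the Float64 solve to
    # roughly single-precision accuracy, not full Float64 accuracy.
    @test sol32.u ≈ sol64.u rtol = 1e-4
end
```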
