Add 32-bit mixed precision solvers for OpenBLAS and RecursiveFactorization (#753)
* Add 32-bit mixed precision solvers for OpenBLAS and RecursiveFactorization
Adds two new mixed precision LU factorization algorithms that perform the factorization
in Float32 while maintaining a Float64 interface, for improved performance:
- OpenBLAS32MixedLUFactorization: Mixed precision solver using OpenBLAS
- RF32MixedLUFactorization: Mixed precision solver using RecursiveFactorization.jl
These solvers follow the same pattern as the existing MKL32MixedLUFactorization
and AppleAccelerate32MixedLUFactorization implementations, providing:
- ~2x speedup for memory-bandwidth-limited problems
- Support for both real and complex matrices
- Automatic precision conversion and management
- Comprehensive test coverage
The RF32MixedLUFactorization also supports pivoting options for trading off
stability against performance.
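As a rough usage sketch (assuming only the standard LinearSolve.jl `LinearProblem`/`solve` interface and the algorithm names introduced here; constructor keyword arguments are omitted):

```julia
using LinearSolve

A = rand(1000, 1000)
b = rand(1000)
prob = LinearProblem(A, b)

# Factorization runs internally in Float32; the solution is returned in Float64.
sol = solve(prob, OpenBLAS32MixedLUFactorization())

# RF32MixedLUFactorization is provided through the RecursiveFactorization
# extension, so the weak dependency must be loaded first.
using RecursiveFactorization
sol_rf = solve(prob, RF32MixedLUFactorization())
```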
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* Fix resolve.jl tests for mixed precision solvers
- Add higher tolerance for mixed precision algorithms (atol=1e-5, rtol=1e-5)
- Skip tests for algorithms that require unavailable packages
- Add proper checks for RF32MixedLUFactorization and OpenBLAS32MixedLUFactorization
The mixed precision algorithms naturally have lower accuracy than their full precision
counterparts, so they need relaxed tolerances in the tests.
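A minimal sketch of the kind of tolerance branch this adds (the flag name and the tight-tolerance default are assumptions; the mixed precision values follow the commit message):

```julia
using Test

# Hypothetical flag standing in for the actual algorithm check in resolve.jl.
alg_is_mixed_precision = true
atol = alg_is_mixed_precision ? 1e-5 : 1e-12   # 1e-12 is an illustrative default
rtol = alg_is_mixed_precision ? 1e-5 : 1e-12

A = rand(100, 100); b = rand(100)
x = A \ b                                      # stand-in for solve(prob, alg).u
@test isapprox(A * x, b; atol = atol, rtol = rtol)
```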
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* Add RecursiveFactorization to Project.toml for tests
* Apply formatting and fix additional test compatibility
- Format code with JuliaFormatter SciMLStyle
- Update resolve.jl tests to properly handle mixed precision algorithms
- Add appropriate tolerance checks for Float32 precision solvers
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* Move RecursiveFactorization back to weakdeps
RecursiveFactorization should remain a weak dependency, since it is optional and loaded via an extension.
* Increase tolerance for mixed precision tests in resolve.jl
Mixed precision algorithms need higher tolerance due to reduced precision arithmetic.
Increased from atol=1e-5, rtol=1e-5 to atol=1e-4, rtol=1e-4.
* Fix mixed precision detection in resolve.jl tests
Use string matching to detect mixed precision algorithms instead of symbol comparison.
This ensures the tolerance branch is properly taken for algorithms like RF32MixedLUFactorization.
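A hypothetical sketch of the name-based check (the `"32Mixed"` substring and helper name are assumptions, not the exact test code):

```julia
# Detect mixed precision algorithms by their type name rather than by
# comparing symbols, so extension-provided types are also caught.
is_mixed_precision(alg) = occursin("32Mixed", string(nameof(typeof(alg))))

# Inside the test loop the tolerances would then branch on it, e.g.
#   atol = rtol = is_mixed_precision(alg) ? 1e-4 : 1e-12
```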
* Fix RF32MixedLUFactorization segfault issue
- Simplified cache initialization to only store the LU factorization object
- RecursiveFactorization.lu! returns an LU object that contains its own pivot vector
- Fixed improper pivot vector handling that was causing segfaults
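A minimal sketch of the relevant behavior, assuming only that `RecursiveFactorization.lu!` returns a standard `LinearAlgebra.LU` object (matrix size and values are arbitrary):

```julia
using RecursiveFactorization, LinearAlgebra

A32 = rand(Float32, 8, 8)                 # factorization happens in Float32
F = RecursiveFactorization.lu!(copy(A32)) # returns a LinearAlgebra.LU object
@assert F isa LU
@assert length(F.ipiv) == 8               # the pivot vector lives inside F
x = F \ rand(Float32, 8)                  # later solves reuse the cached F
```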
* Delete test/Project.toml
* Match RF32MixedLUFactorization pivoting with RFLUFactorization
- Store (fact, ipiv) tuple in cache exactly like RFLUFactorization
- Pass ipiv to RecursiveFactorization.lu! and store both fact and ipiv
- Retrieve factorization using @get_cacheval()[1] pattern
- This ensures consistent behavior between the two implementations
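An illustrative sketch of that pattern (not the actual LinearSolve internals; the pivot element type, the reduced-precision copy, and the two-argument `lu!` call are assumptions based on the commit text):

```julia
using RecursiveFactorization, LinearAlgebra

A = rand(8, 8); b = rand(8)

A32  = Matrix{Float32}(A)                        # factor in reduced precision
ipiv = Vector{LinearAlgebra.BlasInt}(undef, 8)   # preallocated pivot vector
fact = RecursiveFactorization.lu!(A32, ipiv)     # ipiv is filled in place
cacheval = (fact, ipiv)                          # stored tuple, as in RFLUFactorization

# Later solves retrieve the factorization as the first tuple element
# (the @get_cacheval()[1] pattern) and promote the result back to Float64.
x = Float64.(cacheval[1] \ Float32.(b))
```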
* fix rebase
* Don't test no-pivot RFLU
---------
Co-authored-by: Claude <[email protected]>
Co-authored-by: ChrisRackauckas <[email protected]>