⚡️ Speed up function matrix_inverse by 248%
#207
Closed
📄 248% (2.48x) speedup for `matrix_inverse` in `src/numerical/linear_algebra.py`

⏱️ Runtime: 104 milliseconds → 30.0 milliseconds (best of 98 runs)

📝 Explanation and details
The optimized code achieves a 247% speedup by replacing the nested Python loops with vectorized NumPy operations. The key optimization is eliminating the inner `for j in range(n)` loop that performed Gaussian elimination one row at a time.

What changed:

- Instead of iterating over each row `j` and updating it individually, the code now processes all rows except the pivot row simultaneously, using NumPy array operations with masks and broadcasting (see the sketch after this list).
- The pivot normalization `augmented[i] = augmented[i] / pivot` became the in-place `augmented[i] /= pivot`, avoiding a temporary array allocation.
- The input is converted with `.astype(float, copy=False)` to ensure a float dtype without unnecessary copying.
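A minimal sketch of what the vectorized elimination can look like, assuming the function performs Gauss-Jordan elimination on an augmented matrix `[A | I]` and uses the names quoted above (`augmented`, `factors`, `mask`); the actual code in `src/numerical/linear_algebra.py` may differ, for example by adding partial pivoting:

```python
import numpy as np

def matrix_inverse(matrix):
    """Invert a square matrix via Gauss-Jordan elimination on [A | I]."""
    a = np.asarray(matrix).astype(float, copy=False)  # float dtype, no copy if already float
    n = a.shape[0]
    # Augment with the identity: reducing the left half to I leaves A^-1 on the right.
    augmented = np.hstack([a, np.eye(n)])

    for i in range(n):
        pivot = augmented[i, i]
        if pivot == 0:
            raise ValueError("zero pivot encountered; matrix may be singular")
        augmented[i] /= pivot  # in-place normalization, no temporary array

        # Eliminate column i from every other row in a single vectorized update
        # instead of an inner `for j in range(n)` loop.
        mask = np.arange(n) != i
        factors = augmented[mask, i]
        augmented[mask] -= factors[:, np.newaxis] * augmented[i]

    return augmented[:, n:]
```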
Why it's faster:

The original code spent 87% of its time in the nested loop (profile lines showing 63.1% for the subtraction and 13.6% for factor extraction). These operations created Python-level iteration overhead, with roughly 65,000 loop iterations for typical test cases. The optimized version leverages NumPy's C-level vectorization and BLAS operations, processing multiple rows in a single operation: `augmented[mask] -= factors[:, np.newaxis] * augmented[i]`.

Performance characteristics:
The break-even point appears around n=10-20. For functions called in computational hot paths with medium-to-large matrices, this optimization significantly reduces runtime by avoiding Python's per-iteration overhead.
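A quick way to sanity-check the result and measure the speedup locally is to compare against NumPy's reference inverse and time the call (hypothetical snippet; the import path and matrix size are assumptions, and timings will vary by hardware):

```python
import numpy as np
from timeit import timeit

# from src.numerical.linear_algebra import matrix_inverse  # assumed project import path

rng = np.random.default_rng(0)
a = rng.standard_normal((200, 200))

# Correctness: agree with NumPy's reference inverse within floating-point tolerance.
assert np.allclose(matrix_inverse(a), np.linalg.inv(a))

# Rough timing of the hot path.
print(timeit(lambda: matrix_inverse(a), number=10), "seconds for 10 calls")
```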
✅ Correctness verification report:
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-matrix_inverse-mji0kexa` and push.