⚡️ Speed up function matrix_inverse by 248%
#207
Closed
📄 248% (2.48x) speedup for `matrix_inverse` in `src/numerical/linear_algebra.py`

⏱️ Runtime: 104 milliseconds → 30.0 milliseconds (best of 98 runs)

📝 Explanation and details
The optimized code achieves a 247% speedup by replacing the nested Python loops with vectorized NumPy operations. The key optimization is eliminating the inner `for j in range(n)` loop that performed Gaussian elimination one row at a time.

What changed:

- Instead of iterating over each row `j` and updating it individually, the code now processes all rows except the pivot row simultaneously, using NumPy array operations with masks and broadcasting (see the sketch after this list).
- The pivot normalization `augmented[i] = augmented[i] / pivot` became the in-place `augmented[i] /= pivot`, avoiding a temporary array allocation.
- The input is converted with `.astype(float, copy=False)` to ensure a float dtype without unnecessary copying.
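A minimal sketch of what the vectorized elimination can look like, assuming the function performs Gauss-Jordan elimination on an augmented matrix `[A | I]` and uses the names quoted above (`augmented`, `factors`, `mask`); the actual code in `src/numerical/linear_algebra.py` may differ, for example by adding partial pivoting:

```python
import numpy as np

def matrix_inverse(matrix):
    """Invert a square matrix via Gauss-Jordan elimination on [A | I]."""
    a = np.asarray(matrix).astype(float, copy=False)  # float dtype, no copy if already float
    n = a.shape[0]
    # Augment with the identity: reducing the left half to I leaves A^-1 on the right.
    augmented = np.hstack([a, np.eye(n)])

    for i in range(n):
        pivot = augmented[i, i]
        if pivot == 0:
            raise ValueError("zero pivot encountered; matrix may be singular")
        augmented[i] /= pivot  # in-place normalization, no temporary array

        # Eliminate column i from every other row in a single vectorized update
        # instead of an inner `for j in range(n)` loop.
        mask = np.arange(n) != i
        factors = augmented[mask, i]
        augmented[mask] -= factors[:, np.newaxis] * augmented[i]

    return augmented[:, n:]
```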
Why it's faster:

The original code spent 87% of its time in the nested loop (profile lines showing 63.1% for the subtraction and 13.6% for factor extraction). These operations created Python-level iteration overhead, with roughly 65,000 loop iterations for typical test cases. The optimized version leverages NumPy's C-level vectorization and BLAS operations, processing multiple rows in a single operation: `augmented[mask] -= factors[:, np.newaxis] * augmented[i]`.

Performance characteristics:
The break-even point appears around n=10-20. For functions called in computational hot paths with medium-to-large matrices, this optimization significantly reduces runtime by avoiding Python's per-iteration overhead.
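A quick way to sanity-check the result and measure the speedup locally is to compare against NumPy's reference inverse and time the call (hypothetical snippet; the import path and matrix size are assumptions, and timings will vary by hardware):

```python
import numpy as np
from timeit import timeit

# from src.numerical.linear_algebra import matrix_inverse  # assumed project import path

rng = np.random.default_rng(0)
a = rng.standard_normal((200, 200))

# Correctness: agree with NumPy's reference inverse within floating-point tolerance.
assert np.allclose(matrix_inverse(a), np.linalg.inv(a))

# Rough timing of the hot path.
print(timeit(lambda: matrix_inverse(a), number=10), "seconds for 10 calls")
```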
✅ Correctness verification report:
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-matrix_inverse-mji0kexa` and push.