Conversation

codeflash-ai bot commented on Jun 21, 2025

📄 534% (5.34x) speedup for matrix_inverse in src/numpy_pandas/matrix_operations.py

⏱️ Runtime: 84.2 milliseconds → 13.3 milliseconds (best of 187 runs)

📝 Explanation and details

Here's an optimized version of your matrix_inverse function, focusing on avoiding Python for-loops in favor of fast NumPy array operations. The heart of your performance problem is the double for-loop, which can be partly vectorized.
We also avoid repeated slicing and use in-place operations for better cache efficiency.

Key optimizations:

  • In-place operations: Use /= instead of creating new arrays for each row scaling.
  • NumPy vectorization: Where feasible, eliminate the inner loop — all j > i and j < i rows are updated in a block.
  • Avoid unnecessary casting/copies: Use astype(float, copy=False) so the input is not copied when it is already a float array.
  • No change to output or signature. All steps and error checks preserved.

This will drastically reduce the time spent on row subtraction, which was previously the slowest part.
If you want even more performance, consider using np.linalg.inv for production unless you need to teach the algorithm!

Let me know if you want a pure Cython/Numba optimized version for even more speed.
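The optimized source itself is collapsed out of this excerpt, so the following is a minimal sketch of what a Gauss-Jordan inversion with a vectorized elimination step could look like under the points listed above. It is an illustration, not the code from this PR; the name `matrix_inverse_sketch`, the pivoting details, and the decision to always copy the input are assumptions.

```python
import numpy as np

def matrix_inverse_sketch(matrix: np.ndarray) -> np.ndarray:
    """Gauss-Jordan inversion with the inner elimination loop vectorized."""
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError("Matrix must be square")
    n = matrix.shape[0]
    a = matrix.astype(float)  # work on a float copy so the caller's array is untouched
    inv = np.eye(n)
    for i in range(n):
        # Partial pivoting: if the pivot is zero, swap in the row with the
        # largest entry below it in column i.
        if a[i, i] == 0.0:
            pivot = i + int(np.argmax(np.abs(a[i:, i])))
            if a[pivot, i] == 0.0:
                raise ValueError("Matrix is singular")
            a[[i, pivot]] = a[[pivot, i]]
            inv[[i, pivot]] = inv[[pivot, i]]
        # Scale the pivot row in place (the `/=` mentioned above).
        p = a[i, i]
        a[i] /= p
        inv[i] /= p
        # Eliminate column i from every other row in one vectorized block
        # instead of an inner Python loop over j.
        mask = np.arange(n) != i
        factors = a[mask, i][:, None]
        a[mask] -= factors * a[i]
        inv[mask] -= factors * inv[i]
    return inv

# Quick check against NumPy's reference implementation.
A = np.array([[4.0, 7.0], [2.0, 6.0]])
assert np.allclose(matrix_inverse_sketch(A), np.linalg.inv(A))
```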

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 35 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
```python
import numpy as np
# imports
import pytest  # used for our unit tests
from src.numpy_pandas.matrix_operations import matrix_inverse

# unit tests

# ----------- BASIC TEST CASES -----------

def test_identity_matrix_inverse():
    # 1x1 identity
    I1 = np.eye(1)
    codeflash_output = matrix_inverse(I1); inv = codeflash_output # 8.42μs -> 8.25μs (2.02% faster)
    # 2x2 identity
    I2 = np.eye(2)
    codeflash_output = matrix_inverse(I2); inv2 = codeflash_output # 8.42μs -> 8.25μs (2.02% faster)
    # 5x5 identity
    I5 = np.eye(5)
    codeflash_output = matrix_inverse(I5); inv5 = codeflash_output # 8.42μs -> 8.25μs (2.02% faster)

def test_simple_2x2_matrix():
    # Test with a simple 2x2 invertible matrix
    A = np.array([[4, 7], [2, 6]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.2μs -> 15.4μs (7.32% slower)
    expected = np.linalg.inv(A)

def test_simple_3x3_matrix():
    # Test with a simple 3x3 invertible matrix
    A = np.array([[1, 2, 3], [0, 1, 4], [5, 6, 0]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 22.3μs -> 21.3μs (4.89% faster)
    expected = np.linalg.inv(A)

def test_negative_and_fractional_entries():
    # Matrix with negative and fractional entries
    A = np.array([[0.5, -1], [2, 3.5]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 13.9μs -> 15.3μs (9.23% slower)
    expected = np.linalg.inv(A)

def test_inverse_property():
    # Check that A @ A_inv == I for a random 3x3 matrix
    np.random.seed(0)
    A = np.random.rand(3, 3)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 22.2μs -> 21.2μs (4.51% faster)
    prod = np.dot(A, inv)

# ----------- EDGE TEST CASES -----------

def test_non_square_matrix_raises():
    # Should raise ValueError for non-square matrix
    A = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
    with pytest.raises(ValueError):
        matrix_inverse(A)



def test_matrix_with_zero_pivot_needs_row_swap():
    # Matrix with zero on the diagonal, but invertible
    A = np.array([[0, 1], [1, 0]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 21.9μs -> 23.1μs (5.23% slower)
    expected = np.linalg.inv(A)

def test_ill_conditioned_matrix():
    # Matrix with very small determinant (ill-conditioned)
    A = np.array([[1, 1], [1, 1.0000001]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.4μs -> 15.7μs (8.22% slower)
    expected = np.linalg.inv(A)

def test_1x1_matrix():
    # 1x1 invertible matrix
    A = np.array([[7]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 9.17μs -> 9.08μs (0.925% faster)
    expected = np.array([[1/7]])


def test_large_values_matrix():
    # Matrix with very large values
    A = np.array([[1e10, 2e10], [3e10, 4e10]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 15.2μs -> 16.5μs (8.06% slower)
    expected = np.linalg.inv(A)

def test_small_values_matrix():
    # Matrix with very small values
    A = np.array([[1e-10, 2e-10], [3e-10, 4e-10]], dtype=float)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.4μs -> 15.6μs (7.73% slower)
    expected = np.linalg.inv(A)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_10x10_random_matrix():
    # Test with a random 10x10 invertible matrix
    np.random.seed(42)
    A = np.random.rand(10, 10)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 166μs -> 64.1μs (160% faster)
    expected = np.linalg.inv(A)

def test_large_50x50_random_matrix():
    # Test with a random 50x50 invertible matrix
    np.random.seed(123)
    A = np.random.rand(50, 50)
    # Ensure matrix is invertible by adding identity * 10
    A += 10 * np.eye(50)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 4.03ms -> 551μs (631% faster)
    expected = np.linalg.inv(A)

def test_inverse_property_large():
    # Check that A @ A_inv == I for a random 30x30 matrix
    np.random.seed(100)
    A = np.random.rand(30, 30)
    # Ensure invertibility
    A += 5 * np.eye(30)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 1.42ms -> 222μs (538% faster)
    prod = np.dot(A, inv)

def test_large_matrix_with_row_swaps():
    # Matrix that requires row swaps for pivoting
    A = np.eye(20)
    A[0, 0] = 0
    A[0, 1] = 1
    A[1, 0] = 1
    A[1, 1] = 0
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 664μs -> 139μs (378% faster)
    expected = np.linalg.inv(A)

def test_performance_on_100x100_matrix():
    # Test performance/scalability on 100x100 matrix
    np.random.seed(555)
    A = np.random.rand(100, 100)
    # Ensure invertibility
    A += 20 * np.eye(100)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 17.3ms -> 2.73ms (531% faster)
    expected = np.linalg.inv(A)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
import numpy as np
# imports
import pytest  # used for our unit tests
from src.numpy_pandas.matrix_operations import matrix_inverse

# unit tests

# ------------------ BASIC TEST CASES ------------------

def test_identity_matrix():
    # Test that the inverse of the identity is itself
    I = np.eye(3)
    codeflash_output = matrix_inverse(I); inv = codeflash_output # 21.8μs -> 20.6μs (5.87% faster)

def test_diagonal_matrix():
    # Diagonal matrix inversion
    D = np.diag([2, 3, 4])
    expected = np.diag([0.5, 1/3, 0.25])
    codeflash_output = matrix_inverse(D); inv = codeflash_output # 21.8μs -> 21.6μs (0.968% faster)

def test_simple_2x2():
    # Simple 2x2 matrix
    A = np.array([[4, 7], [2, 6]], dtype=float)
    expected = np.array([[0.6, -0.7], [-0.2, 0.4]])
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.2μs -> 15.6μs (9.09% slower)

def test_simple_3x3():
    # Simple 3x3 matrix
    A = np.array([[1, 2, 3], [0, 1, 4], [5, 6, 0]], dtype=float)
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 22.0μs -> 21.2μs (3.54% faster)

# ------------------ EDGE TEST CASES ------------------

def test_non_square_matrix_raises():
    # Non-square matrix should raise ValueError
    A = np.ones((2, 3))
    with pytest.raises(ValueError):
        matrix_inverse(A)


def test_almost_singular_matrix():
    # Matrix that is very close to singular (ill-conditioned)
    eps = 1e-10
    A = np.array([[1, 1], [1, 1+eps]], dtype=float)
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.9μs -> 16.2μs (8.21% slower)


def test_permutation_matrix():
    # Permutation matrix (should be its own inverse)
    P = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)
    expected = P.T  # Inverse of permutation is its transpose
    codeflash_output = matrix_inverse(P); inv = codeflash_output # 30.8μs -> 30.2μs (1.93% faster)

def test_negative_entries():
    # Matrix with negative entries
    A = np.array([[2, -1], [-1, 2]], dtype=float)
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.3μs -> 15.5μs (7.52% slower)

def test_float_precision():
    # Matrix with float entries
    A = np.array([[1.5, 2.5], [3.5, 4.5]])
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 14.1μs -> 15.2μs (7.38% slower)

def test_row_swap_needed():
    # Matrix that needs a row swap for pivoting
    A = np.array([[0, 2], [1, 3]], dtype=float)
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 20.7μs -> 22.0μs (5.69% slower)

def test_large_condition_number():
    # Matrix with large condition number (ill-conditioned)
    A = np.array([[1, 1], [1, 1.000001]], dtype=float)
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 13.9μs -> 15.1μs (7.73% slower)

def test_1x1_matrix():
    # 1x1 matrix
    A = np.array([[5]], dtype=float)
    expected = np.array([[0.2]])
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 9.21μs -> 9.08μs (1.37% faster)

# ------------------ LARGE SCALE TEST CASES ------------------

def test_large_identity():
    # Large identity matrix (100x100)
    n = 100
    I = np.eye(n)
    codeflash_output = matrix_inverse(I); inv = codeflash_output # 17.4ms -> 2.66ms (555% faster)

def test_large_random_invertible():
    # Large random invertible matrix (50x50)
    np.random.seed(0)
    n = 50
    while True:
        A = np.random.rand(n, n)
        if np.linalg.matrix_rank(A) == n:
            break
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 4.03ms -> 553μs (628% faster)

def test_large_diagonal():
    # Large diagonal matrix (100x100)
    n = 100
    diag = np.arange(1, n+1)
    A = np.diag(diag)
    expected = np.diag(1/diag)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 17.4ms -> 2.66ms (555% faster)

def test_large_sparse_like():
    # Large sparse-like matrix (mostly zeros, but invertible)
    n = 50
    A = np.eye(n) + np.diag(np.ones(n-1), k=1)
    expected = np.linalg.inv(A)
    codeflash_output = matrix_inverse(A); inv = codeflash_output # 3.99ms -> 550μs (625% faster)

def test_large_permutation():
    # Large permutation matrix (should be its own transpose)
    n = 100
    P = np.eye(n)[::-1]
    expected = P.T
    codeflash_output = matrix_inverse(P); inv = codeflash_output # 17.5ms -> 2.74ms (537% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
from src.numpy_pandas.matrix_operations import matrix_inverse
```

To edit these changes, run `git checkout codeflash/optimize-matrix_inverse-mc5i7sy2` and push.

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label on Jun 21, 2025
codeflash-ai bot requested a review from KRRT7 on June 21, 2025 00:32
KRRT7 closed this on Jun 23, 2025
codeflash-ai bot deleted the `codeflash/optimize-matrix_inverse-mc5i7sy2` branch on June 23, 2025 23:30