
Conversation


@codeflash-ai codeflash-ai bot commented Dec 23, 2025

📄 13% (0.13x) speedup for `cosine_similarity_top_k` in `src/statistics/similarity.py`

⏱️ Runtime: 5.37 milliseconds → 4.75 milliseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a **13% speedup** through several key optimizations in the `cosine_similarity` function (a combined sketch follows the list):

**Key Optimizations:**

1. **More efficient array conversion**: Uses `np.asarray(X, dtype=np.float64)` instead of `np.array(X)`. This avoids unnecessary copies when the input is already a numpy array and ensures consistent float64 precision.

2. **Broadcasting optimization**: Adds `keepdims=True` to the norm calculations, allowing `X_norm @ Y_norm.T` instead of the more expensive `np.outer(X_norm, Y_norm)`. This reduces memory allocation and leverages optimized matrix multiplication.

3. **Improved NaN/Inf handling**: Replaces the boolean indexing approach with `np.copyto(..., where=~np.isfinite(...))` and an `np.errstate` context manager, which is more efficient for in-place operations.

4. **Minor variable caching**: Stores `flat_scores = score_array.flatten()` to avoid repeated flatten operations.
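A minimal, self-contained sketch of the combined pattern these four points describe. The function and variable names are illustrative only, not the exact code in `src/statistics/similarity.py` (which also handles `top_k=None` and `score_threshold`):

```python
import numpy as np

def cosine_similarity_sketch(X, Y):
    """Sketch of optimizations 1-3; not the actual library implementation."""
    # (1) asarray avoids a copy when X/Y are already float64 ndarrays.
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    # (2) keepdims=True keeps the norms as (n, 1) / (m, 1) columns, so the
    # denominator is a BLAS matmul (X_norm @ Y_norm.T) rather than np.outer.
    X_norm = np.linalg.norm(X, axis=1, keepdims=True)
    Y_norm = np.linalg.norm(Y, axis=1, keepdims=True)
    # (3) errstate silences divide-by-zero warnings from zero vectors;
    # copyto then zeroes the resulting NaN/Inf entries in place.
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = (X @ Y.T) / (X_norm @ Y_norm.T)
    np.copyto(scores, 0.0, where=~np.isfinite(scores))
    return scores

def top_k_sketch(score_array, top_k):
    """Sketch of optimization 4: flatten once, then select top-k pairs."""
    flat_scores = score_array.flatten()
    k = min(top_k, flat_scores.size)
    if k <= 0:
        return [], []
    # argpartition selects the k largest in O(n); sort only those k.
    top = np.argpartition(flat_scores, -k)[-k:]
    top = top[np.argsort(-flat_scores[top])]  # descending by score
    rows, cols = np.unravel_index(top, score_array.shape)
    return list(zip(rows.tolist(), cols.tolist())), flat_scores[top].tolist()
```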

**Performance Impact by Test Case:**

- **Zero/sparse vectors see the largest gains** (24-30% faster): the optimized NaN/Inf handling is particularly effective when dealing with zero vectors that produce division by zero.
- **Regular computation cases** show consistent 4-8% improvements across various matrix sizes and configurations.
- **Large-scale tests** (100+ vectors) benefit significantly (15-28% faster) due to the more efficient matrix operations.

**Why It's Faster:**

The `np.outer` operation in the original creates a full intermediate matrix, while the optimized version uses broadcasting with the `@` operator, which is more cache-friendly and leverages BLAS optimizations. The `keepdims=True` eliminates the need for reshaping operations, and `np.asarray` with an explicit dtype avoids potential type-inference overhead.
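As a quick sanity check (not from the PR): the two forms produce identical values; only the execution path differs.

```python
import numpy as np

x = np.random.rand(100, 1)  # column vectors, as produced by keepdims=True
y = np.random.rand(200, 1)
# Same result either way; the matmul form goes through BLAS and skips the
# flattening that np.outer performs internally.
assert np.allclose(np.outer(x, y), x @ y.T)
```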

The optimization maintains identical output behavior while being particularly effective for workloads involving similarity computations on larger datasets or scenarios with many zero vectors.
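For instance, assuming the `errstate`/`copyto` handling described above, the zero-vector case resolves like this (this mirrors `test_edge_zero_vector` in the generated tests below):

```python
import numpy as np

X = np.asarray([[0.0, 0.0, 0.0]])  # zero vector: norm is 0, so 0/0 below
Y = np.asarray([[1.0, 2.0, 3.0]])
X_norm = np.linalg.norm(X, axis=1, keepdims=True)
Y_norm = np.linalg.norm(Y, axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    scores = (X @ Y.T) / (X_norm @ Y_norm.T)  # [[nan]] before cleanup
np.copyto(scores, 0.0, where=~np.isfinite(scores))
print(scores)  # [[0.]], the documented similarity for a zero vector
```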

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 47 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 95.5% |
🌀 Generated Regression Tests:
from typing import List, Optional, Tuple, Union

# function to test
import numpy as np

# imports
import pytest  # used for our unit tests
from src.statistics.similarity import cosine_similarity_top_k

Matrix = Union[List[List[float]], List[np.ndarray], np.ndarray]

# unit tests

# -------- BASIC TEST CASES --------


def test_basic_identical_vectors():
    # Identical vectors should yield cosine similarity 1.0
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2
    )  # 25.7μs -> 24.0μs (7.13% faster)


def test_basic_orthogonal_vectors():
    # Orthogonal vectors should yield cosine similarity 0.0
    X = [[1, 0], [0, 1]]
    Y = [[0, 1], [1, 0]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2
    )  # 25.3μs -> 23.9μs (6.11% faster)


def test_basic_top_k_less_than_total():
    # Test top_k less than total possible pairs
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1], [1, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2
    )  # 25.7μs -> 24.0μs (6.94% faster)


def test_basic_score_threshold():
    # Only pairs above threshold should be returned
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1], [1, 1]]
    # (0,0) and (1,1) are 1.0, (0,2) and (1,2) are ~0.707
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=4, score_threshold=0.8
    )  # 25.4μs -> 24.1μs (5.36% faster)


def test_basic_top_k_none():
    # top_k=None should return all pairs above threshold
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=None
    )  # 25.7μs -> 24.1μs (6.39% faster)


def test_basic_score_threshold_none():
    # score_threshold=None should not filter any scores
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=4, score_threshold=None
    )  # 25.5μs -> 24.0μs (6.24% faster)


# -------- EDGE TEST CASES --------


def test_edge_empty_X():
    # Empty X should return empty lists
    X = []
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(X, Y)  # 333ns -> 333ns (0.000% faster)


def test_edge_empty_Y():
    # Empty Y should return empty lists
    X = [[1, 0], [0, 1]]
    Y = []
    idxs, scores = cosine_similarity_top_k(X, Y)  # 375ns -> 375ns (0.000% faster)


def test_edge_empty_both():
    # Both empty should return empty lists
    X = []
    Y = []
    idxs, scores = cosine_similarity_top_k(X, Y)  # 375ns -> 333ns (12.6% faster)


def test_edge_single_vector_each():
    # Single vector in X and Y
    X = [[1, 2, 3]]
    Y = [[4, 5, 6]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=1
    )  # 24.6μs -> 23.6μs (4.24% faster)
    # The returned score should match a manual cosine computation
    dot = sum(a * b for a, b in zip(X[0], Y[0]))
    normX = sum(a * a for a in X[0]) ** 0.5
    normY = sum(b * b for b in Y[0]) ** 0.5
    expected = dot / (normX * normY)
    assert scores[0] == pytest.approx(expected)


def test_edge_zero_vector():
    # Zero vector should yield similarity 0.0
    X = [[0, 0, 0]]
    Y = [[1, 2, 3]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=1
    )  # 29.0μs -> 23.0μs (26.1% faster)


def test_edge_mismatched_dimensions():
    # Should raise ValueError if X and Y have different feature lengths
    X = [[1, 2, 3]]
    Y = [[1, 2]]
    with pytest.raises(ValueError):
        cosine_similarity_top_k(X, Y)  # 3.67μs -> 4.04μs (9.28% slower)


def test_edge_top_k_larger_than_possible():
    # top_k larger than total pairs should not error
    X = [[1, 0], [0, 1]]
    Y = [[1, 0]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=10
    )  # 26.0μs -> 24.4μs (6.32% faster)


def test_edge_negative_score_threshold():
    # Negative score_threshold should not filter anything
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, score_threshold=-1.0
    )  # 26.0μs -> 24.5μs (6.30% faster)


def test_edge_all_scores_below_threshold():
    # All scores below threshold should return empty
    X = [[1, 0], [0, 1]]
    Y = [[-1, 0], [0, -1]]
    # All similarities are -1.0
    idxs, scores = cosine_similarity_top_k(
        X, Y, score_threshold=0.1
    )  # 24.4μs -> 22.8μs (7.13% faster)


def test_edge_non_list_input():
    # Accept numpy arrays as input
    X = np.array([[1, 0], [0, 1]])
    Y = np.array([[1, 0], [0, 1]])
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2
    )  # 24.5μs -> 23.7μs (3.69% faster)


def test_edge_list_of_np_arrays():
    # Accept list of np.ndarrays as input
    X = [np.array([1, 0]), np.array([0, 1])]
    Y = [np.array([1, 0]), np.array([0, 1])]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2
    )  # 25.2μs -> 24.0μs (5.22% faster)


# -------- LARGE SCALE TEST CASES --------


def test_large_scale_many_vectors():
    # Test with 100 vectors of dimension 10
    np.random.seed(42)
    X = np.random.rand(100, 10).tolist()
    Y = np.random.rand(100, 10).tolist()
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=10
    )  # 807μs -> 766μs (5.45% faster)


def test_large_scale_top_k_none():
    # Test with top_k=None on large input
    np.random.seed(123)
    X = np.random.rand(50, 20)
    Y = np.random.rand(50, 20)
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=None
    )  # 647μs -> 634μs (2.03% faster)


def test_large_scale_score_threshold():
    # Only return scores above 0.95 in large random data
    np.random.seed(456)
    X = np.random.rand(100, 5)
    Y = np.random.rand(100, 5)
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=500, score_threshold=0.95
    )  # 877μs -> 811μs (8.06% faster)


def test_large_scale_zero_vectors():
    # All-zero vectors, all scores should be zero
    X = [[0] * 10 for _ in range(100)]
    Y = [[0] * 10 for _ in range(100)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=10
    )  # 405μs -> 310μs (30.5% faster)


def test_large_scale_sparse_vectors():
    # Sparse vectors, most similarities should be zero
    X = [[1 if i == j else 0 for i in range(50)] for j in range(50)]
    Y = [[1 if i == j else 0 for i in range(50)] for j in range(50)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=50
    )  # 343μs -> 265μs (29.7% faster)


# -------- MISCELLANEOUS TEST CASES --------


def test_misc_score_ordering():
    # Ensure returned scores are in descending order
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1], [1, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=6
    )  # 27.0μs -> 25.1μs (7.63% faster)


def test_misc_duplicate_scores():
    # If there are duplicate scores, all should be included up to top_k
    X = [[1, 0], [1, 0]]
    Y = [[1, 0], [1, 0]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=4
    )  # 25.8μs -> 24.4μs (5.64% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from typing import List, Optional, Tuple, Union

# function to test
import numpy as np

# imports
import pytest  # used for our unit tests
from src.statistics.similarity import cosine_similarity_top_k

Matrix = Union[List[List[float]], List[np.ndarray], np.ndarray]

# unit tests

# -------------------- BASIC TEST CASES --------------------


def test_basic_identical_vectors():
    # Two identical vectors, similarity should be 1
    X = [[1, 2, 3]]
    Y = [[1, 2, 3]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=1
    )  # 24.5μs -> 23.2μs (5.94% faster)


def test_basic_orthogonal_vectors():
    # Two orthogonal vectors, similarity should be 0
    X = [[1, 0]]
    Y = [[0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=1
    )  # 23.6μs -> 22.7μs (4.23% faster)


def test_basic_multiple_vectors():
    # Multiple vectors, check top_k ordering
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2
    )  # 25.5μs -> 24.0μs (6.24% faster)


def test_basic_different_top_k():
    # top_k larger than possible pairs
    X = [[1, 0]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=5
    )  # 25.3μs -> 23.7μs (6.85% faster)


def test_basic_score_threshold():
    # Only return scores above threshold
    X = [[1, 0]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=2, score_threshold=0.5
    )  # 25.0μs -> 23.5μs (6.38% faster)


# -------------------- EDGE TEST CASES --------------------


def test_edge_empty_X():
    # Empty X input
    X = []
    Y = [[1, 2, 3]]
    idxs, scores = cosine_similarity_top_k(X, Y)  # 333ns -> 334ns (0.299% slower)


def test_edge_empty_Y():
    # Empty Y input
    X = [[1, 2, 3]]
    Y = []
    idxs, scores = cosine_similarity_top_k(X, Y)  # 417ns -> 416ns (0.240% faster)


def test_edge_both_empty():
    # Both inputs empty
    X = []
    Y = []
    idxs, scores = cosine_similarity_top_k(X, Y)  # 333ns -> 333ns (0.000% faster)


def test_edge_zero_vectors():
    # Vectors with all zeros, similarity should be 0
    X = [[0, 0, 0]]
    Y = [[0, 0, 0]]
    idxs, scores = cosine_similarity_top_k(X, Y)  # 29.1μs -> 23.3μs (24.6% faster)


def test_edge_mismatched_dimensions():
    # X and Y with different number of columns should raise ValueError
    X = [[1, 2, 3]]
    Y = [[1, 2]]
    with pytest.raises(ValueError):
        cosine_similarity_top_k(X, Y)  # 3.79μs -> 3.88μs (2.14% slower)


def test_edge_negative_values():
    # Vectors with negative values, cosine similarity is valid
    X = [[-1, -1]]
    Y = [[1, 1]]
    idxs, scores = cosine_similarity_top_k(X, Y)  # 24.4μs -> 23.2μs (4.84% faster)


def test_edge_score_threshold_all_filtered():
    # All scores below threshold, should return empty
    X = [[1, 0]]
    Y = [[0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, score_threshold=0.1
    )  # 22.9μs -> 21.9μs (4.37% faster)


def test_edge_top_k_none():
    # top_k=None should return all pairs
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=None
    )  # 26.3μs -> 24.7μs (6.76% faster)


def test_edge_score_threshold_none():
    # score_threshold=None should not filter any scores
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, score_threshold=None
    )  # 25.8μs -> 24.2μs (6.37% faster)


def test_edge_non_list_input():
    # Accepts np.ndarray as input
    X = np.array([[1, 0], [0, 1]])
    Y = np.array([[1, 0], [0, 1]])
    idxs, scores = cosine_similarity_top_k(X, Y)  # 24.6μs -> 23.9μs (2.79% faster)


def test_edge_single_element_vectors():
    # Vectors with single element
    X = [[2]]
    Y = [[3]]
    idxs, scores = cosine_similarity_top_k(X, Y)  # 24.0μs -> 22.7μs (5.70% faster)


def test_edge_score_threshold_zero():
    # score_threshold=0 should filter out negative similarities
    X = [[1, 0], [-1, 0]]
    Y = [[1, 0]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, score_threshold=0
    )  # 25.3μs -> 23.7μs (7.04% faster)


def test_edge_top_k_zero():
    # top_k=0 should return empty result
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=0
    )  # 25.8μs -> 24.2μs (6.53% faster)


# -------------------- LARGE SCALE TEST CASES --------------------


def test_large_scale_top_k():
    # Large matrices, top_k selection
    X = [[i for i in range(10)] for _ in range(100)]
    Y = [[i for i in range(10)] for _ in range(100)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=10
    )  # 389μs -> 307μs (26.7% faster)


def test_large_scale_score_threshold():
    # Large matrices, with score threshold
    X = [[i for i in range(10)] for _ in range(100)]
    Y = [[i for i in range(10)] for _ in range(100)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=100, score_threshold=0.99
    )  # 407μs -> 320μs (27.1% faster)


def test_large_scale_different_vectors():
    # Large matrices, vectors are different
    X = [[i for i in range(10)] for _ in range(100)]
    Y = [[i + 1 for i in range(10)] for _ in range(100)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=10
    )  # 388μs -> 303μs (28.0% faster)


def test_large_scale_all_pairs():
    # Large matrices, top_k=None returns all pairs
    X = [[i for i in range(10)] for _ in range(20)]
    Y = [[i for i in range(10)] for _ in range(30)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=None
    )  # 177μs -> 170μs (4.17% faster)


def test_large_scale_performance():
    # Test function does not hang or crash for large input (under 1000 elements)
    X = [[i for i in range(10)] for _ in range(50)]
    Y = [[i for i in range(10)] for _ in range(50)]
    idxs, scores = cosine_similarity_top_k(
        X, Y, top_k=100
    )  # 154μs -> 133μs (15.8% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from src.statistics.similarity import cosine_similarity_top_k


def test_cosine_similarity_top_k():
    cosine_similarity_top_k([[]], [[]], top_k=0, score_threshold=0.0)


def test_cosine_similarity_top_k_2():
    cosine_similarity_top_k([[]], [], top_k=0, score_threshold=0.0)
🔎 Concolic Coverage Tests:

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_ep6oyi9w/tmpwu009xfa/test_concolic_coverage.py::test_cosine_similarity_top_k | 26.8μs | 22.8μs | 17.6% ✅ |
| codeflash_concolic_ep6oyi9w/tmpwu009xfa/test_concolic_coverage.py::test_cosine_similarity_top_k_2 | 500ns | 500ns | 0.000% ✅ |

To edit these changes, run `git checkout codeflash/optimize-cosine_similarity_top_k-mjhwfxfp` and push.


@codeflash-ai codeflash-ai bot requested a review from KRRT7 December 23, 2025 01:21
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 23, 2025
@KRRT7 KRRT7 closed this Dec 23, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-cosine_similarity_top_k-mjhwfxfp branch December 23, 2025 05:48