⚡️ Speed up function cosine_similarity by 45%
#212
Closed
📄 45% (0.45x) speedup for `cosine_similarity` in `src/statistics/similarity.py`
⏱️ Runtime: 15.7 milliseconds → 10.8 milliseconds (best of 250 runs)
📝 Explanation and details
Key optimizations and reasoning:
- Avoid `np.outer(X_norm, Y_norm)` and divide. `np.outer()` creates a potentially huge intermediate array. Instead, normalize each row and perform a straight dot product.
- Use `np.asarray` instead of `np.array` for conversion.
- Use an explicit dtype (`float64`) for safety. A consistent floating-point dtype for `np.linalg.norm` and `np.dot` improves BLAS speed and avoids subtle bugs from mixed dtypes.
- Use `np.nan_to_num` over manual masking.

This version should run faster and use notably less memory on large input matrices, with no change in behavior or loss of code clarity.
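The optimizations listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the function name `cosine_similarity_sketch`, the shape check, and the choice to map zero-norm rows to a similarity of `0.0` are all assumptions made for the example.

```python
import numpy as np


def cosine_similarity_sketch(X, Y):
    """Row-wise cosine similarity between X (n x d) and Y (m x d).

    Hypothetical sketch: name, validation, and zero-row handling are
    assumptions for illustration, not the PR's actual implementation.
    """
    # np.asarray skips the copy when the input is already a float64 ndarray.
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    if X.ndim != 2 or Y.ndim != 2 or X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must be 2-D with the same number of columns")

    # Per-row norms, kept as column vectors so they broadcast over each row.
    X_norm = np.linalg.norm(X, axis=1, keepdims=True)
    Y_norm = np.linalg.norm(Y, axis=1, keepdims=True)

    # Normalize rows first instead of dividing by np.outer(X_norm, Y_norm).
    # All-zero rows produce 0/0 = NaN here; warnings are silenced on purpose.
    with np.errstate(divide="ignore", invalid="ignore"):
        X_unit = X / X_norm
        Y_unit = Y / Y_norm

    # A single matrix product of unit vectors gives the cosine similarities.
    similarity = X_unit @ Y_unit.T

    # Replace NaN/inf entries (from zero-norm rows) with 0.0 in one call,
    # rather than building and applying a boolean mask by hand.
    return np.nan_to_num(similarity, nan=0.0, posinf=0.0, neginf=0.0)
```

For instance, calling `cosine_similarity_sketch([[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0]])` yields a 2 x 1 array in which the all-zero row contributes a similarity of `0.0` rather than NaN.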
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
🔎 Click to see Concolic Coverage Tests
- `codeflash_concolic_78m9hvjn/tmp22rzhr5y/test_concolic_coverage.py::test_cosine_similarity`
- `codeflash_concolic_78m9hvjn/tmp22rzhr5y/test_concolic_coverage.py::test_cosine_similarity_2`
- `codeflash_concolic_78m9hvjn/tmp22rzhr5y/test_concolic_coverage.py::test_cosine_similarity_3`

To edit these changes, run `git checkout codeflash/optimize-cosine_similarity-mji2w0a3` and push.