Sq8 to Sq8 dist functions - ip and cosine [MOD-13170]#873
Conversation
- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors in SVE, NEON, and AVX512 architectures. - Added corresponding distance function selection logic in IP_space.cpp and function headers in IP_space.h. - Created benchmarks for SQ8-to-SQ8 distance functions to evaluate performance across different architectures. - Developed unit tests to validate the correctness of the new distance functions against expected results. - Ensured compatibility with existing optimization features for various CPU architectures.
There was a problem hiding this comment.
Pull request overview
This PR adds support for SQ8-to-SQ8 distance functions for Inner Product and Cosine similarity metrics, where both vectors are scalar-quantized 8-bit representations. This extends the existing SQ8 functionality which only handled float-to-SQ8 comparisons.
Key changes:
- Implemented optimized SIMD versions for Intel (AVX512) and ARM (SVE, SVE2, NEON) architectures
- Added comprehensive test coverage for all optimization variants
- Integrated new benchmark suite for performance testing
- Refactored existing helper function names to avoid naming conflicts
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_spaces.cpp | Added unit tests for SQ8_SQ8 IP and Cosine functions across all architectures; improved test suite naming consistency |
| tests/benchmark/spaces_benchmarks/bm_spaces_sq8_sq8.cpp | New benchmark file for SQ8_SQ8 distance functions |
| tests/benchmark/spaces_benchmarks/bm_spaces_sq8.cpp | Removed empty comment lines (cleanup) |
| tests/benchmark/benchmarks.sh | Added spaces_sq8_sq8 to benchmark configurations |
| tests/benchmark/CMakeLists.txt | Added sq8_sq8 data type to build system |
| src/VecSim/spaces/functions/SVE2.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/SVE2.cpp | Implemented SVE2 function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/functions/SVE.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/SVE.cpp | Implemented SVE function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/functions/NEON.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/NEON.cpp | Implemented NEON function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.cpp | Implemented AVX512 function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/IP_space.h | Added public API declarations for SQ8_SQ8 distance functions |
| src/VecSim/spaces/IP_space.cpp | Implemented dispatcher functions that select appropriate SIMD implementation |
| src/VecSim/spaces/IP/IP_SVE_SQ8_SQ8.h | New file with SVE-optimized SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h | New file with NEON-optimized SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/IP/IP_AVX512F_SQ8_SQ8_BW_VL_VNNI.h | New file with AVX512-optimized SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/IP/IP_AVX512F_SQ8_BW_VL_VNNI.h | Renamed helper function to avoid naming conflicts with new SQ8_SQ8 implementations |
| src/VecSim/spaces/IP/IP_AVX2_SQ8.h | Renamed helper function to avoid naming conflicts and fixed cosine normalization |
| src/VecSim/spaces/IP/IP.h | Added function declarations for SQ8_SQ8 base implementations |
| src/VecSim/spaces/IP/IP.cpp | Implemented baseline SQ8_SQ8 InnerProduct and Cosine functions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #873 +/- ##
==========================================
- Coverage 97.03% 97.02% -0.01%
==========================================
Files 126 127 +1
Lines 7455 7506 +51
==========================================
+ Hits 7234 7283 +49
- Misses 221 223 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… SQ8-to-SQ8 calculations
… NEON and AVX512 headers
…ocumentation accordingly
…tance assertion tolerance
… using AVX512 VNNI; add benchmarks and tests for new functionality
…VE, and AVX512; add corresponding selection functions and update tests for consistency.
…update benchmarks and tests for new functionality
…stance function tests
…itional compilation in tests
meiravgri
left a comment
There was a problem hiding this comment.
-
Please revert changes in files that are not related to this pr (all IP_*_SQ8.h files changes)
-
consider comparing IP_*UINT8.h optimized implmntation function to the new SQ8_SQ8 to ensure we get the best optimization
tests/utils/tests_utils.h
Outdated
| const auto *pVect1 = static_cast<const uint8_t *>(pVect1v); | ||
| const auto *pVect2 = static_cast<const uint8_t *>(pVect2v); | ||
|
|
||
| // Extract metadata from the end of vectors (likely already prefetched) |
There was a problem hiding this comment.
what does it mean "(likely already prefetched)"?
tests/unit/test_spaces.cpp
Outdated
| float baseline = SQ8_Cosine(v1_orig.data(), v2_quantized.data(), dim); | ||
|
|
||
| #ifdef OPT_SVE2 | ||
| if (optimization.sve2) { | ||
| unsigned char alignment = 0; | ||
| arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization); | ||
| ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_SVE2(dim)) | ||
| << "Unexpected distance function chosen for dim " << dim; | ||
| ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01) | ||
| << "SVE2 with dim " << dim; | ||
| optimization.sve2 = 0; | ||
| } | ||
| #endif | ||
| #ifdef OPT_SVE | ||
| if (optimization.sve) { | ||
| unsigned char alignment = 0; | ||
| arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization); | ||
| ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_SVE(dim)) | ||
| << "Unexpected distance function chosen for dim " << dim; | ||
| ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01) | ||
| << "SVE with dim " << dim; | ||
| optimization.sve = 0; | ||
| } | ||
| #endif | ||
| #ifdef OPT_NEON | ||
| if (optimization.asimd) { | ||
| unsigned char alignment = 0; | ||
| arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization); | ||
| ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_NEON(dim)) | ||
| << "Unexpected distance function chosen for dim " << dim; | ||
| ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01) | ||
| << "NEON with dim " << dim; | ||
| optimization.asimd = 0; | ||
| } | ||
| #endif | ||
|
|
There was a problem hiding this comment.
can we avoid this (and all the following) change that is not related to sq8_sq8?
There was a problem hiding this comment.
I think the PR should also handle the fixes needed with regular SQ8, as it was also converted to be used by the disk implementation
There was a problem hiding this comment.
we need to implement them for sure, but in a different pr
…Calculations - Refactored inner product calculations for SQ8 vectors using NEON and SVE optimizations. - Integrated UINT8_InnerProductImp for efficient dot product computation in NEON and SVE implementations. - Updated inner product functions to handle 64-element chunks for improved performance. - Adjusted distance function selection logic to ensure optimizations are applied only for dimensions >= 16. - Added tests for zero vectors and constant vectors to validate optimized implementations against baseline results. - Ensured consistency in assertions for symmetry tests across various optimization flags. - Improved code readability and maintainability by removing redundant code and comments.
- Updated inner product functions for NEON, SSE4, and SVE to streamline dequantization and reduce unnecessary calculations. - Consolidated common logic for inner product and cosine calculations across different SIMD implementations. - Enhanced the handling of vector normalization and quantization in unit tests, ensuring consistency in compressed vector sizes. - Adjusted benchmark tests to reflect changes in vector compression and distance function calls. - Corrected include paths for AVX512 implementations to maintain consistency across the codebase.
… compressed size calculations
| float SQ8_SQ8_InnerProductImp(const void *pVec1v, const void *pVec2v, size_t dimension) { | ||
| // Compute raw dot product using efficient UINT8 AVX512 VNNI implementation | ||
| // UINT8_InnerProductImp uses _mm512_dpwssd_epi32 for native integer dot product | ||
| int dot_product = UINT8_InnerProductImp<residual>(pVec1v, pVec2v, dimension); |
| // NEON SQ8-to-SQ8 functions | ||
| #ifdef OPT_NEON | ||
| bool neon_supported = opt.asimd; | ||
| INITIALIZE_BENCHMARKS_SET_IP(BM_VecSimSpaces_SQ8_SQ8, SQ8_SQ8, NEON, 16, neon_supported); |
There was a problem hiding this comment.
Is there a reason that the dim_opt is not aligned with bm_spaces_uint.cpp values?
There was a problem hiding this comment.
No, just forgot to change it.
Describe the changes in the pull request
Added support to SQ8 to SQ8 Ip and Cosine spaces.
Intel:
AVX512_F_BW_VL_VNNI - needs it to support uint8 and in8 operations.
ARM:
SVE
NEON
The cosine functions assume that the vectors are normlized therefore don't divide by the norm.
Which issues this PR fixes
Main objects this PR modified
Mark if applicable