Commit 32d0279
Fp32 sq8 dist functions L2Sqr [MOD-13392] (#885)
* Add SQ8-to-SQ8 distance functions and optimizations
- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors in SVE, NEON, and AVX512 architectures.
- Added corresponding distance function selection logic in IP_space.cpp and function headers in IP_space.h.
- Created benchmarks for SQ8-to-SQ8 distance functions to evaluate performance across different architectures.
- Developed unit tests to validate the correctness of the new distance functions against expected results.
- Ensured compatibility with existing optimization features for various CPU architectures.
* Add SQ8-to-SQ8 benchmark tests and update related scripts
* Format
* Organizing
* Add full SQ8 benchmarks
* Optimize SQ8-to-SQ8
* Optimize SQ8 distance functions for NEON by reducing operations and improving performance
* format
* Add NEON DOTPROD-optimized distance functions for SQ8-to-SQ8 calculations
* PR
* Remove NEON DOTPROD-optimized distance functions for INT8, UINT8, and SQ8-to-SQ8 calculations
* Fix vector layout documentation by removing inv_norm from comments in NEON and AVX512 headers
* Remove 'constexpr' from ones vector declaration in NEON inner product function
* Add SQ8-to-SQ8 L2 squared distance functions with SIMD optimizations
- Implemented NEON, SVE, and AVX512F optimized functions for calculating L2 squared distance between SQ8 (scalar quantized 8-bit) vectors.
- Introduced helper functions for processing vector elements using NEON and SVE intrinsics.
- Updated L2_space.cpp and L2_space.h to include new distance function for SQ8-to-SQ8.
- Enhanced AVX512F, NEON, and SVE function selectors to choose the appropriate implementation based on CPU features.
- Added unit tests to validate the correctness of the new L2 squared distance functions.
- Updated benchmark tests to include performance measurements for the new implementations.
* Change the name
* Add full range tests for SQ8 distance functions with SIMD optimizations
* Refactor distance functions to remove inv_norm parameter and update documentation accordingly
* Update SQ8 Cosine test to normalize both input vectors and adjust distance assertion tolerance
* Rename 'compressed' to 'quantized' in SQ8 functions for clarity and consistency
* Rename 'compressed' to 'quantized' in SQ8 distance tests for clarity
* Refactor quantization function to remove unused normalization calculations
* Add TODO to store vector's norm and sum in L2 squared distance calculation
* Implement SQ8-to-SQ8 distance functions with precomputed sum and norm using AVX512 VNNI; add benchmarks and tests for new functionality
* Add edge case tests for SQ8-to-SQ8 precomputed cosine distance functions
* Refactor SQ8 test cases to use CreateSQ8QuantizedVector for vector population
* Implement SQ8-to-SQ8 precomputed distance functions using ARM NEON, SVE, and AVX512; add corresponding selection functions and update tests for consistency.
* Implement SQ8-to-SQ8 precomputed inner product and cosine functions; update benchmarks and tests for new functionality
* Refactor SQ8 distance functions and remove precomputed variants
- Updated distance function declarations in IP_space.h to clarify that SQ8-to-SQ8 functions use precomputed sum/norm.
- Removed precomputed distance function implementations for AVX512F, NEON, and SVE architectures from their respective source files.
- Adjusted benchmark tests to remove references to precomputed distance functions and ensure they utilize the updated quantization methods.
- Modified utility functions to support the creation of SQ8 quantized vectors with precomputed sum and norm.
- Updated unit tests to reflect changes in the quantization process and removed tests specifically for precomputed distance functions.
* Refactor SQ8 distance functions and tests for improved clarity and consistency
- Updated include paths in AVX512F_BW_VL_VNNI.cpp to reflect new naming conventions.
- Modified unit tests in test_spaces.cpp to streamline vector initialization and quantization processes.
- Replaced repetitive code with utility functions for populating and quantizing vectors.
- Enhanced assertions in tests to ensure optimized distance functions are correctly chosen and validated.
- Removed unnecessary parameters from utility functions to simplify their interfaces.
- Improved test coverage for edge cases, including zero and constant vectors, ensuring accuracy across various scenarios.
* Refactor SQ8 benchmarks by removing precomputed variants and updating vector population methods
* format
* Remove serialization benchmark script for HNSW disk serialization
* Refactor SQ8 distance functions and tests to remove precomputed norm references
* format
* Refactor SQ8 distance tests to use compressed vectors and improve normalization calculations
* Update vector layout documentation to reflect removal of sum of squares in SQ8 implementations
* Refactor L2 SQ8 distance computation to remove unused accumulators and streamline calculations
* Refactor SQ8 distance functions to remove norm computation
- Updated comments and documentation to reflect that the SQ8-to-SQ8 distance functions now only utilize precomputed sums, removing references to norms.
- Modified function signatures and implementations across various SIMD architectures (AVX512F, NEON, SVE) to align with the new approach.
- Adjusted utility functions for populating SQ8 vectors to include metadata for sums and normalization.
- Updated unit tests and benchmarks to ensure compatibility with the new SQ8 vector population methods and to validate the correctness of distance calculations.
* Update SQ8-to-SQ8 distance function comment to remove norm reference
* Refactor cosine similarity functions to remove unnecessary subtraction in AVX2, SSE4, and SVE implementations
* Refactor L2 SQ8 distance functions to eliminate unused accumulators and streamline calculations
* Refactor SQ8 L2 and IP implementations to use common inner product function
- Introduced SQ8_SQ8_InnerProduct_Impl for shared inner product calculations in SQ8 space.
- Updated SQ8_SQ8_L2Sqr to utilize the new inner product implementation, improving performance and reducing code duplication.
- Modified AVX512 and NEON SIMD implementations to leverage the common inner product function for L2 squared distance calculations.
- Removed redundant code and tests related to full range vector comparisons, streamlining the test suite.
- Ensured that vector layouts include sum of squares for optimized distance calculations.
* Refactor cosine similarity functions to use specific SIMD implementations for improved clarity and performance
* Refactor L2 distance functions for SQ8 vectors to utilize common inner product implementation and update metadata extraction in tests
* Refactor benchmark setup to allocate additional space for sum and sum_squares in SQ8 vector tests
* Add CPU feature checks to disable optimizations for AArch64 in SQ8 distance function
* Add CPU feature checks to disable optimizations for AArch64 in SQ8 distance function tests
* Fix formatting issues in SQ8 inner product function and clean up conditional compilation in tests
* Refactor SQ8 distance functions and tests for improved readability and consistency
* Refactor SQ8 L2Sqr tests to use quantized vectors and improve alignment checks
* Enhance SQ8 Inner Product Implementations with Optimized Dot Product Calculations
- Refactored inner product calculations for SQ8 vectors using NEON and SVE optimizations.
- Integrated UINT8_InnerProductImp for efficient dot product computation in NEON and SVE implementations.
- Updated inner product functions to handle 64-element chunks for improved performance.
- Adjusted distance function selection logic to ensure optimizations are applied only for dimensions >= 16.
- Added tests for zero vectors and constant vectors to validate optimized implementations against baseline results.
- Ensured consistency in assertions for symmetry tests across various optimization flags.
- Improved code readability and maintainability by removing redundant code and comments.
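The bullets above route the SQ8-to-SQ8 inner product through an unsigned 8-bit dot product. Below is a minimal scalar sketch of why that is possible, assuming each element dequantizes as `x_i = min + delta * q_i` and that the sum of the codes is stored next to each vector. `Sq8View`, `dot_u8`, and both inner product functions are hypothetical names for illustration; they are not the library's layout or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of one quantized vector (illustration only).
struct Sq8View {
    const uint8_t *codes; // quantized values q_i in [0, 255]
    float min_val;        // dequantization offset
    float delta;          // dequantization step
    uint32_t code_sum;    // assumed precomputed sum of q_i
};

// Integer-only hot loop: sum of qx_i * qy_i.
uint64_t dot_u8(const uint8_t *a, const uint8_t *b, size_t dim) {
    uint64_t acc = 0;
    for (size_t i = 0; i < dim; ++i)
        acc += static_cast<uint32_t>(a[i]) * b[i];
    return acc;
}

// Expands sum((mx + dx*qx_i) * (my + dy*qy_i)) so that only dot_u8
// touches the individual elements.
float sq8_sq8_inner_product(const Sq8View &x, const Sq8View &y, size_t dim) {
    assert(dim > 0);
    const float qq = static_cast<float>(dot_u8(x.codes, y.codes, dim));
    return static_cast<float>(dim) * x.min_val * y.min_val +
           x.min_val * y.delta * static_cast<float>(y.code_sum) +
           y.min_val * x.delta * static_cast<float>(x.code_sum) +
           x.delta * y.delta * qq;
}

// Reference: dequantize both operands element by element.
float sq8_sq8_inner_product_naive(const Sq8View &x, const Sq8View &y, size_t dim) {
    float acc = 0.0f;
    for (size_t i = 0; i < dim; ++i) {
        const float xi = x.min_val + x.delta * static_cast<float>(x.codes[i]);
        const float yi = y.min_val + y.delta * static_cast<float>(y.codes[i]);
        acc += xi * yi;
    }
    return acc;
}
```

In a SIMD build the body of `dot_u8` is the part that a dot-product instruction would replace; the four correction terms are computed once per vector pair.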
* Fix header guard duplication and update test assertion for floating-point comparison
* Add missing pragma once directive in NEON header files
* Refactor SQ8 distance functions for improved performance and clarity
- Updated inner product functions for NEON, SSE4, and SVE to streamline dequantization and reduce unnecessary calculations.
- Consolidated common logic for inner product and cosine calculations across different SIMD implementations.
- Enhanced the handling of vector normalization and quantization in unit tests, ensuring consistency in compressed vector sizes.
- Adjusted benchmark tests to reflect changes in vector compression and distance function calls.
- Corrected include paths for AVX512 implementations to maintain consistency across the codebase.
* Update SQ8 vector population functions to include metadata and adjust compressed size calculations
* Refactor SQ8 inner product functions for improved clarity and performance
* Refactor L2 distance functions to utilize common inner product implementations for improved clarity and performance
* Rename inner product implementation functions for AVX2 and AVX512 for clarity
* Refactor SQ8 cosine function to utilize inner product function for improved clarity
* Remove redundant inner product edge case tests for SQ8 distance functions
* Add SVE2 support to SQ8-to-SQ8 Inner Product distance function
* Fix SQ8_Cosine to call the correct inner product function for improved accuracy
* Remove SVE2 and other optimizations from SQ8 cosine function test for ARM architecture
* Add L2 distance function without optimizations for testing purposes
* Refactor L2 distance function and update test assertions for precision
* Update L2 squared distance functions to support 64 residuals in NEON implementation
* Refactor L2 distance function conditions for NEON optimizations
* Adjust NEON_DOTPROD benchmark initialization to use a dimension of 16
* Update NEON benchmarks to support 64 dimensions for L2 and Cosine metrics
* Optimize SQ8 Inner Product Implementation
- Refactor the SQ8 inner product computation to eliminate unnecessary dequantization steps, improving performance.
- Introduce a new helper function `InnerProductStepSQ8` that computes the inner product directly using quantized values.
- Update the main inner product function `SQ8_InnerProductSIMD_SVE_IMP` to utilize the new helper function, streamlining the computation process.
- Modify the test suite to validate the new implementation, ensuring correctness against the baseline non-optimized version.
- Add edge case tests for self-distance, symmetry, zero vectors, constant vectors, and extreme values to ensure robustness of the SQ8 cosine distance function.
- Introduce utility functions for preprocessing and populating SQ8 queries, enhancing test clarity and maintainability.
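The optimization above skips per-element dequantization when an FP32 query meets an SQ8 stored vector. A scalar sketch of the underlying rearrangement follows, assuming dequantization is `x_i = min + delta * q_i` and that the query's element sum is precomputed. The function and parameter names are hypothetical, not the signature of `InnerProductStepSQ8`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Baseline: dequantize each stored element, then multiply by the query.
float ip_dequantize(const uint8_t *codes, float min_val, float delta,
                    const float *query, size_t dim) {
    float acc = 0.0f;
    for (size_t i = 0; i < dim; ++i)
        acc += (min_val + delta * static_cast<float>(codes[i])) * query[i];
    return acc;
}

// Rearranged form:
//   sum((min + delta*q_i) * y_i) = min * sum(y_i) + delta * sum(q_i * y_i)
// The loop multiplies raw codes by query values; min and delta are
// applied once after the loop.
float ip_quantized(const uint8_t *codes, float min_val, float delta,
                   const float *query, float query_sum, size_t dim) {
    assert(dim > 0);
    float qy = 0.0f;
    for (size_t i = 0; i < dim; ++i)
        qy += static_cast<float>(codes[i]) * query[i];
    return min_val * query_sum + delta * qy;
}
```

The loop body shrinks to one convert and one multiply-add per element, which is the part a vectorized implementation would spend its time in.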
* Refactor SQ8 inner product functions to clarify FMA usage and improve performance
* Update SQ8 test cases to improve alignment checks and adjust quantized size calculations
* Add optimized SQ8 inner product implementation and update test cases
* Fix pointer usage in SQ8 inner product implementation to reference original vectors
* Add sq8 type definition and update inner product implementations for quantization parameters
* Refactor SQ8 inner product implementations to use structured quantization parameters and clean up code formatting
* Fix SQ8 EdgeCases test by adjusting vector size for constant vector test
* Fix formatting in SQ8_EdgeCases test by adjusting vector initialization
* Refactor SQ8 inner product implementations to use precomputed y_sum from query blob
* Fix formatting in SQ8_EdgeCases test for better readability
* Refactor SQ8 cosine distance calculation to use optimized function
* Refactor SQ8 L2 squared distance calculations for optimized performance
- Implemented algebraic identity for L2 squared distance to avoid dequantization in hot loops across AVX2, AVX512, NEON, SSE4, SVE implementations.
- Updated L2 distance functions to utilize precomputed sum and sum of squares, improving efficiency.
- Modified unit tests to validate the new implementations and ensure consistency with previous non-optimized calculations.
- Enhanced test utilities to support preprocessing of float vectors for SQ8 L2 space.
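The algebraic identity mentioned above can be sketched in scalar form as `||x - y||^2 = sum(x_i^2) + sum(y_i^2) - 2 * sum(x_i * y_i)`. The sketch assumes the stored vector carries its sum of squares and the preprocessed query carries its sum and sum of squares. All names are hypothetical and the parameter order is not the library's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference: dequantize and accumulate squared differences.
float l2sqr_dequantize(const uint8_t *codes, float min_val, float delta,
                       const float *query, size_t dim) {
    float acc = 0.0f;
    for (size_t i = 0; i < dim; ++i) {
        const float diff = (min_val + delta * static_cast<float>(codes[i])) - query[i];
        acc += diff * diff;
    }
    return acc;
}

// Identity form: the loop only accumulates sum(q_i * y_i); everything
// else comes from precomputed per-vector scalars.
float l2sqr_identity(const uint8_t *codes, float min_val, float delta,
                     float x_sum_squares, const float *query, float y_sum,
                     float y_sum_squares, size_t dim) {
    assert(dim > 0);
    float qy = 0.0f;
    for (size_t i = 0; i < dim; ++i)
        qy += static_cast<float>(codes[i]) * query[i];
    const float ip = min_val * y_sum + delta * qy; // sum(x_i * y_i)
    return x_sum_squares + y_sum_squares - 2.0f * ip;
}
```

Because the identity subtracts quantities of similar magnitude, its result for nearly identical vectors can differ slightly from the reference loop, so comparisons between the two generally need a tolerance.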
* Fix formatting in IP.cpp and IP.h documentation for better readability
* Remove unused CreateSQ8CompressedVector helper function from test_spaces.cpp
* Add self-distance L2 test for SQ8 edge cases with optimization checks
* Refactor SQ8 query handling to unify preprocessing for IP/Cosine/L2 spaces and optimize memory allocation
* Fix query population seed in SQ8 benchmark for consistency
1 parent f5e69ac commit 32d0279
File tree
13 files changed, +437 -701 lines changed
- src/VecSim/spaces
  - IP
  - L2
- tests
  - benchmark/spaces_benchmarks
  - unit
  - utils