Commit bdcbf80
Adapt fp32 sq8 dist functions ip cosine [MOD-13392] (#882)
* Add SQ8-to-SQ8 distance functions and optimizations
- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors in SVE, NEON, and AVX512 architectures.
- Added corresponding distance function selection logic in IP_space.cpp and function headers in IP_space.h.
- Created benchmarks for SQ8-to-SQ8 distance functions to evaluate performance across different architectures.
- Developed unit tests to validate the correctness of the new distance functions against expected results.
- Ensured compatibility with existing optimization features for various CPU architectures.
* Add SQ8-to-SQ8 benchmark tests and update related scripts
* Format
* Organizing
* Add full SQ8 benchmarks
* Optimize the SQ8-to-SQ8 functions
* Optimize SQ8 distance functions for NEON by reducing operations and improving performance
* format
* Add NEON DOTPROD-optimized distance functions for SQ8-to-SQ8 calculations
* PR
* Remove NEON DOTPROD-optimized distance functions for INT8, UINT8, and SQ8-to-SQ8 calculations
* Fix vector layout documentation by removing inv_norm from comments in NEON and AVX512 headers
* Remove 'constexpr' from ones vector declaration in NEON inner product function
* Add SQ8-to-SQ8 L2 squared distance functions with SIMD optimizations
- Implemented NEON, SVE, and AVX512F optimized functions for calculating L2 squared distance between SQ8 (scalar quantized 8-bit) vectors.
- Introduced helper functions for processing vector elements using NEON and SVE intrinsics.
- Updated L2_space.cpp and L2_space.h to include new distance function for SQ8-to-SQ8.
- Enhanced AVX512F, NEON, and SVE function selectors to choose the appropriate implementation based on CPU features.
- Added unit tests to validate the correctness of the new L2 squared distance functions.
- Updated benchmark tests to include performance measurements for the new implementations.
* Change the name
* Add full range tests for SQ8 distance functions with SIMD optimizations
* Refactor distance functions to remove inv_norm parameter and update documentation accordingly
* Update SQ8 Cosine test to normalize both input vectors and adjust distance assertion tolerance
* Rename 'compressed' to 'quantized' in SQ8 functions for clarity and consistency
* Rename 'compressed' to 'quantized' in SQ8 distance tests for clarity
* Refactor quantization function to remove unused normalization calculations
* Add TODO to store vector's norm and sum in L2 squared distance calculation
* Implement SQ8-to-SQ8 distance functions with precomputed sum and norm using AVX512 VNNI; add benchmarks and tests for new functionality
* Add edge case tests for SQ8-to-SQ8 precomputed cosine distance functions
* Refactor SQ8 test cases to use CreateSQ8QuantizedVector for vector population
* Implement SQ8-to-SQ8 precomputed distance functions using ARM NEON, SVE, and AVX512; add corresponding selection functions and update tests for consistency.
* Implement SQ8-to-SQ8 precomputed inner product and cosine functions; update benchmarks and tests for new functionality
* Refactor SQ8 distance functions and remove precomputed variants
- Updated distance function declarations in IP_space.h to clarify that SQ8-to-SQ8 functions use precomputed sum/norm.
- Removed precomputed distance function implementations for AVX512F, NEON, and SVE architectures from their respective source files.
- Adjusted benchmark tests to remove references to precomputed distance functions and ensure they utilize the updated quantization methods.
- Modified utility functions to support the creation of SQ8 quantized vectors with precomputed sum and norm.
- Updated unit tests to reflect changes in the quantization process and removed tests specifically for precomputed distance functions.
* Refactor SQ8 distance functions and tests for improved clarity and consistency
- Updated include paths in AVX512F_BW_VL_VNNI.cpp to reflect new naming conventions.
- Modified unit tests in test_spaces.cpp to streamline vector initialization and quantization processes.
- Replaced repetitive code with utility functions for populating and quantizing vectors.
- Enhanced assertions in tests to ensure optimized distance functions are correctly chosen and validated.
- Removed unnecessary parameters from utility functions to simplify their interfaces.
- Improved test coverage for edge cases, including zero and constant vectors, ensuring accuracy across various scenarios.
* Refactor SQ8 benchmarks by removing precomputed variants and updating vector population methods
* format
* Remove serialization benchmark script for HNSW disk serialization
* Refactor SQ8 distance functions and tests to remove precomputed norm references
* format
* Refactor SQ8 distance tests to use compressed vectors and improve normalization calculations
* Update vector layout documentation to reflect removal of sum of squares in SQ8 implementations
* Refactor L2 SQ8 distance computation to remove unused accumulators and streamline calculations
* Refactor SQ8 distance functions to remove norm computation
- Updated comments and documentation to reflect that the SQ8-to-SQ8 distance functions now only utilize precomputed sums, removing references to norms.
- Modified function signatures and implementations across various SIMD architectures (AVX512F, NEON, SVE) to align with the new approach.
- Adjusted utility functions for populating SQ8 vectors to include metadata for sums and normalization.
- Updated unit tests and benchmarks to ensure compatibility with the new SQ8 vector population methods and to validate the correctness of distance calculations.
* Update SQ8-to-SQ8 distance function comment to remove norm reference
* Refactor cosine similarity functions to remove unnecessary subtraction in AVX2, SSE4, and SVE implementations
* Refactor L2 SQ8 distance functions to eliminate unused accumulators and streamline calculations
* Refactor SQ8 L2 and IP implementations to use common inner product function
- Introduced SQ8_SQ8_InnerProduct_Impl for shared inner product calculations in SQ8 space.
- Updated SQ8_SQ8_L2Sqr to utilize the new inner product implementation, improving performance and reducing code duplication.
- Modified AVX512 and NEON SIMD implementations to leverage the common inner product function for L2 squared distance calculations.
- Removed redundant code and tests related to full range vector comparisons, streamlining the test suite.
- Ensured that vector layouts include sum of squares for optimized distance calculations.
* Refactor cosine similarity functions to use specific SIMD implementations for improved clarity and performance
* Refactor L2 distance functions for SQ8 vectors to utilize common inner product implementation and update metadata extraction in tests
* Refactor benchmark setup to allocate additional space for sum and sum_squares in SQ8 vector tests
* Add CPU feature checks to disable optimizations for AArch64 in SQ8 distance function
* Add CPU feature checks to disable optimizations for AArch64 in SQ8 distance function tests
* Fix formatting issues in SQ8 inner product function and clean up conditional compilation in tests
* Refactor SQ8 distance functions and tests for improved readability and consistency
* Refactor SQ8 L2Sqr tests to use quantized vectors and improve alignment checks
* Enhance SQ8 Inner Product Implementations with Optimized Dot Product Calculations
- Refactored inner product calculations for SQ8 vectors using NEON and SVE optimizations.
- Integrated UINT8_InnerProductImp for efficient dot product computation in NEON and SVE implementations.
- Updated inner product functions to handle 64-element chunks for improved performance.
- Adjusted distance function selection logic to ensure optimizations are applied only for dimensions >= 16.
- Added tests for zero vectors and constant vectors to validate optimized implementations against baseline results.
- Ensured consistency in assertions for symmetry tests across various optimization flags.
- Improved code readability and maintainability by removing redundant code and comments.
* Fix header guard duplication and update test assertion for floating-point comparison
* Add missing pragma once directive in NEON header files
* Refactor SQ8 distance functions for improved performance and clarity
- Updated inner product functions for NEON, SSE4, and SVE to streamline dequantization and reduce unnecessary calculations.
- Consolidated common logic for inner product and cosine calculations across different SIMD implementations.
- Enhanced the handling of vector normalization and quantization in unit tests, ensuring consistency in compressed vector sizes.
- Adjusted benchmark tests to reflect changes in vector compression and distance function calls.
- Corrected include paths for AVX512 implementations to maintain consistency across the codebase.
* Update SQ8 vector population functions to include metadata and adjust compressed size calculations
* Refactor SQ8 inner product functions for improved clarity and performance
* Refactor L2 distance functions to utilize common inner product implementations for improved clarity and performance
* Rename inner product implementation functions for AVX2 and AVX512 for clarity
* Refactor SQ8 cosine function to utilize inner product function for improved clarity
* Remove redundant inner product edge case tests for SQ8 distance functions
* Add SVE2 support to SQ8-to-SQ8 Inner Product distance function
* Fix SQ8_Cosine to call the correct inner product function for improved accuracy
* Remove SVE2 and other optimizations from SQ8 cosine function test for ARM architecture
* Add L2 distance function without optimizations for testing purposes
* Refactor L2 distance function and update test assertions for precision
* Update L2 squared distance functions to support 64 residuals in NEON implementation
* Refactor L2 distance function conditions for NEON optimizations
* Adjust NEON_DOTPROD benchmark initialization to use a dimension of 16
* Update NEON benchmarks to support 64 dimensions for L2 and Cosine metrics
* Optimize SQ8 Inner Product Implementation
- Refactor the SQ8 inner product computation to eliminate unnecessary dequantization steps, improving performance.
- Introduce a new helper function `InnerProductStepSQ8` that computes the inner product directly using quantized values.
- Update the main inner product function `SQ8_InnerProductSIMD_SVE_IMP` to utilize the new helper function, streamlining the computation process.
- Modify the test suite to validate the new implementation, ensuring correctness against the baseline non-optimized version.
- Add edge case tests for self-distance, symmetry, zero vectors, constant vectors, and extreme values to ensure robustness of the SQ8 cosine distance function.
- Introduce utility functions for preprocessing and populating SQ8 queries, enhancing test clarity and maintainability.
* Refactor SQ8 inner product functions to clarify FMA usage and improve performance
* Update SQ8 test cases to improve alignment checks and adjust quantized size calculations
* Add optimized SQ8 inner product implementation and update test cases
* Fix pointer usage in SQ8 inner product implementation to reference original vectors
* Add sq8 type definition and update inner product implementations for quantization parameters
* Refactor SQ8 inner product implementations to use structured quantization parameters and clean up code formatting
* Fix SQ8 EdgeCases test by adjusting vector size for constant vector test
* Fix formatting in SQ8_EdgeCases test by adjusting vector initialization
* Refactor SQ8 inner product implementations to use precomputed y_sum from query blob
* Fix formatting in SQ8_EdgeCases test for better readability
* Refactor SQ8 cosine distance calculation to use optimized function
1 parent f6df960 commit bdcbf80
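Several of the squashed commits above describe computing the SQ8-to-SQ8 inner product directly on quantized values, using a precomputed sum instead of dequantizing, and deriving L2 squared from a shared inner product routine. The scalar sketch below illustrates that algebra only. The `Sq8Vector` struct, its field names, and the functions `quantize_sq8`, `sq8_sq8_inner_product`, and `sq8_sq8_l2_sqr` are hypothetical and are not taken from the repository; the real blob layout and the SIMD kernels may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container; the repository's actual vector layout may differ.
struct Sq8Vector {
    std::vector<std::uint8_t> codes; // quantized values q_i in [0, 255]
    float min_val = 0.0f;            // x_i is approximated by min_val + delta * q_i
    float delta = 1.0f;
    float sum = 0.0f;                // precomputed sum of dequantized values
    float sum_squares = 0.0f;        // precomputed sum of squared dequantized values
};

inline float dequantize(const Sq8Vector &v, std::size_t i) {
    return v.min_val + v.delta * static_cast<float>(v.codes[i]);
}

// Scalar quantization to 8 bits. Assumes `values` is non-empty.
Sq8Vector quantize_sq8(const std::vector<float> &values) {
    assert(!values.empty());
    Sq8Vector out;
    const auto mm = std::minmax_element(values.begin(), values.end());
    out.min_val = *mm.first;
    const float range = *mm.second - *mm.first;
    out.delta = range > 0.0f ? range / 255.0f : 1.0f; // constant vectors get delta = 1
    out.codes.reserve(values.size());
    for (float x : values) {
        const long q = std::lround((x - out.min_val) / out.delta);
        out.codes.push_back(static_cast<std::uint8_t>(std::min(255L, std::max(0L, q))));
        const float deq = dequantize(out, out.codes.size() - 1);
        out.sum += deq;
        out.sum_squares += deq * deq;
    }
    return out;
}

// Expanding sum_i (min_x + dx*qx_i) * (min_y + dy*qy_i) and substituting
// dx * sum(qx) = sum_x - dim * min_x (and likewise for y) gives:
//   IP = min_x*sum_y + min_y*sum_x - dim*min_x*min_y + dx*dy*sum_i(qx_i*qy_i)
// so the loop only needs an integer dot product over the codes.
float sq8_sq8_inner_product(const Sq8Vector &x, const Sq8Vector &y) {
    assert(x.codes.size() == y.codes.size());
    std::uint64_t dot = 0;
    for (std::size_t i = 0; i < x.codes.size(); ++i) {
        dot += static_cast<std::uint32_t>(x.codes[i]) * y.codes[i];
    }
    const float dim = static_cast<float>(x.codes.size());
    return x.min_val * y.sum + y.min_val * x.sum - dim * x.min_val * y.min_val +
           x.delta * y.delta * static_cast<float>(dot);
}

// ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * <x, y>, reusing the inner product above.
float sq8_sq8_l2_sqr(const Sq8Vector &x, const Sq8Vector &y) {
    return x.sum_squares + y.sum_squares - 2.0f * sq8_sq8_inner_product(x, y);
}
```

In this sketch the per-element work reduces to an unsigned 8-bit dot product, which is the kind of loop that dot-product instructions on NEON, SVE, or AVX512 VNNI can accelerate. The floating-point correction terms are applied once per vector pair.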
File tree
12 files changed
+887
-526
lines changed
- src/VecSim
- spaces
- IP
- functions
- types
- tests
- unit
- utils