
Commit 6ab7027

Update _posts/2025-03-27-Boost-OpenSearch-VectorSearch-Performance-With-Intel-AVX512.md
Signed-off-by: Nathan Bower <[email protected]>
1 parent 5d3739f commit 6ab7027

File tree: 1 file changed (+1, -0 lines)


_posts/2025-03-27-Boost-OpenSearch-VectorSearch-Performance-With-Intel-AVX512.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ The techniques used in vector search are computationally expensive, and Intel AV
 - For writing native code using intrinsics
 - In compiler optimizations, such as auto-vectorization
 The corresponding optimized assembly instructions are generated when the accelerator is correctly utilized. AVX2 generates instructions using YMM registers, and AVX-512 generates instructions using ZMM registers. Performance is enhanced by allowing ZMM registers to handle 32 double-precision and 64 single-precision floating-point operations per clock cycle within 512-bit vectors. Additionally, these registers can process eight 64-bit and sixteen 32-bit integers. With up to two 512-bit fused multiply-add (FMA) units, AVX-512 effectively doubles the width of the data registers, the number of registers, and the width of the FMA units compared to Intel AVX2 YMM registers. Beyond these improvements, Intel AVX-512 offers increased parallelism, which leads to faster data processing and improved performance in compute-intensive applications such as scientific simulations, analytics, and machine learning. It also provides enhanced support for complex number calculations and accelerates tasks like cryptography and data compression. Furthermore, AVX-512 includes new instructions that improve the efficiency of certain algorithms, reduce power consumption, and optimize resource utilization, making it a powerful tool for modern computing needs.
+
 By doubling the register width to 512 bits, the use of ZMM registers instead of YMM registers can potentially double data throughput and computational power. When the AVX-512 extension is detected, the Faiss distance and scalar quantizer functions process 16 vectors per loop compared to 8 vectors per loop for the AVX2 extension.
 Thus, in vector search using k-nearest neighbors (k-NN), index build times and vector search performance can be enhanced with the use of these new hardware extensions.
