Merged
81 commits
746bf31
Add SQ8-to-SQ8 distance functions and optimizations
dor-forer Dec 28, 2025
8697a3e
Add SQ8-to-SQ8 benchmark tests and update related scripts
dor-forer Dec 28, 2025
e0ce268
Format
dor-forer Dec 28, 2025
ab6b077
Organizing
dor-forer Dec 28, 2025
931e339
Add full sq8 benchmarks
dor-forer Dec 28, 2025
a56474d
Optimize the sq8 sq8
dor-forer Dec 28, 2025
a25f45c
Optimize SQ8 distance functions for NEON by reducing operations and i…
dor-forer Dec 28, 2025
0ad941e
format
dor-forer Dec 28, 2025
68cd068
Add NEON DOTPROD-optimized distance functions for SQ8-to-SQ8 calculat…
dor-forer Dec 28, 2025
0b4b568
PR
dor-forer Dec 28, 2025
d0fd2e4
Remove NEON DOTPROD-optimized distance functions for INT8, UINT8, and…
dor-forer Dec 28, 2025
9de6163
Fix vector layout documentation by removing inv_norm from comments in…
dor-forer Dec 28, 2025
63a46a1
Remove 'constexpr' from ones vector declaration in NEON inner product…
dor-forer Dec 28, 2025
101aa69
Add SQ8-to-SQ8 L2 squared distance functions with SIMD optimizations
dor-forer Dec 28, 2025
5bef023
Change the name
dor-forer Dec 28, 2025
72053af
Add full range tests for SQ8 distance functions with SIMD optimizations
dor-forer Dec 29, 2025
525f8da
Refactor distance functions to remove inv_norm parameter and update d…
dor-forer Dec 29, 2025
13a477b
Update SQ8 Cosine test to normalize both input vectors and adjust dis…
dor-forer Dec 29, 2025
c18000e
Rename 'compressed' to 'quantized' in SQ8 functions for clarity and c…
dor-forer Dec 29, 2025
b58f8ef
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Dec 29, 2025
286990a
Rename 'compressed' to 'quantized' in SQ8 distance tests for clarity
dor-forer Dec 29, 2025
8cdc3fc
Refactor quantization function to remove unused normalization calcula…
dor-forer Dec 29, 2025
189290e
Add TODO to store vector's norm and sum in L2 squared distance calcul…
dor-forer Dec 29, 2025
bbf810e
Implement SQ8-to-SQ8 distance functions with precomputed sum and norm…
dor-forer Dec 29, 2025
dbbb7d9
Add edge case tests for SQ8-to-SQ8 precomputed cosine distance functions
dor-forer Dec 29, 2025
36ab068
Refactor SQ8 test cases to use CreateSQ8QuantizedVector for vector po…
dor-forer Dec 29, 2025
00617d7
Implement SQ8-to-SQ8 precomputed distance functions using ARM NEON, S…
dor-forer Dec 29, 2025
4331d91
Implement SQ8-to-SQ8 precomputed inner product and cosine functions; …
dor-forer Dec 29, 2025
2e7b30d
Refactor SQ8 distance functions and remove precomputed variants
dor-forer Dec 30, 2025
a111e36
Refactor SQ8 distance functions and tests for improved clarity and co…
dor-forer Dec 30, 2025
d510b8a
Refactor SQ8 benchmarks by removing precomputed variants and updating…
dor-forer Dec 30, 2025
ee26740
format
dor-forer Dec 30, 2025
afe1a4f
Remove serialization benchmark script for HNSW disk serialization
dor-forer Dec 30, 2025
a31f95c
Refactor SQ8 distance functions and tests to remove precomputed norm …
dor-forer Dec 31, 2025
f12ecf4
format
dor-forer Dec 31, 2025
0e36030
Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity in…
dor-forer Dec 31, 2025
fdc16c6
Refactor SQ8 distance tests to use compressed vectors and improve nor…
dor-forer Dec 31, 2025
e5f519c
Update vector layout documentation to reflect removal of sum of squar…
dor-forer Dec 31, 2025
53f8e0e
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 1, 2026
b12c796
Refactor L2 SQ8 distance computation to remove unused accumulators an…
dor-forer Jan 1, 2026
db1e671
Refactor SQ8 distance functions to remove norm computation
dor-forer Jan 1, 2026
d5b8587
Update SQ8-to-SQ8 distance function comment to remove norm reference
dor-forer Jan 1, 2026
91f48df
Refactor cosine similarity functions to remove unnecessary subtractio…
dor-forer Jan 1, 2026
0050bb9
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 1, 2026
a75ddd6
Refactor L2 SQ8 distance functions to eliminate unused accumulators a…
dor-forer Jan 1, 2026
a37918b
Refactor SQ8 L2 and IP implementations to use common inner product fu…
dor-forer Jan 1, 2026
b660111
Refactor cosine similarity functions to use specific SIMD implementat…
dor-forer Jan 1, 2026
40ef6a3
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 1, 2026
5a544db
Refactor L2 distance functions for SQ8 vectors to utilize common inne…
dor-forer Jan 1, 2026
9166cac
Refactor benchmark setup to allocate additional space for sum and sum…
dor-forer Jan 4, 2026
f28f4e7
Add CPU feature checks to disable optimizations for AArch64 in SQ8 di…
dor-forer Jan 4, 2026
e50dc45
Add CPU feature checks to disable optimizations for AArch64 in SQ8 di…
dor-forer Jan 4, 2026
d24ea8e
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 4, 2026
6bbbc38
Fix formatting issues in SQ8 inner product function and clean up cond…
dor-forer Jan 4, 2026
7983b70
Refactor SQ8 distance functions and tests for improved readability an…
dor-forer Jan 4, 2026
7f4af80
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 4, 2026
c6353cb
Refactor SQ8 L2Sqr tests to use quantized vectors and improve alignme…
dor-forer Jan 4, 2026
66a5f88
Enhance SQ8 Inner Product Implementations with Optimized Dot Product …
dor-forer Jan 4, 2026
d7972e9
Fix header guard duplication and update test assertion for floating-p…
dor-forer Jan 4, 2026
a8075bf
Add missing pragma once directive in NEON header files
dor-forer Jan 4, 2026
cddc497
Refactor SQ8 distance functions for improved performance and clarity
dor-forer Jan 4, 2026
4f0fec7
Update SQ8 vector population functions to include metadata and adjust…
dor-forer Jan 4, 2026
8ab4192
Refactor SQ8 inner product functions for improved clarity and perform…
dor-forer Jan 4, 2026
63f4e87
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 4, 2026
5a52b79
Refactor L2 distance functions to utilize common inner product implem…
dor-forer Jan 4, 2026
8c59cb2
Rename inner product implementation functions for AVX2 and AVX512 for…
dor-forer Jan 4, 2026
a0796db
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 4, 2026
a4ff5d0
Refactor SQ8 cosine function to utilize inner product function for im…
dor-forer Jan 4, 2026
c22158f
Remove redundant inner product edge case tests for SQ8 distance funct…
dor-forer Jan 4, 2026
4c19d9e
Add SVE2 support to SQ8-to-SQ8 Inner Product distance function
dor-forer Jan 4, 2026
e2ad287
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 4, 2026
668315b
Fix SQ8_Cosine to call the correct inner product function for improve…
dor-forer Jan 4, 2026
5c22af8
Remove SVE2 and other optimizations from SQ8 cosine function test for…
dor-forer Jan 4, 2026
ad515ba
Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.c…
dor-forer Jan 4, 2026
695bbc0
Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity in…
dor-forer Jan 5, 2026
cae2dd6
Add L2 distance function without optimizations for testing purposes
dor-forer Jan 5, 2026
b2506b9
Refactor L2 distance function and update test assertions for precision
dor-forer Jan 5, 2026
59784db
Update L2 squared distance functions to support 64 residuals in NEON …
dor-forer Jan 5, 2026
8d24786
Refactor L2 distance function conditions for NEON optimizations
dor-forer Jan 5, 2026
0dde4d5
Adjust NEON_DOTPROD benchmark initialization to use a dimension of 16
dor-forer Jan 5, 2026
3b38d8e
Update NEON benchmarks to support 64 dimensions for L2 and Cosine met…
dor-forer Jan 5, 2026
16 changes: 11 additions & 5 deletions src/VecSim/spaces/IP/IP.cpp
@@ -49,9 +49,10 @@ float SQ8_Cosine(const void *pVect1v, const void *pVect2v, size_t dimension) {
return 1.0f - res;
}

// SQ8-to-SQ8: Both vectors are uint8 quantized with precomputed sum
// SQ8-to-SQ8: Common inner product implementation that returns the raw inner product value
// (not distance). Used by SQ8_SQ8_InnerProduct, SQ8_SQ8_Cosine, and SQ8_SQ8_L2Sqr.
// Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
float SQ8_SQ8_InnerProduct(const void *pVect1v, const void *pVect2v, size_t dimension) {
float SQ8_SQ8_InnerProduct_Impl(const void *pVect1v, const void *pVect2v, size_t dimension) {
const auto *pVect1 = static_cast<const uint8_t *>(pVect1v);
const auto *pVect2 = static_cast<const uint8_t *>(pVect2v);

@@ -73,9 +74,14 @@ float SQ8_SQ8_InnerProduct(const void *pVect1v, const void *pVect2v, size_t dime

// Apply the algebraic formula using precomputed sums:
// IP = min1*sum2 + min2*sum1 + delta1*delta2*Σ(q1[i]*q2[i]) - dim*min1*min2
float res = min_val1 * sum2 + min_val2 * sum1 -
static_cast<float>(dimension) * min_val1 * min_val2 + delta1 * delta2 * product;
return 1.0f - res;
return min_val1 * sum2 + min_val2 * sum1 - static_cast<float>(dimension) * min_val1 * min_val2 +
delta1 * delta2 * product;
}

// SQ8-to-SQ8: Both vectors are uint8 quantized with precomputed sum
// Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
float SQ8_SQ8_InnerProduct(const void *pVect1v, const void *pVect2v, size_t dimension) {
return 1.0f - SQ8_SQ8_InnerProduct_Impl(pVect1v, pVect2v, dimension);
}

// SQ8-to-SQ8: Both vectors are uint8 quantized and normalized with precomputed sum
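
A quick way to sanity-check the algebraic formula in `SQ8_SQ8_InnerProduct_Impl` is to compare it against a plain dequantize-and-multiply loop. The sketch below is not code from this PR: `Sq8Vec`, `quantize_sq8`, `ip_formula`, and `ip_dequantized` are hypothetical names, the struct is an unpacked stand-in for the byte layout, and it assumes `sum` holds the sum of the dequantized values `min_val + delta * q[i]`, which is what the formula needs in order to hold.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical unpacked form of an SQ8 vector (not the library's byte layout).
struct Sq8Vec {
    std::vector<uint8_t> q; // quantized values
    float min_val;          // minimum of the original values
    float delta;            // quantization step
    float sum;              // assumed: sum of dequantized values min_val + delta * q[i]
};

// Quantize to 8 bits. Assumes `v` is non-empty.
inline Sq8Vec quantize_sq8(const std::vector<float> &v) {
    const auto mm = std::minmax_element(v.begin(), v.end());
    const float lo = *mm.first, hi = *mm.second;
    Sq8Vec out;
    out.min_val = lo;
    out.delta = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;
    out.sum = 0.0f;
    for (float x : v) {
        const long r = std::lround((x - lo) / out.delta);
        const uint8_t qi = static_cast<uint8_t>(std::min(255L, std::max(0L, r)));
        out.q.push_back(qi);
        out.sum += out.min_val + out.delta * static_cast<float>(qi);
    }
    return out;
}

// Raw inner product through the algebraic formula: only the integer dot
// product touches every element.
inline float ip_formula(const Sq8Vec &a, const Sq8Vec &b) {
    uint64_t dot = 0;
    for (size_t i = 0; i < a.q.size(); ++i)
        dot += static_cast<uint32_t>(a.q[i]) * b.q[i];
    const float dim = static_cast<float>(a.q.size());
    return a.min_val * b.sum + b.min_val * a.sum - dim * a.min_val * b.min_val +
           a.delta * b.delta * static_cast<float>(dot);
}

// Reference: dequantize each element and multiply directly.
inline float ip_dequantized(const Sq8Vec &a, const Sq8Vec &b) {
    float acc = 0.0f;
    for (size_t i = 0; i < a.q.size(); ++i)
        acc += (a.min_val + a.delta * a.q[i]) * (b.min_val + b.delta * b.q[i]);
    return acc;
}
```

For two vectors quantized with `quantize_sq8`, `ip_formula` and `ip_dequantized` should agree up to float rounding, since both evaluate the same sum over `(min1 + delta1*q1[i]) * (min2 + delta2*q2[i])`.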
5 changes: 5 additions & 0 deletions src/VecSim/spaces/IP/IP.h
@@ -16,6 +16,11 @@ float SQ8_InnerProduct(const void *pVect1v, const void *pVect2v, size_t dimensio
// pVect1v vector of type fp32 and pVect2v vector of type uint8
float SQ8_Cosine(const void *pVect1v, const void *pVect2v, size_t dimension);

// SQ8-to-SQ8: Common inner product implementation that returns the raw inner product value
// (not distance). Used by SQ8_SQ8_InnerProduct, SQ8_SQ8_Cosine, and SQ8_SQ8_L2Sqr.
// Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
float SQ8_SQ8_InnerProduct_Impl(const void *pVect1v, const void *pVect2v, size_t dimension);

// SQ8-to-SQ8: Both vectors are uint8 quantized with precomputed sum
// Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
float SQ8_SQ8_InnerProduct(const void *pVect1v, const void *pVect2v, size_t dimension);
26 changes: 26 additions & 0 deletions src/VecSim/spaces/L2/L2.cpp
@@ -7,6 +7,7 @@
* GNU Affero General Public License v3 (AGPLv3).
*/
#include "L2.h"
#include "VecSim/spaces/IP/IP.h"
#include "VecSim/types/bfloat16.h"
#include "VecSim/types/float16.h"
#include <cstring>
@@ -132,3 +133,28 @@ float UINT8_L2Sqr(const void *pVect1v, const void *pVect2v, size_t dimension) {
const auto *pVect2 = static_cast<const uint8_t *>(pVect2v);
return float(INTEGER_L2Sqr(pVect1, pVect2, dimension));
}

// SQ8-to-SQ8 L2 squared distance (both vectors are uint8 quantized)
// Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
// [sum_of_squares (float)]
// ||x - y||² = ||x||² + ||y||² - 2*IP(x, y)
// where:
// - ||x||² = sum_squares_x is precomputed and stored
// - ||y||² = sum_squares_y is precomputed and stored
// - IP(x, y) is computed using SQ8_SQ8_InnerProduct_Impl

float SQ8_SQ8_L2Sqr(const void *pVect1v, const void *pVect2v, size_t dimension) {
const auto *pVect1 = static_cast<const uint8_t *>(pVect1v);
const auto *pVect2 = static_cast<const uint8_t *>(pVect2v);

// Get precomputed sum of squares from both vectors
// Layout: [uint8_t values (dim)] [min_val] [delta] [sum] [sum_of_squares]
const float sum_sq_1 = *reinterpret_cast<const float *>(pVect1 + dimension + 3 * sizeof(float));

Collaborator: We should have a macro/enum of the metadata indexes instead of hardcoding them.

Collaborator Author: I was just thinking about it. Maybe I will add it in the renaming PR.

const float sum_sq_2 = *reinterpret_cast<const float *>(pVect2 + dimension + 3 * sizeof(float));

// Use the common inner product implementation
const float ip = SQ8_SQ8_InnerProduct_Impl(pVect1v, pVect2v, dimension);

// L2² = ||x||² + ||y||² - 2*IP(x, y)
return sum_sq_1 + sum_sq_2 - 2.0f * ip;
}
3 changes: 3 additions & 0 deletions src/VecSim/spaces/L2/L2.h
@@ -25,3 +25,6 @@ float FP16_L2Sqr(const void *pVect1, const void *pVect2, size_t dimension);
float INT8_L2Sqr(const void *pVect1v, const void *pVect2v, size_t dimension);

float UINT8_L2Sqr(const void *pVect1v, const void *pVect2v, size_t dimension);

// SQ8-to-SQ8 L2 squared distance (both vectors are uint8 quantized)
float SQ8_SQ8_L2Sqr(const void *pVect1v, const void *pVect2v, size_t dimension);
42 changes: 42 additions & 0 deletions src/VecSim/spaces/L2/L2_AVX512F_BW_VL_VNNI_SQ8_SQ8.h
@@ -0,0 +1,42 @@
/*
* Copyright (c) 2006-Present, Redis Ltd.
* All rights reserved.
*
* Licensed under your choice of the Redis Source Available License 2.0
* (RSALv2); or (b) the Server Side Public License v1 (SSPLv1); or (c) the
* GNU Affero General Public License v3 (AGPLv3).
*/
#pragma once
#include "VecSim/spaces/space_includes.h"
#include "VecSim/spaces/IP/IP_AVX512F_BW_VL_VNNI_SQ8_SQ8.h"

/**
* SQ8-to-SQ8 L2 squared distance using AVX512 VNNI.
* Computes L2 squared distance between two SQ8 (scalar quantized 8-bit) vectors,
* where BOTH vectors are uint8 quantized.
*
* Uses the identity: ||x - y||² = ||x||² + ||y||² - 2*IP(x, y)
* where ||x||² and ||y||² are precomputed sum of squares stored in the vector data.
*
* Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
* [sum_of_squares (float)]
*/

// L2 squared distance using the common inner product implementation
template <unsigned char residual> // 0..63
float SQ8_SQ8_L2SqrSIMD64_AVX512F_BW_VL_VNNI(const void *pVec1v, const void *pVec2v,
size_t dimension) {

// Use the common inner product implementation (returns raw IP, not distance)
const float ip = SQ8_SQ8_InnerProductImp<residual>(pVec1v, pVec2v, dimension);

const uint8_t *pVec1 = static_cast<const uint8_t *>(pVec1v);
const uint8_t *pVec2 = static_cast<const uint8_t *>(pVec2v);
// Get precomputed sum of squares from both vectors
// Layout: [uint8_t values (dim)] [min_val] [delta] [sum] [sum_of_squares]
const float sum_sq_1 = *reinterpret_cast<const float *>(pVec1 + dimension + 3 * sizeof(float));
const float sum_sq_2 = *reinterpret_cast<const float *>(pVec2 + dimension + 3 * sizeof(float));

// L2² = ||x||² + ||y||² - 2*IP(x, y)
return sum_sq_1 + sum_sq_2 - 2.0f * ip;
}
42 changes: 42 additions & 0 deletions src/VecSim/spaces/L2/L2_NEON_DOTPROD_SQ8_SQ8.h
@@ -0,0 +1,42 @@
/*
* Copyright (c) 2006-Present, Redis Ltd.
* All rights reserved.
*
* Licensed under your choice of the Redis Source Available License 2.0
* (RSALv2); or (b) the Server Side Public License v1 (SSPLv1); or (c) the
* GNU Affero General Public License v3 (AGPLv3).
*/
#pragma once
#include "VecSim/spaces/space_includes.h"
#include "VecSim/spaces/IP/IP_NEON_DOTPROD_SQ8_SQ8.h"

/**
* SQ8-to-SQ8 L2 squared distance functions for NEON with DOTPROD extension.
* Computes L2 squared distance between two SQ8 (scalar quantized 8-bit) vectors,
* where BOTH vectors are uint8 quantized.
*
* Uses the identity: ||x - y||² = ||x||² + ||y||² - 2*IP(x, y)
* where ||x||² and ||y||² are precomputed sum of squares stored in the vector data.
*
* Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
* [sum_of_squares (float)]
*/

// L2 squared distance using the common inner product implementation
template <unsigned char residual> // 0..63
float SQ8_SQ8_L2SqrSIMD64_NEON_DOTPROD(const void *pVec1v, const void *pVec2v, size_t dimension) {
// Use the common inner product implementation (returns raw IP, not distance)
const float ip =
SQ8_SQ8_InnerProductSIMD64_NEON_DOTPROD_IMP<residual>(pVec1v, pVec2v, dimension);

const uint8_t *pVec1 = static_cast<const uint8_t *>(pVec1v);
const uint8_t *pVec2 = static_cast<const uint8_t *>(pVec2v);

// Get precomputed sum of squares from both vectors
// Layout: [uint8_t values (dim)] [min_val] [delta] [sum] [sum_of_squares]
const float sum_sq_1 = *reinterpret_cast<const float *>(pVec1 + dimension + 3 * sizeof(float));
const float sum_sq_2 = *reinterpret_cast<const float *>(pVec2 + dimension + 3 * sizeof(float));

// L2² = ||x||² + ||y||² - 2*IP(x, y)
return sum_sq_1 + sum_sq_2 - 2.0f * ip;
}
41 changes: 41 additions & 0 deletions src/VecSim/spaces/L2/L2_NEON_SQ8_SQ8.h
@@ -0,0 +1,41 @@
/*
* Copyright (c) 2006-Present, Redis Ltd.
* All rights reserved.
*
* Licensed under your choice of the Redis Source Available License 2.0
* (RSALv2); or (b) the Server Side Public License v1 (SSPLv1); or (c) the
* GNU Affero General Public License v3 (AGPLv3).
*/
#pragma once
#include "VecSim/spaces/space_includes.h"
#include "VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h"

/**
* SQ8-to-SQ8 L2 squared distance functions for NEON.
* Computes L2 squared distance between two SQ8 (scalar quantized 8-bit) vectors,
* where BOTH vectors are uint8 quantized.
*
* Uses the identity: ||x - y||² = ||x||² + ||y||² - 2*IP(x, y)
* where ||x||² and ||y||² are precomputed sum of squares stored in the vector data.
*
* Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
* [sum_of_squares (float)]
*/

// L2 squared distance using the common inner product implementation
template <unsigned char residual> // 0..63
float SQ8_SQ8_L2SqrSIMD64_NEON(const void *pVec1v, const void *pVec2v, size_t dimension) {
// Use the common inner product implementation (returns raw IP, not distance)
const float ip = SQ8_SQ8_InnerProductSIMD64_NEON_IMP<residual>(pVec1v, pVec2v, dimension);

const uint8_t *pVec1 = static_cast<const uint8_t *>(pVec1v);
const uint8_t *pVec2 = static_cast<const uint8_t *>(pVec2v);

// Get precomputed sum of squares from both vectors
// Layout: [uint8_t values (dim)] [min_val] [delta] [sum] [sum_of_squares]
const float sum_sq_1 = *reinterpret_cast<const float *>(pVec1 + dimension + 3 * sizeof(float));
const float sum_sq_2 = *reinterpret_cast<const float *>(pVec2 + dimension + 3 * sizeof(float));

// L2² = ||x||² + ||y||² - 2*IP(x, y)
return sum_sq_1 + sum_sq_2 - 2.0f * ip;
}
42 changes: 42 additions & 0 deletions src/VecSim/spaces/L2/L2_SVE_SQ8_SQ8.h
@@ -0,0 +1,42 @@
/*
* Copyright (c) 2006-Present, Redis Ltd.
* All rights reserved.
*
* Licensed under your choice of the Redis Source Available License 2.0
* (RSALv2); or (b) the Server Side Public License v1 (SSPLv1); or (c) the
* GNU Affero General Public License v3 (AGPLv3).
*/
#pragma once
#include "VecSim/spaces/space_includes.h"
#include "VecSim/spaces/IP/IP_SVE_SQ8_SQ8.h"

/**
* SQ8-to-SQ8 L2 squared distance functions for SVE.
* Computes L2 squared distance between two SQ8 (scalar quantized 8-bit) vectors,
* where BOTH vectors are uint8 quantized.
*
* Uses the identity: ||x - y||² = ||x||² + ||y||² - 2*IP(x, y)
* where ||x||² and ||y||² are precomputed sum of squares stored in the vector data.
*
* Vector layout: [uint8_t values (dim)] [min_val (float)] [delta (float)] [sum (float)]
* [sum_of_squares (float)]
*/

// L2 squared distance using the common inner product implementation
template <bool partial_chunk, unsigned char additional_steps>
float SQ8_SQ8_L2SqrSIMD_SVE(const void *pVec1v, const void *pVec2v, size_t dimension) {
// Use the common inner product implementation (returns raw IP, not distance)
const float ip = SQ8_SQ8_InnerProductSIMD_SVE_IMP<partial_chunk, additional_steps>(
pVec1v, pVec2v, dimension);

const uint8_t *pVec1 = static_cast<const uint8_t *>(pVec1v);
const uint8_t *pVec2 = static_cast<const uint8_t *>(pVec2v);

// Get precomputed sum of squares from both vectors
// Layout: [uint8_t values (dim)] [min_val] [delta] [sum] [sum_of_squares]
const float sum_sq_1 = *reinterpret_cast<const float *>(pVec1 + dimension + 3 * sizeof(float));
const float sum_sq_2 = *reinterpret_cast<const float *>(pVec2 + dimension + 3 * sizeof(float));

// L2² = ||x||² + ||y||² - 2*IP(x, y)
return sum_sq_1 + sum_sq_2 - 2.0f * ip;
}
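
The four SIMD headers above (AVX512 VNNI, NEON DOTPROD, NEON, SVE) end with the same steps: call the raw inner product, read the two stored sums of squares, and combine them. A minimal sketch of factoring that tail into one template is shown below. It is not part of the PR; `RawIpFn`, `sq8_sq8_l2sqr_from`, and `fake_raw_ip` are hypothetical names, and the stand-in inner product exists only to exercise the wrapper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using RawIpFn = float (*)(const void *, const void *, size_t);

// Hypothetical shared tail: L2^2 = ||x||^2 + ||y||^2 - 2*IP(x, y), where the
// raw inner product implementation is supplied as a template argument.
template <RawIpFn IpImpl>
float sq8_sq8_l2sqr_from(const void *pVec1v, const void *pVec2v, size_t dimension) {
    const float ip = IpImpl(pVec1v, pVec2v, dimension);

    // Layout: [uint8_t values (dim)] [min_val] [delta] [sum] [sum_of_squares]
    const size_t offset = dimension + 3 * sizeof(float);
    float sum_sq_1, sum_sq_2;
    std::memcpy(&sum_sq_1, static_cast<const uint8_t *>(pVec1v) + offset, sizeof(float));
    std::memcpy(&sum_sq_2, static_cast<const uint8_t *>(pVec2v) + offset, sizeof(float));

    return sum_sq_1 + sum_sq_2 - 2.0f * ip;
}

// Stand-in raw inner product used only to demonstrate the wrapper.
inline float fake_raw_ip(const void *, const void *, size_t) { return 1.5f; }
```

Each architecture-specific function could then be an instantiation such as `sq8_sq8_l2sqr_from<some_raw_ip_kernel>`, at the cost of one extra layer of templates; whether that trade is worthwhile is a judgment call for the authors.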
46 changes: 46 additions & 0 deletions src/VecSim/spaces/L2_space.cpp
@@ -417,4 +417,50 @@ dist_func_t<float> L2_UINT8_GetDistFunc(size_t dim, unsigned char *alignment,
return ret_dist_func;
}

// SQ8-to-SQ8 L2 squared distance function (both vectors are uint8 quantized)
dist_func_t<float> L2_SQ8_SQ8_GetDistFunc(size_t dim, unsigned char *alignment,
const void *arch_opt) {
unsigned char dummy_alignment;
if (alignment == nullptr) {
alignment = &dummy_alignment;
}

dist_func_t<float> ret_dist_func = SQ8_SQ8_L2Sqr;
[[maybe_unused]] auto features = getCpuOptimizationFeatures(arch_opt);

#ifdef CPU_FEATURES_ARCH_AARCH64
#ifdef OPT_SVE2
if (features.sve2) {
return Choose_SQ8_SQ8_L2_implementation_SVE2(dim);
}
#endif
#ifdef OPT_SVE
if (features.sve) {
return Choose_SQ8_SQ8_L2_implementation_SVE(dim);
}
#endif
#ifdef OPT_NEON_DOTPROD
// DOTPROD uses integer arithmetic - much faster than float-based NEON
if (dim >= 16 && features.asimddp) {
return Choose_SQ8_SQ8_L2_implementation_NEON_DOTPROD(dim);
}
#endif
#ifdef OPT_NEON
if (dim >= 16 && features.asimd) {
return Choose_SQ8_SQ8_L2_implementation_NEON(dim);
}
#endif
#endif // AARCH64

#ifdef CPU_FEATURES_ARCH_X86_64
#ifdef OPT_AVX512_F_BW_VL_VNNI
// AVX512 VNNI SQ8_SQ8 uses 64-element chunks
if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
return Choose_SQ8_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(dim);
}
#endif
#endif // CPU_FEATURES_ARCH_X86_64
return ret_dist_func;
}

} // namespace spaces
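
The selection order in `L2_SQ8_SQ8_GetDistFunc` can be summarized with a small stand-alone model. This is a sketch only: `CpuFlags`, `L2Kernel`, and `pick_sq8_sq8_l2_kernel` are hypothetical names, the single `avx512_vnni_set` flag stands in for the combined AVX512 feature checks, and the compile-time `#ifdef` guards are left out.

```cpp
#include <cstddef>

// Hypothetical stand-in for the flags returned by getCpuOptimizationFeatures().
struct CpuFlags {
    bool sve2;
    bool sve;
    bool asimddp;
    bool asimd;
    bool avx512_vnni_set; // stands in for the avx512f/bw/vnni checks together
};

enum class L2Kernel { Scalar, Neon, NeonDotprod, Sve, Sve2, Avx512Vnni };

// Mirrors the precedence in the diff: SVE2, then SVE, then NEON DOTPROD and
// NEON (both only for dim >= 16), then AVX512 VNNI (only for dim >= 64),
// falling back to the scalar SQ8_SQ8_L2Sqr.
inline L2Kernel pick_sq8_sq8_l2_kernel(size_t dim, const CpuFlags &f) {
    if (f.sve2) return L2Kernel::Sve2;
    if (f.sve) return L2Kernel::Sve;
    if (dim >= 16 && f.asimddp) return L2Kernel::NeonDotprod;
    if (dim >= 16 && f.asimd) return L2Kernel::Neon;
    if (dim >= 64 && f.avx512_vnni_set) return L2Kernel::Avx512Vnni;
    return L2Kernel::Scalar;
}
```

Note that in this model, as in the diff, the SVE branches carry no minimum-dimension check, while the NEON and AVX512 branches fall through to the scalar function for short vectors.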
2 changes: 2 additions & 0 deletions src/VecSim/spaces/L2_space.h
@@ -24,4 +24,6 @@ dist_func_t<float> L2_UINT8_GetDistFunc(size_t dim, unsigned char *alignment = n
const void *arch_opt = nullptr);
dist_func_t<float> L2_SQ8_GetDistFunc(size_t dim, unsigned char *alignment = nullptr,
const void *arch_opt = nullptr);
dist_func_t<float> L2_SQ8_SQ8_GetDistFunc(size_t dim, unsigned char *alignment = nullptr,
const void *arch_opt = nullptr);
} // namespace spaces
7 changes: 7 additions & 0 deletions src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.cpp
@@ -18,6 +18,7 @@
#include "VecSim/spaces/L2/L2_AVX512F_BW_VL_VNNI_SQ8.h"

#include "VecSim/spaces/IP/IP_AVX512F_BW_VL_VNNI_SQ8_SQ8.h"
#include "VecSim/spaces/L2/L2_AVX512F_BW_VL_VNNI_SQ8_SQ8.h"

namespace spaces {

@@ -87,6 +88,12 @@ dist_func_t<float> Choose_SQ8_SQ8_Cosine_implementation_AVX512F_BW_VL_VNNI(size_
return ret_dist_func;
}

dist_func_t<float> Choose_SQ8_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(size_t dim) {
dist_func_t<float> ret_dist_func;
CHOOSE_IMPLEMENTATION(ret_dist_func, dim, 64, SQ8_SQ8_L2SqrSIMD64_AVX512F_BW_VL_VNNI);
return ret_dist_func;
}

#include "implementation_chooser_cleanup.h"

} // namespace spaces
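
The definition of `CHOOSE_IMPLEMENTATION` is not part of this diff. Assuming it picks the template instantiation whose `residual` equals `dim % 64` (consistent with the `// 0..63` comment on the kernels), the idea can be sketched with a function-pointer table. `DistFn`, `toy_kernel`, `make_table`, and `choose_by_residual` are hypothetical names, and the toy kernel merely reports which instantiation ran.

```cpp
#include <array>
#include <cstddef>
#include <utility>

using DistFn = float (*)(const void *, const void *, size_t);

// Toy kernel templated on the residual (dim % 64). A real kernel would use
// `residual` to handle the leftover elements without a runtime loop.
template <unsigned char residual>
float toy_kernel(const void *, const void *, size_t) {
    return static_cast<float>(residual);
}

// Build a table with one instantiation per residual value 0..N-1.
template <size_t... R>
constexpr std::array<DistFn, sizeof...(R)> make_table(std::index_sequence<R...>) {
    return {{&toy_kernel<static_cast<unsigned char>(R)>...}};
}

// Pick the instantiation that matches the runtime dimension.
inline DistFn choose_by_residual(size_t dim) {
    static const auto table = make_table(std::make_index_sequence<64>{});
    return table[dim % 64];
}
```

For example, `choose_by_residual(130)` returns the instantiation for residual 2, because 130 = 2 * 64 + 2. The selection happens once per index, so the distance function itself runs without a per-call branch on the remainder.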
1 change: 1 addition & 0 deletions src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.h
@@ -27,5 +27,6 @@ dist_func_t<float> Choose_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(size_t dim);
// SQ8-to-SQ8 distance functions (both vectors are uint8 quantized with precomputed sum)
dist_func_t<float> Choose_SQ8_SQ8_IP_implementation_AVX512F_BW_VL_VNNI(size_t dim);
dist_func_t<float> Choose_SQ8_SQ8_Cosine_implementation_AVX512F_BW_VL_VNNI(size_t dim);
dist_func_t<float> Choose_SQ8_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(size_t dim);

} // namespace spaces
7 changes: 7 additions & 0 deletions src/VecSim/spaces/functions/NEON.cpp
@@ -18,6 +18,7 @@
#include "VecSim/spaces/L2/L2_NEON_SQ8.h"
#include "VecSim/spaces/IP/IP_NEON_SQ8.h"
#include "VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h"
#include "VecSim/spaces/L2/L2_NEON_SQ8_SQ8.h"

namespace spaces {

@@ -114,6 +115,12 @@ dist_func_t<float> Choose_SQ8_SQ8_Cosine_implementation_NEON(size_t dim) {
return ret_dist_func;
}

dist_func_t<float> Choose_SQ8_SQ8_L2_implementation_NEON(size_t dim) {
dist_func_t<float> ret_dist_func;
CHOOSE_IMPLEMENTATION(ret_dist_func, dim, 64, SQ8_SQ8_L2SqrSIMD64_NEON);
return ret_dist_func;
}

#include "implementation_chooser_cleanup.h"

} // namespace spaces
1 change: 1 addition & 0 deletions src/VecSim/spaces/functions/NEON.h
@@ -33,5 +33,6 @@ dist_func_t<float> Choose_SQ8_Cosine_implementation_NEON(size_t dim);
// SQ8-to-SQ8 distance functions (both vectors are uint8 quantized with precomputed sum)
dist_func_t<float> Choose_SQ8_SQ8_IP_implementation_NEON(size_t dim);
dist_func_t<float> Choose_SQ8_SQ8_Cosine_implementation_NEON(size_t dim);
dist_func_t<float> Choose_SQ8_SQ8_L2_implementation_NEON(size_t dim);

} // namespace spaces
7 changes: 7 additions & 0 deletions src/VecSim/spaces/functions/NEON_DOTPROD.cpp
@@ -12,6 +12,7 @@
#include "VecSim/spaces/IP/IP_NEON_DOTPROD_SQ8_SQ8.h"
#include "VecSim/spaces/L2/L2_NEON_DOTPROD_INT8.h"
#include "VecSim/spaces/L2/L2_NEON_DOTPROD_UINT8.h"
#include "VecSim/spaces/L2/L2_NEON_DOTPROD_SQ8_SQ8.h"

namespace spaces {

@@ -66,6 +67,12 @@ dist_func_t<float> Choose_SQ8_SQ8_Cosine_implementation_NEON_DOTPROD(size_t dim)
return ret_dist_func;
}

dist_func_t<float> Choose_SQ8_SQ8_L2_implementation_NEON_DOTPROD(size_t dim) {
dist_func_t<float> ret_dist_func;
CHOOSE_IMPLEMENTATION(ret_dist_func, dim, 64, SQ8_SQ8_L2SqrSIMD64_NEON_DOTPROD);
return ret_dist_func;
}

#include "implementation_chooser_cleanup.h"

} // namespace spaces
1 change: 1 addition & 0 deletions src/VecSim/spaces/functions/NEON_DOTPROD.h
@@ -24,5 +24,6 @@ dist_func_t<float> Choose_UINT8_L2_implementation_NEON_DOTPROD(size_t dim);
// SQ8-to-SQ8 DOTPROD-optimized distance functions (with precomputed sum)
dist_func_t<float> Choose_SQ8_SQ8_IP_implementation_NEON_DOTPROD(size_t dim);
dist_func_t<float> Choose_SQ8_SQ8_Cosine_implementation_NEON_DOTPROD(size_t dim);
dist_func_t<float> Choose_SQ8_SQ8_L2_implementation_NEON_DOTPROD(size_t dim);

} // namespace spaces