From 4c133842136dc84d952d1509dddea1cc1a3c67fb Mon Sep 17 00:00:00 2001 From: adrianoamaral Date: Thu, 21 Aug 2025 11:19:19 +0100 Subject: [PATCH 1/4] refactor svs-compression.md Update the SVS compression documentation to focus on the feature of compression and quantization, using the technique names LVQ and LeanVec, as these are directly reflected in the API and implementation. Remove generic references and ensure all sections and examples use LVQ and LeanVec terminology, as well as their respective naming conventions for configuration. The rewritten content should clarify the function and benefit of each variant, the role of multi-level compression, and practical considerations for parameter learning and data drift, while maintaining warning about hardware dependency and proprietary optimizations. --- .../vectors/svs-compression.md | 90 +++++++++---------- 1 file changed, 43 insertions(+), 47 deletions(-) diff --git a/content/develop/ai/search-and-query/vectors/svs-compression.md b/content/develop/ai/search-and-query/vectors/svs-compression.md index 3057c5ad3..c4f305ff1 100644 --- a/content/develop/ai/search-and-query/vectors/svs-compression.md +++ b/content/develop/ai/search-and-query/vectors/svs-compression.md @@ -9,27 +9,27 @@ categories: - oss - kubernetes - clients -description: Intel scalable vector search (SVS) LVQ and LeanVec compression -linkTitle: Intel SVS compression -title: Intel scalable vector search (SVS) compression +description: Vector compression and quantization for efficient memory usage and search performance +linkTitle: Vector Compression & Quantization +title: Vector Compression and Quantization Techniques weight: 2 --- -Intel's SVS (Scalable Vector Search) introduces two advanced vector compression techniques—LVQ and LeanVec—designed to optimize memory usage and search performance. These methods compress high-dimensional vectors while preserving the geometric relationships essential for accurate similarity search. 
+Efficient management of high-dimensional vector data is crucial for scalable search and retrieval. Advanced methods for vector compression and quantization—such as LVQ (Locally-Adaptive Vector Quantization) and LeanVec—can dramatically optimize memory usage and improve search speed, without sacrificing too much accuracy. This page describes practical approaches to compressing and quantizing vectors for scalable search. {{< warning >}} -Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Open Source. On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to Intel’s basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. +Some advanced vector compression features may depend on hardware or Intel's proprietary optimizations. On platforms without these capabilities, generic compression methods will be used, possibly with reduced performance. {{< /warning >}} -## LVQ and LeanVec compression +## Compression and Quantization Techniques -### LVQ (locally-adaptive vector quantization) +### LVQ (Locally-Adaptive Vector Quantization) -* **Method:** Applies per-vector normalization and scalar quantization. +* **Method:** Applies per-vector normalization and scalar quantization, learning parameters directly from the data. * **Advantages:** * Enables fast, on-the-fly distance computations. - * SIMD-optimized layout using Turbo LVQ for efficient distance computations. - * Learns compression parameters from data. + * SIMD-optimized layout for efficient search. + * Learns compression parameters from representative vectors. * **Variants:** * **LVQ4x4:** 8 bits per dimension, fast search, large memory savings. * **LVQ8:** Faster ingestion, slower search. 
@@ -37,66 +37,62 @@ Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Ope ### LeanVec -* **Method:** Combines dimensionality reduction with LVQ. +* **Method:** Combines dimensionality reduction with LVQ, applying quantization after reducing vector dimensions. * **Advantages:** - * Ideal for high-dimensional vectors. - * Significant performance boost with reduced memory. + * Best suited for high-dimensional vectors. + * Significant speed and memory improvements. * **Variants:** * **LeanVec4x8:** Recommended for high-dimensional datasets, fastest search and ingestion. - * **LeanVec8x8:** Improved recall when LeanVec4x8 is insufficient. -* **LeanVec Dimension:** For faster search and lower memory use, reduce the dimension further by using the optional `REDUCE` argument. The default value for `REDUCE` is `input dim / 2`; try `dim / 4` for even higher reduction. + * **LeanVec8x8:** Improved recall when more granularity is needed. +* **LeanVec Dimension:** For faster search and lower memory usage, reduce the dimension further by using the optional `REDUCE` argument. The default is typically `input dimension / 2`, but more aggressive reduction (such as `dimension / 4`) is possible for greater efficiency. -## Choosing a compression type +## Choosing a Compression Type -| Compression type | Best for | Observations | -|------------------|----------|--------------| -| LVQ4x4 | Fast search in most cases with low memory use | Consider LeanVec for even faster search | -| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall. 
| -| LVQ4 | Maximum memory saving | Recall might be insufficient | -| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 | -| LeanVec8x8 | Improved recall in case LeanVec4x8 is not sufficient | LeanVec dimensionality reduction might reduce recall | -| LVQ4x8 | Improved recall in case LVQ4x4 is not sufficient | Worse memory savings | +| Compression type | Best for | Observations | +|----------------------|--------------------------------------------------|---------------------------------------------------------| +| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search | +| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall | +| LVQ4 | Maximum memory saving | Recall might be insufficient | +| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 | +| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall | +| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings | -## Two-level compression +## Two-Level Compression -Both LVQ and LeanVec support two-level compression schemes. LVQ's two-level compression works by first quantizing each vector individually to capture its main structure, then encoding the residual error—the difference between the original and quantized vector—using a second quantization step. This allows fast search using only the first level, with the second level used for re-ranking to boost accuracy when needed. +Both LVQ and LeanVec support multi-level compression schemes. The first level compresses each vector to capture its main structure, while the second encodes residual errors for more precise re-ranking. -Similarly, LeanVec uses a two-level approach: the first level reduces dimensionality and applies LVQ to speed up candidate retrieval, while the second level applies LVQ to the original high-dimensional vectors for accurate re-ranking. 
- -Note that the original full-precision embeddings are never used by either LVQ or LeanVec, as both operate entirely on compressed representations. - -This two-level approach allows for: +This two-level approach enables: * Fast candidate retrieval using the first-level compressed vectors. -* High-accuracy re-ranking using the second-level residuals. +* High-accuracy re-ranking using second-level residuals. -The naming convention used for the configurations reflects the number of bits allocated per dimension at each level of compression. +The naming convention reflects the number of bits per dimension at each compression level. -### Naming convention: LVQx +### Naming convention: LVQx or LeanVecx -* **B₁:** Number of bits per dimension used in the first-level quantization. -* **B₂:** Number of bits per dimension used in the second-level quantization (residual encoding). +* **B₁:** Bits per dimension for first-level quantization. +* **B₂:** Bits per dimension for second-level quantization (residual encoding). #### Examples * **LVQ4x8:** * First level: 4 bits per dimension. * Second level: 8 bits per dimension. - * Total: 12 bits per dimension (used across two stages). + * Total: 12 bits per dimension (across two stages). * **LVQ8:** - * Single-level compression only. + * Single-level compression. * 8 bits per dimension. * No second-level residuals. +* **LeanVec4x8:** + * Dimensionality reduction followed by LVQ4x8 scheme. -Same notation is used for LeanVec. - -## Learning compression parameters from vector data +## Learning Compression Parameters from Vector Data -The strong performance of LVQ and LeanVec stems from their ability to adapt to the structure of the input vectors. By learning compression parameters directly from the data, they achieve more accurate representations with fewer bits. +The effectiveness of LVQ and LeanVec compression relies on adapting to the structure of input vectors. 
Learning parameters directly from data leads to more accurate and efficient search. -### What does this mean in practice? +### Practical Considerations -* **Initial training requirement:** - A minimum number of representative vectors is required during index initialization to train the compression parameters (see the [TRAINING_THRESHOLD]({{< relref "/develop/ai/search-and-query/vectors/#svs-vamana-index" >}}) parameter). A random sample from the dataset typically works well. -* **Handling data drift:** - If the characteristics of incoming vectors change significantly over time (that is, a data distribution shift), compression quality may degrade. This is a general limitation of all data-dependent compression methods,not just LVQ and LeanVec. When the data no longer resembles the original training sample, the learned representation becomes less effective. +* **Initial Training Requirement:** + A minimum number of representative vectors is needed during index initialization to train the compression parameters (see [TRAINING_THRESHOLD]({{< relref "/develop/ai/search-and-query/vectors/svs-training.md" >}})). +* **Handling Data Drift:** + If incoming vector characteristics change significantly over time (data distribution shift), compression quality may degrade—a general limitation of all data-dependent methods. 
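The fallback behavior described in the warning text of this patch (global min-max scaling followed by independent 8-bit quantization of each dimension into 256 levels) is concrete enough to sketch. The following is a minimal Python illustration of that scheme, not the actual SVS implementation:

```python
def quantize_8bit(vectors):
    # Global min-max 8-bit scalar quantization, per the fallback described in
    # the warning: scale all values by the global min/max, then quantize each
    # dimension independently into 256 levels.
    lo = min(v for vec in vectors for v in vec)
    hi = max(v for vec in vectors for v in vec)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [[round((v - lo) / scale) for v in vec] for vec in vectors]
    return codes, lo, scale

def dequantize_8bit(codes, lo, scale):
    # Reconstruct approximate float vectors from the 8-bit codes.
    return [[lo + c * scale for c in vec] for vec in codes]

# Toy demo: quantize two 3-d vectors and reconstruct them.
vectors = [[0.0, 1.0, 0.5], [0.25, 0.75, 0.1]]
codes, lo, scale = quantize_8bit(vectors)
recon = dequantize_8bit(codes, lo, scale)
```

Because the scale is global, a single outlier dimension widens the quantization step for every vector, which is one reason the data-adaptive LVQ and LeanVec schemes can be more accurate at the same bit budget.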
From 5a92938a525fce2a45b84264b77a6e62c36c4cff Mon Sep 17 00:00:00 2001 From: adrianoamaral Date: Thu, 21 Aug 2025 11:24:30 +0100 Subject: [PATCH 2/4] minor fixes svs-compression.md --- .../vectors/svs-compression.md | 52 ++++++++++--------- 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/content/develop/ai/search-and-query/vectors/svs-compression.md b/content/develop/ai/search-and-query/vectors/svs-compression.md index c4f305ff1..39cbb9e77 100644 --- a/content/develop/ai/search-and-query/vectors/svs-compression.md +++ b/content/develop/ai/search-and-query/vectors/svs-compression.md @@ -9,16 +9,16 @@ categories: - oss - kubernetes - clients -description: Vector compression and quantization for efficient memory usage and search performance -linkTitle: Vector Compression & Quantization -title: Vector Compression and Quantization Techniques +description: Intel scalable vector search (SVS) LVQ and LeanVec compression +linkTitle: Intel SVS compression +title: Intel scalable vector search (SVS) compression weight: 2 --- -Efficient management of high-dimensional vector data is crucial for scalable search and retrieval. Advanced methods for vector compression and quantization—such as LVQ (Locally-Adaptive Vector Quantization) and LeanVec—can dramatically optimize memory usage and improve search speed, without sacrificing too much accuracy. This page describes practical approaches to compressing and quantizing vectors for scalable search. +Intel's SVS (Scalable Vector Search) introduces two advanced vector compression techniques—LVQ and LeanVec—designed to optimize memory usage and search performance. These methods compress high-dimensional vectors while preserving the geometric relationships essential for accurate similarity search. {{< warning >}} -Some advanced vector compression features may depend on hardware or Intel's proprietary optimizations. 
On platforms without these capabilities, generic compression methods will be used, possibly with reduced performance. +Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Open Source. On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to Intel’s basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. {{< /warning >}} ## Compression and Quantization Techniques @@ -57,42 +57,46 @@ Some advanced vector compression features may depend on hardware or Intel's prop | LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall | | LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings | -## Two-Level Compression +## Two-level compression -Both LVQ and LeanVec support multi-level compression schemes. The first level compresses each vector to capture its main structure, while the second encodes residual errors for more precise re-ranking. +Both LVQ and LeanVec support two-level compression schemes. LVQ's two-level compression works by first quantizing each vector individually to capture its main structure, then encoding the residual error—the difference between the original and quantized vector—using a second quantization step. This allows fast search using only the first level, with the second level used for re-ranking to boost accuracy when needed. -This two-level approach enables: +Similarly, LeanVec uses a two-level approach: the first level reduces dimensionality and applies LVQ to speed up candidate retrieval, while the second level applies LVQ to the original high-dimensional vectors for accurate re-ranking. + +Note that the original full-precision embeddings are never used by either LVQ or LeanVec, as both operate entirely on compressed representations. 
+ +This two-level approach allows for: * Fast candidate retrieval using the first-level compressed vectors. -* High-accuracy re-ranking using second-level residuals. +* High-accuracy re-ranking using the second-level residuals. -The naming convention reflects the number of bits per dimension at each compression level. +The naming convention used for the configurations reflects the number of bits allocated per dimension at each level of compression. -### Naming convention: LVQx or LeanVecx +### Naming convention: LVQx -* **B₁:** Bits per dimension for first-level quantization. -* **B₂:** Bits per dimension for second-level quantization (residual encoding). +* **B₁:** Number of bits per dimension used in the first-level quantization. +* **B₂:** Number of bits per dimension used in the second-level quantization (residual encoding). #### Examples * **LVQ4x8:** * First level: 4 bits per dimension. * Second level: 8 bits per dimension. - * Total: 12 bits per dimension (across two stages). + * Total: 12 bits per dimension (used across two stages). * **LVQ8:** - * Single-level compression. + * Single-level compression only. * 8 bits per dimension. * No second-level residuals. -* **LeanVec4x8:** - * Dimensionality reduction followed by LVQ4x8 scheme. -## Learning Compression Parameters from Vector Data +Same notation is used for LeanVec. + +## Learning compression parameters from vector data -The effectiveness of LVQ and LeanVec compression relies on adapting to the structure of input vectors. Learning parameters directly from data leads to more accurate and efficient search. +The strong performance of LVQ and LeanVec stems from their ability to adapt to the structure of the input vectors. By learning compression parameters directly from the data, they achieve more accurate representations with fewer bits. -### Practical Considerations +### What does this mean in practice? 
-* **Initial Training Requirement:** - A minimum number of representative vectors is needed during index initialization to train the compression parameters (see [TRAINING_THRESHOLD]({{< relref "/develop/ai/search-and-query/vectors/svs-training.md" >}})). -* **Handling Data Drift:** - If incoming vector characteristics change significantly over time (data distribution shift), compression quality may degrade—a general limitation of all data-dependent methods. +* **Initial training requirement:** + A minimum number of representative vectors is required during index initialization to train the compression parameters (see the [TRAINING_THRESHOLD]({{< relref "/develop/ai/search-and-query/vectors/#svs-vamana-index" >}}) parameter). A random sample from the dataset typically works well. +* **Handling data drift:** + If the characteristics of incoming vectors change significantly over time (that is, a data distribution shift), compression quality may degrade. This is a general limitation of all data-dependent compression methods, not just LVQ and LeanVec. When the data no longer resembles the original training sample, the learned representation becomes less effective.
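The two-level scheme this patch restores (first-level quantization to capture the main structure, plus residual encoding used for re-ranking) can be illustrated with a toy sketch. This simplified stand-in is not Intel's LVQ; the real implementation adds per-vector normalization and SIMD-friendly layouts:

```python
def quantize_level(vec, bits):
    # Scalar-quantize one vector into 2**bits levels using its own min/max
    # (per-vector parameters, in the spirit of LVQ).
    lo, hi = min(vec), max(vec)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) for v in vec], lo, scale

def dequantize_level(codes, lo, scale):
    return [lo + c * scale for c in codes]

def two_level_encode(vec, b1=4, b2=8):
    # LVQ4x8-style encoding: level 1 captures the main structure, level 2
    # encodes the residual (original minus level-1 reconstruction).
    c1, lo1, s1 = quantize_level(vec, b1)
    approx = dequantize_level(c1, lo1, s1)
    residual = [v - a for v, a in zip(vec, approx)]
    c2, lo2, s2 = quantize_level(residual, b2)
    return (c1, lo1, s1), (c2, lo2, s2)

# Demo: compare level-1-only reconstruction with level-1 + residual.
vec = [0.1, 0.9, 0.4, 0.7, 0.2, 0.55]
(c1, lo1, s1), (c2, lo2, s2) = two_level_encode(vec)
level1 = dequantize_level(c1, lo1, s1)
refined = [a + r for a, r in zip(level1, dequantize_level(c2, lo2, s2))]
err_level1 = max(abs(v - a) for v, a in zip(vec, level1))
err_refined = max(abs(v - a) for v, a in zip(vec, refined))
```

Search can scan only the compact level-1 codes, then decode level-2 residuals for the shortlisted candidates, which is how the scheme gets fast retrieval and accurate re-ranking from the same index.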
From 7a612952de7e69869ef83d01753a6480975f9fbf Mon Sep 17 00:00:00 2001 From: adrianoamaral Date: Thu, 21 Aug 2025 11:31:34 +0100 Subject: [PATCH 3/4] fix miswordings svs-compression.md --- .../ai/search-and-query/vectors/svs-compression.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/develop/ai/search-and-query/vectors/svs-compression.md b/content/develop/ai/search-and-query/vectors/svs-compression.md index 39cbb9e77..1100cacc9 100644 --- a/content/develop/ai/search-and-query/vectors/svs-compression.md +++ b/content/develop/ai/search-and-query/vectors/svs-compression.md @@ -9,16 +9,16 @@ categories: - oss - kubernetes - clients -description: Intel scalable vector search (SVS) LVQ and LeanVec compression -linkTitle: Intel SVS compression -title: Intel scalable vector search (SVS) compression +description: Vector compression and quantization for efficient memory usage and search performance +linkTitle: Vector Compression & Quantization +title: Vector Compression and Quantization weight: 2 --- -Intel's SVS (Scalable Vector Search) introduces two advanced vector compression techniques—LVQ and LeanVec—designed to optimize memory usage and search performance. These methods compress high-dimensional vectors while preserving the geometric relationships essential for accurate similarity search. +Efficient management of high-dimensional vector data is crucial for scalable search and retrieval. Advanced methods for vector compression and quantization—such as LVQ (Locally-Adaptive Vector Quantization) and LeanVec—can dramatically optimize memory usage and improve search speed, without sacrificing too much accuracy. This page describes practical approaches to compressing and quantizing vectors for scalable search. {{< warning >}} -Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Open Source. 
On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to Intel’s basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. +Some advanced vector compression features may depend on hardware or Intel's proprietary optimizations. Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Open Source. On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. {{< /warning >}} ## Compression and Quantization Techniques From b084906631fa90ba3fcff5e7c414fc70e57a03fa Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Fri, 22 Aug 2025 08:20:47 -0700 Subject: [PATCH 4/4] Editorial updates (style, etc.) --- .../ai/search-and-query/vectors/_index.md | 2 +- .../vectors/svs-compression.md | 26 ++++++++++--------- 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/content/develop/ai/search-and-query/vectors/_index.md b/content/develop/ai/search-and-query/vectors/_index.md index 1db5de44e..fe01bcf82 100644 --- a/content/develop/ai/search-and-query/vectors/_index.md +++ b/content/develop/ai/search-and-query/vectors/_index.md @@ -161,7 +161,7 @@ Choose the `SVS-VAMANA` index type when all of the following requirements apply: | `LEANVEC_DIM` | The dimension used when using `LeanVec4x8` or `LeanVec8x8` compression for dimensionality reduction. If a value is provided, it should be less than `DIM`. Lowering it can speed up search and reduce memory use. | `DIM / 2` | {{< warning >}} -Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Open Source. 
On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to Intel’s basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. +Some advanced vector compression features may depend on hardware or Intel's proprietary optimizations. Intel's proprietary LVQ and LeanVec optimizations are not available in Redis Open Source. On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to a basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. {{< /warning >}} **Example** diff --git a/content/develop/ai/search-and-query/vectors/svs-compression.md b/content/develop/ai/search-and-query/vectors/svs-compression.md index 1100cacc9..aa2c58144 100644 --- a/content/develop/ai/search-and-query/vectors/svs-compression.md +++ b/content/develop/ai/search-and-query/vectors/svs-compression.md @@ -1,4 +1,6 @@ --- +aliases: + categories: - docs - develop
Advanced methods for vector compression and quantization—such as LVQ (Locally-Adaptive Vector Quantization) and LeanVec—can dramatically optimize memory usage and improve search speed, without sacrificing too much accuracy. This page describes practical approaches to compressing and quantizing vectors for scalable search. +Efficient management of high-dimensional vector data is crucial for scalable search and retrieval. Advanced methods for vector quantization and compression, such as LVQ (Locally-adaptive Vector Quantization) and LeanVec, can dramatically optimize memory usage and improve search speed, without sacrificing much accuracy. This page describes practical approaches to quantizing and compressing vectors for scalable search. {{< warning >}} -Some advanced vector compression features may depend on hardware or Intel's proprietary optimizations. Intel's proprietary LVQ and LeanVec optimizations are not available on Redis Open Source. On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision. +Some advanced vector compression features may depend on hardware or Intel's proprietary optimizations. Intel's proprietary LVQ and LeanVec optimizations are not available in Redis Open Source. On non-Intel platforms and Redis Open Source platforms, `SVS-VAMANA` with `COMPRESSION` will fall back to a basic, 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision.
{{< /warning >}} -## Compression and Quantization Techniques +## Quantization and compression techniques -### LVQ (Locally-Adaptive Vector Quantization) +### LVQ (Locally-adaptive Vector Quantization) -* **Method:** Applies per-vector normalization and scalar quantization, learning parameters directly from the data. +* **Method:** Applies per-vector normalization and scalar quantization; learns parameters directly from the data. * **Advantages:** * Enables fast, on-the-fly distance computations. * SIMD-optimized layout for efficient search. @@ -35,7 +37,7 @@ Some advanced vector compression features may depend on hardware or Intel's prop * **LVQ8:** Faster ingestion, slower search. * **LVQ4x8:** Two-level quantization for improved recall. -### LeanVec +### LeanVec (LVQ with dimensionality reduction) * **Method:** Combines dimensionality reduction with LVQ, applying quantization after reducing vector dimensions. * **Advantages:** @@ -44,11 +46,11 @@ Some advanced vector compression features may depend on hardware or Intel's prop * **Variants:** * **LeanVec4x8:** Recommended for high-dimensional datasets, fastest search and ingestion. * **LeanVec8x8:** Improved recall when more granularity is needed. -* **LeanVec Dimension:** For faster search and lower memory usage, reduce the dimension further by using the optional `REDUCE` argument. The default is typically `input dimension / 2`, but more aggressive reduction (such as `dimension / 4`) is possible for greater efficiency. +* **LeanVec Dimension:** For faster search and lower memory usage, reduce the dimension further by using the optional `REDUCE` argument. The default is typically `input dimension / 2`, but more aggressive reduction (such as `input dimension / 4`) is possible for greater efficiency. 
-## Choosing a Compression Type +## Choosing a compression type -| Compression type | Best for | Observations | +| Compression type | Best for | Observations | |----------------------|--------------------------------------------------|---------------------------------------------------------| | LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search | | LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
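A rough way to compare the rows of the compression-type table is to estimate the per-vector payload implied by the naming convention (bits per dimension at each level). This back-of-envelope sketch assumes the second level always covers the full input dimension, as the two-level section describes, and ignores per-vector scale/offset constants and graph overhead; the 768-dimension figure is only an example:

```python
def payload_bits(dim, b1, b2=0, reduced_dim=None):
    # Per-vector payload implied by the LVQ/LeanVec naming convention:
    # first-level codes cover the (possibly reduced) dimension, second-level
    # residual codes cover the full dimension.
    d1 = reduced_dim if reduced_dim is not None else dim
    return d1 * b1 + dim * b2

dim = 768  # example embedding dimensionality (assumption for illustration)
budgets = {
    "float32": dim * 32,
    "LVQ8": payload_bits(dim, 8),
    "LVQ4x8": payload_bits(dim, 4, 8),
    "LeanVec4x8": payload_bits(dim, 4, 8, reduced_dim=dim // 2),
}
```

With these assumptions, LeanVec4x8 spends fewer bits than LVQ4x8 at the same residual precision because its first level runs over the reduced dimension, which matches the table's memory and speed characterization.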