Commit 0a781fe

Quantization edits

1 parent 9b7b432

File tree

1 file changed (+13, -13 lines)

articles/search/vector-search-how-to-quantization.md

@@ -48,9 +48,21 @@ Two types of quantization are supported:
 >[!Note]
 > While free services support quantization, they don't demonstrate the full storage savings due to the limited storage quota.
 
+### How scalar quantization works in Azure AI Search
+
+Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically the 99th percentile minimum and maximum) and divides that range into a finite number of levels or bins, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.
+
+Each component of the vector is mapped to the closest representative value within this set of quantization levels, in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require far fewer bits to store than the original vector, which reduces storage requirements and memory footprint.
+
+### How binary quantization works in Azure AI Search
+
+Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to a 96% reduction in vector index size.
+
+It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar quantization instead. Additionally, we've found that binary quantization performs very well when embeddings are centered around zero. Most popular embedding models offered by OpenAI, Cohere, and Mistral are centered around zero.
+
 ## Supported rescoring techniques
 
-Rescoring is an optional technique used to offset information loss due to vector compression. During query execution, it uses oversampling to pick up extra vectors, and supplemental information to rescore initial results found by the query. Supplemental information is either uncompressed original full-precision vectors - or for binary quantization only - you have the option of rescoring using the binary quantized document candidates against the query vector.
+Rescoring is an optional technique used to offset information loss due to vector quantization. During query execution, it uses oversampling to pick up extra vectors, plus supplemental information to rescore the initial results found by the query. The supplemental information is either the uncompressed original full-precision vectors or, for binary quantization only, the binary quantized document candidates scored against the query vector.
 
 Only HNSW graphs allow rescoring. Exhaustive KNN doesn't support rescoring because by definition, all vectors are scanned at query time, which makes oversampling irrelevant.
 
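To make the scalar quantization description added above concrete, here's a minimal NumPy sketch of the binning idea: a global range clipped at the 1st and 99th percentiles, divided into 256 uniform bins. It's an illustration of the concept under those assumptions, not the service's internal implementation:

```python
import numpy as np

def scalar_quantize(vectors: np.ndarray, bits: int = 8):
    """Map each float component to one of 2^bits bin identifiers.
    Assumption: the range is clipped to the 1st-99th percentile of all values."""
    lo, hi = np.percentile(vectors, [1, 99])
    levels = 2 ** bits - 1                       # 255 -> bin ids 0..255
    clipped = np.clip(vectors, lo, hi)
    # "Rounding to the nearest bin": scale into [0, levels], then round.
    ids = np.round((clipped - lo) / (hi - lo) * levels).astype(np.uint8)
    return ids, lo, hi

def dequantize(ids: np.ndarray, lo: float, hi: float, bits: int = 8) -> np.ndarray:
    """Recover an approximation of the original floats from bin identifiers."""
    return lo + ids.astype(np.float32) / (2 ** bits - 1) * (hi - lo)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 1536)).astype(np.float32)  # toy float32 embeddings
ids, lo, hi = scalar_quantize(embeddings)
print(ids.nbytes / embeddings.nbytes)   # 0.25: uint8 needs a quarter of float32 storage
approx = dequantize(ids, lo, hi)
print(float(np.abs(embeddings - approx).max()))  # worst-case error, including clipped outliers
```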
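The binary quantization and rescoring paragraphs can be sketched the same way. The sign threshold at zero, Hamming-distance candidate selection, and dot-product rescoring below are illustrative assumptions for a zero-centered embedding model, not the service's exact algorithm:

```python
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """One bit per component: 1 if positive, else 0 (suits zero-centered
    embeddings). packbits yields an array 32x smaller than float32."""
    return np.packbits(vectors > 0, axis=-1)

def hamming(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Count differing bits between packed binary vectors."""
    return np.unpackbits(np.bitwise_xor(a, b), axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 1536)).astype(np.float32)   # zero-centered toy corpus
query = rng.normal(size=(1, 1536)).astype(np.float32)

# Oversample: pick up more binary candidates than the k we ultimately want...
k, oversampling = 10, 4
dist = hamming(binary_quantize(docs), binary_quantize(query))
candidates = np.argsort(dist)[: k * oversampling]

# ...then rescore candidates against the original full-precision vectors.
scores = docs[candidates] @ query.ravel()
top_k = candidates[np.argsort(scores)[::-1][:k]]
print(top_k)
```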

@@ -218,18 +230,6 @@ To use a new quantization configuration, you must create a *new* vector profile.
 
 1. [Load the index](search-what-is-data-import.md) using indexers for pull model indexing, or APIs for push model indexing.
 
-## How scalar quantization works in Azure AI Search
-
-Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically 99th percentile minimum and maximum) and divides them into a finite number of levels or bin, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.
-
-Each component of the vector is mapped to the closest representative value within this set of quantization levels in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require much fewer bits to store compared to the original vector, thus reducing storage requirements and memory footprint.
-
-## How binary quantization works in Azure AI Search
-
-Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.
-
-It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we’ve found binary quantization performs very well when embeddings are centered around zero. Most popular embedding models such as OpenAI, Cohere, and Mistral are centered around zero.
-
 ## Query a quantized vector field using oversampling
 
 Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling and rescoring. You can add an `oversampling` parameter to invoke oversampling and rescoring at query time.
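As a companion to the paragraph above, here's a hedged sketch of a vector query that sets `oversampling` at query time. The service endpoint, index name, vector field name, key, and API version are all placeholders; check the REST API reference for the exact request shape in your version:

```python
import requests

SERVICE = "https://my-service.search.windows.net"   # placeholder service endpoint
URL = f"{SERVICE}/indexes/my-index/docs/search?api-version=2024-07-01"

query_embedding = [0.0] * 1536   # placeholder: the real query vector from your embedding model

payload = {
    "select": "title",
    "vectorQueries": [{
        "kind": "vector",
        "vector": query_embedding,
        "fields": "contentVector",   # the compressed/quantized vector field
        "k": 10,                     # results returned after rescoring
        "oversampling": 20.0         # retrieve extra candidates, then rescore
    }]
}

response = requests.post(URL, json=payload, headers={"api-key": "<query-key>"})
print(response.json())
```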
