---
title: Compress vectors using quantization
titleSuffix: Azure AI Search
description: Configure built-in scalar or binary quantization for compressing vectors on disk and in memory.
author: heidisteen
ms.author: heidist
ms.service: azure-ai-search
ms.topic: how-to
ms.date: 11/19/2024
---

# Compress vectors using scalar or binary quantization

Azure AI Search supports scalar and binary quantization for reducing the size of vectors in a search index. Quantization is recommended because it lowers both memory and disk storage consumption for float16 and float32 embeddings. To offset the effects of a smaller index, you can add oversampling and reranking over uncompressed vectors.

To use built-in quantization, follow these steps:

> [!div class="checklist"]
> - Add [vector fields and a `vectorSearch` configuration](vector-search-how-to-create-index.md) to an index
> - Add `vectorSearch.compressions`
> - Add a `scalarQuantization` or `binaryQuantization` configuration and give it a name
> - Set optional properties to mitigate the effects of lossy indexing
> - Create a new vector profile that uses the named configuration
> - Create a new vector field that uses the new vector profile
> - Load the index with float32 or float16 data that's quantized during indexing with the configuration you defined
> - Optionally, [query quantized data](#query-a-quantized-vector-field-using-oversampling) using the oversampling parameter if you want to override the default
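Put together, the checklist steps land in the index definition roughly as follows. This is a trimmed sketch for orientation only, not a complete index definition; the names `my-hnsw`, `my-compression`, `my-profile`, and the `contentVector` field are illustrative:

```json
{
  "name": "demo-index",
  "fields": [
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "my-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      { "name": "my-hnsw", "kind": "hnsw" }
    ],
    "compressions": [
      { "name": "my-compression", "kind": "scalarQuantization" }
    ],
    "profiles": [
      {
        "name": "my-profile",
        "algorithm": "my-hnsw",
        "compression": "my-compression"
      }
    ]
  }
}
```

The profile is the glue: the field opts in to quantization only through the profile that references the named compression configuration.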

## Prerequisites

- [Vector fields in a search index](vector-search-how-to-create-index.md) with a `vectorSearch` configuration that uses the HNSW algorithm and a new vector profile.

## Supported quantization techniques

Quantization applies to vector fields that receive float-type vectors. In the examples in this article, the field's data type is `Collection(Edm.Single)` for incoming float32 embeddings, but float16 is also supported. When vectors are received on a field with compression configured, the engine automatically performs quantization to reduce the footprint of the vector data in memory and on disk.

Two types of quantization are supported:

- Scalar quantization compresses float values into narrower data types. Azure AI Search currently supports int8, which is 8 bits, reducing vector index size fourfold.

- Binary quantization converts each float value to a single bit, reducing vector index size by up to 28 times.
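In the index schema, each technique is a named entry in `vectorSearch.compressions`, selected by `kind`. A minimal sketch of both variants (the configuration names are illustrative, and `scalarQuantizationParameters` is shown only to make the int8 target explicit):

```json
"compressions": [
  {
    "name": "my-scalar",
    "kind": "scalarQuantization",
    "scalarQuantizationParameters": { "quantizedDataType": "int8" }
  },
  {
    "name": "my-binary",
    "kind": "binaryQuantization"
  }
]
```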

## Add "compressions" to a search index

The `compressions` section is added to an index definition by using [Create Index](/rest/api/searchservice/indexes/create) or [Create Or Update Index](/rest/api/searchservice/indexes/create-or-update) (`POST https://[servicename].search.windows.net/indexes?api-version=2024-07-01`).

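As a sketch, a scalar quantization entry with the optional mitigation properties set explicitly might look like this (the configuration name is illustrative; each property is described in the key points that follow):

```json
"compressions": [
  {
    "name": "my-scalar-quantization",
    "kind": "scalarQuantization",
    "rerankWithOriginalVectors": true,
    "defaultOversampling": 10,
    "scalarQuantizationParameters": { "quantizedDataType": "int8" }
  }
]
```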
**Key points**:

- `kind` must be set to `scalarQuantization` or `binaryQuantization`.

- `rerankWithOriginalVectors` uses the original, uncompressed vectors to recalculate similarity and rerank the top results returned by the initial search query. The uncompressed vectors exist in the search index even if `stored` is false. This property is optional. Default is true.

## Query a quantized vector field using oversampling

The `oversampling` query parameter:

- Applies to vector fields that undergo vector compression, per the vector profile assignment.

- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
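As a sketch, an oversampling override on a vector query might look like the following. The index name, field name, and the short placeholder vector are illustrative, and the API version assumes the same 2024-07-01 version used for index creation earlier in this article:

```http
POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?api-version=2024-07-01
{
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.012, -0.043, 0.071],
            "fields": "contentVector",
            "k": 10,
            "oversampling": 20
        }
    ]
}
```

Because oversampling is a multiplier, `k` of 10 with `oversampling` of 20 asks the engine to consider roughly 200 candidates from the compressed index, which are then rescored against the uncompressed vectors before the top 10 matches are returned.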

<!--
RESCORE WITH ORIGINAL VECTORS -- NEEDS AN H2 or H3
It's used to rescore search results obtained using compressed vectors.

Rescore with original vectors
After the initial query, rescore results using uncompressed vectors.

For "enableRescoring", we provide true or false options. If it's true, the query first retrieves results using compressed vectors, then rescores them using uncompressed vectors.

Step one: Vector query executes using the compressed vectors.
Step two: Query returns the top oversampling k-matches.
Step three: Oversampling k-matches are rescored using the uncompressed vectors, adjusting the scores and ranking so that more relevant matches appear first.
-->