**articles/search/vector-search-how-to-configure-compression-storage.md** (4 additions, 6 deletions)
@@ -21,11 +21,9 @@ This article enumerates all of optimization techniques in Azure AI Search that c
 
 Vector optimization settings are specified in vector field definitions in a search index. Most of the features described in this article are generally available in [2024-07-01 REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-07-01&preserve-view=true) and in the Azure SDK packages targeting that version. The [latest preview version](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-09-01-preview&preserve-view=true) adds support for truncated dimensions if you're using text-embedding-3-large or text-embedding-3-small for vectorization.
 
-[An example](#example-vector-compression-techniques) at the end of this article shows the variations in vector size for each of the approaches described in this article.
-
 ## Evaluate the options
 
-Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](#example-vector-compression-techniques).
+Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](vector-search-how-to-quantization.md#example-vector-compression-techniques).
 
 We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort, and that tends to provide the most benefit in most scenarios. In contrast, narrow types (except for float16) require a special effort into making them, and `stored` saves on disk storage, which isn't as expensive as memory.
@@ -385,7 +383,7 @@ The following example shows the fields collection of a search index. Set `stored
 
 Defaults are `stored` set to true and `retrievable` set to false. In a default configuration, a retrievable copy is stored, but it's not automatically returned in results. When `stored` is true, you can toggle `retrievable` between true and false at any time without having to rebuild an index. When `stored` is false, `retrievable` must be false and can't be changed. -->
 
-## Example: vector compression techniques
+<!--## Example: vector compression techniques
 
 Here's Python code that demonstrates quantization, narrow data types, and use of the stored property: [Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md).
@@ -420,7 +418,7 @@ Search APIs report storage and vector size at the index level, so indexes and no
 
 Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or reranking with original vectors.
 
-Recall that the [vector compression definition](#add-compressions-to-a-search-index) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
+Recall that the [vector compression definition](vector-search-how-to-quantization.md) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
 
 You can set the oversampling parameter even if the index doesn't explicitly have a `rerankWithOriginalVectors` or `defaultOversampling` definition. Providing `oversampling` at query time overrides the index settings for that query and executes the query with an effective `rerankWithOriginalVectors` as true.
@@ -446,7 +444,7 @@ POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?ap
 
 - Applies to vector fields that undergo vector compression, per the vector profile assignment.
 
-- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
+- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.-->
**articles/search/vector-search-how-to-quantization.md** (72 additions, 7 deletions)
@@ -10,7 +10,7 @@ ms.topic: how-to
 ms.date: 11/04/2024
 ---
 
-##Use scalar or binary quantization to compress vector size
+# Use scalar or binary quantization to compress vector size
 
 Quantization is recommended for reducing vector size because it lowers both memory and disk storage requirements for float16 and float32 embeddings. To offset the effects of a smaller index, you can add oversampling and reranking over uncompressed vectors.
@@ -32,9 +32,9 @@ To use built-in quantization, follow these steps:
 > - Create a new vector profile that uses the named configuration
 > - Create a new vector field having the new vector profile
 > - Load the index with float32 or float16 data that's quantized during indexing with the configuration you defined
-> - Optionally, [query quantized data](#query-a-quantized-vector-field-using-oversampling) using the oversampling parameter if you want to override the default
+> - Optionally, [query quantized data](#) using the oversampling parameter if you want to override the default
 
-###Add "compressions" to a search index
+## Add "compressions" to a search index
 
 The following example shows a partial index definition with a fields collection that includes a vector field, and a `vectorSearch.compressions` section.
@@ -84,7 +84,7 @@ POST https://[servicename].search.windows.net/indexes?api-version=2024-07-01
 
 - `quantizedDataType` is optional and applies to scalar quantization only. If you add it, it must be set to `int8`. This is the only primitive data type supported for scalar quantization at this time. Default is `int8`.
 
-###Add the HNSW algorithm
+## Add the HNSW algorithm
 
 Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorithm. Built-in quantization isn't supported with exhaustive KNN.
@@ -107,7 +107,7 @@ Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorith
     }
 }
 ```
 
-###Create and assign a new vector profile
+## Create and assign a new vector profile
 
 To use a new quantization configuration, you must create a *new* vector profile. Creation of a new vector profile is necessary for building compressed indexes in memory. Your new profile uses HNSW.
@@ -151,14 +151,79 @@ To use a new quantization configuration, you must create a *new* vector profile.
 
 1. [Load the index](search-what-is-data-import.md) using indexers for pull model indexing, or APIs for push model indexing.
 
-###How scalar quantization works in Azure AI Search
+## How scalar quantization works in Azure AI Search
 
 Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically 99th percentile minimum and maximum) and divides them into a finite number of levels or bins, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.
 
 Each component of the vector is mapped to the closest representative value within this set of quantization levels in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require far fewer bits to store compared to the original vector, thus reducing storage requirements and memory footprint.
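The bin-mapping described above can be sketched in a few lines of NumPy. This is an illustration of the technique as the text explains it, not the service's internal code: the function names are made up, and bin identifiers are held as `uint8` here for simplicity, whereas the service's `quantizedDataType` is `int8`.

```python
import numpy as np

def scalar_quantize(vectors: np.ndarray):
    """Map each float32 component to one of 256 bins spanning the
    1st-99th percentile range (illustrative sketch, not the service's code)."""
    lo, hi = np.percentile(vectors, [1, 99])
    # Clip outliers, then scale the [lo, hi] range onto bin identifiers 0..255.
    scaled = (np.clip(vectors, lo, hi) - lo) / (hi - lo) * 255
    return np.round(scaled).astype(np.uint8), lo, hi

def dequantize(q: np.ndarray, lo: float, hi: float) -> np.ndarray:
    # Recover an approximation of the original floats from the bin identifiers.
    return q.astype(np.float32) / 255 * (hi - lo) + lo

vectors = np.random.default_rng(0).normal(size=(1000, 8)).astype(np.float32)
q, lo, hi = scalar_quantize(vectors)
print(q.nbytes, "vs", vectors.nbytes)  # → 8000 vs 32000: one quarter the memory
```

The clipping and rounding are lossy, which is why reranking with original vectors exists as a mitigation.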
-###How binary quantization works in Azure AI Search
+## How binary quantization works in Azure AI Search
 
 Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.
 
 It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we've found BQ performs very well when embeddings are centered around zero. Most popular embedding models such as OpenAI, Cohere, and Mistral are centered around zero.
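Keeping one bit per component amounts to keeping only the sign of each value. Here's a minimal sketch under that reading; the helper names are invented, and Hamming distance stands in for the comparison step (a common choice for packed bit vectors, not a documented service internal).

```python
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """Illustrative binary quantization: one bit per component
    (1 for positive values, 0 otherwise), packed 8 components per byte."""
    return np.packbits((vectors > 0).astype(np.uint8), axis=-1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # Compare packed bit vectors: XOR, then count the differing bits.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(1)
vecs = rng.normal(size=(2, 1536)).astype(np.float32)  # zero-centered, like many embedding models
packed = binary_quantize(vecs)
print(vecs.nbytes, "->", packed.nbytes)  # → 12288 -> 384: a 32x reduction
```

Because only the signs survive, zero-centered embeddings lose the least information, which matches the guidance above.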
+
+## Example: vector compression techniques
+
+Here's Python code that demonstrates quantization, [narrow data types](vector-search-how-to-assign-narrow-data-types.md), and use of the [stored property](vector-search-how-to-storage-options.md).
+
+This code is borrowed from [Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md).
+
+This code creates and compares storage and vector index size for each option.
+
+```bash
+****************************************
+Index Name: compressiontest-baseline
+Storage Size: 21.3613MB
+Vector Size: 4.8277MB
+****************************************
+Index Name: compressiontest-compression
+Storage Size: 17.7604MB
+Vector Size: 1.2242MB
+****************************************
+Index Name: compressiontest-narrow
+Storage Size: 16.5567MB
+Vector Size: 2.4254MB
+****************************************
+Index Name: compressiontest-no-stored
+Storage Size: 10.9224MB
+Vector Size: 4.8277MB
+****************************************
+Index Name: compressiontest-all-options
+Storage Size: 4.9192MB
+Vector Size: 1.2242MB
+```
+
+Search APIs report storage and vector size at the index level, so indexes and not fields must be the basis of comparison. Use the [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) or an equivalent API in the Azure SDKs to obtain vector size.
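A per-index comparison like the one above boils down to one statistics call per index. Here's a standard-library sketch of that call; the helper names are invented, while the endpoint shape and the `storageSize`/`vectorIndexSize` response fields follow the GET Index Statistics REST API.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"

def stats_url(service: str, index: str) -> str:
    """Endpoint for GET Index Statistics, which reports documentCount,
    storageSize, and vectorIndexSize (in bytes) for one index."""
    return (f"https://{service}.search.windows.net/indexes/{index}/stats"
            f"?api-version={API_VERSION}")

def get_vector_size_mb(service: str, index: str, api_key: str) -> float:
    # The admin api-key goes in a request header; sizes come back in bytes.
    req = urllib.request.Request(stats_url(service, index),
                                 headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return stats["vectorIndexSize"] / 2**20
```

Calling `get_vector_size_mb` for each of the five test indexes reproduces the "Vector Size" column of the comparison.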
+
+## Query a quantized vector field using oversampling
+
+Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or reranking with original vectors.
+
+Recall that the [vector compression definition](vector-search-how-to-quantization.md) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
+
+You can set the oversampling parameter even if the index doesn't explicitly have a `rerankWithOriginalVectors` or `defaultOversampling` definition. Providing `oversampling` at query time overrides the index settings for that query and executes the query with an effective `rerankWithOriginalVectors` as true.
+
+```http
+POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?api-version=2024-07-01
+Content-Type: application/json
+api-key: [admin key]
+
+{
+    "vectorQueries": [
+        {
+            "kind": "vector",
+            "vector": [8, 2, 3, 4, 3, 5, 2, 1],
+            "fields": "myvector",
+            "oversampling": 12.0,
+            "k": 5
+        }
+    ]
+}
+```
+
+**Key points**:
+
+- Applies to vector fields that undergo vector compression, per the vector profile assignment.
+
+- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.