Commit 316af01

Merge pull request #273523 from HeidiSteen/heidist-bug
[azure search] Fixed JSON bug
2 parents c97085b + b7d9911 commit 316af01

File tree

1 file changed (+63 -59 lines changed)


articles/search/vector-search-how-to-configure-compression-storage.md

Lines changed: 63 additions & 59 deletions
@@ -7,7 +7,7 @@ author: heidisteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/03/2024
+ms.date: 04/26/2024
 ---

 # Configure vector quantization and reduced storage for smaller vectors in Azure AI Search
@@ -19,14 +19,14 @@ This article describes vector quantization and other techniques for compressing

 ## Evaluate the options

-As a first step, review your options for reducing the amount of storage used by vector fields. These options aren't mutually exclusive so you can use multiple options together.
+As a first step, review the three options for reducing the amount of storage used by vector fields. These options aren't mutually exclusive, so you can combine them.

-We recommend scalar quantization because it's the most effective option for most scenarios. Narrow types (except for `Float16`) require a special effort into making them, and `stored` saves storage, which isn't as expensive as memory.
+We recommend scalar quantization because it compresses vector size in memory and on disk with minimal effort, which tends to provide the most benefit in most scenarios. In contrast, narrow types (except for `Float16`) require extra effort to produce, and `stored` saves on disk storage, which isn't as expensive as memory.

 | Approach | Why use this option |
 |----------|---------------------|
-| Assign smaller primitive data types to vector fields | Narrow data types, such as `Float16`, `Int16`, and `Int8`, consume less space in memory and on disk. This option is viable if your embedding model outputs vectors in a narrow data format. Or, if you have custom quantization logic that outputs small data. A more common use case is recasting the native `Float32` embeddings produced by most models to `Float16`. |
-| Eliminate optional storage of retrievable vectors | Vectors returned in a query response are stored separately from vectors used during query execution. If you don't need to return vectors, you can turn off retrievable storage, reducing overall per-field storage by up to 50 percent. |
+| Assign smaller primitive data types to vector fields | Narrow data types, such as `Float16`, `Int16`, and `Int8`, consume less space in memory and on disk, but you need an embedding model that outputs vectors in a narrow data format, or custom quantization logic that outputs small data. A third use case that requires less effort is recasting the native `Float32` embeddings produced by most models to `Float16`. |
+| Eliminate optional storage of retrievable vectors | Vectors returned in a query response are stored separately from vectors used during query execution. If you don't need to return vectors, you can turn off retrievable storage, reducing overall per-field disk storage by up to 50 percent. |
 | Add scalar quantization | Use built-in scalar quantization to compress native `Float32` embeddings to `Int8`. This option reduces storage in memory and on disk with no degradation of query performance. Smaller data types like `Int8` produce vector indexes that are less content-rich than those with `Float32` embeddings. To offset information loss, built-in compression includes options for post-query processing using uncompressed embeddings and oversampling to return more relevant results. Reranking and oversampling are specific features of built-in scalar quantization of `Float32` or `Float16` fields and can't be used on embeddings that undergo custom quantization. |

 All of these options are defined on an empty index. To implement any of them, use the Azure portal, [2024-03-01-preview REST APIs](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-03-01-preview&preserve-view=true), or a beta Azure SDK package.
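For intuition about what scalar quantization does to each vector component, here's a minimal Python sketch that maps float components onto the 256 values of the `int8` range using a per-vector min/max. This illustrates the general technique only, not Azure AI Search's internal implementation; the function names are hypothetical.

```python
def scalar_quantize(vector):
    """Map each float component onto the int8 range [-128, 127] (illustrative only)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant vector
    return [round((v - lo) / scale) - 128 for v in vector], lo, hi

def dequantize(codes, lo, hi):
    """Approximately recover the original floats from the int8 codes."""
    scale = (hi - lo) / 255.0
    return [(c + 128) * scale + lo for c in codes]

codes, lo, hi = scalar_quantize([0.0, 0.5, 1.0])
# each code fits in one byte instead of the four bytes of a Float32 component
```

Dequantizing recovers each component only to within one quantization step (`(hi - lo) / 255`); that information loss is what the reranking and oversampling options described in the table are designed to offset.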
@@ -199,7 +199,11 @@ Each component of the vector is mapped to the closest representative value withi

 ## Example index with vectorCompression, data types, and stored property

-Here's a JSON example of a search index that specifies `vectorCompression` on `Float32` field, a `Float16` data type on second vector field, and a `stored` property set to false. It's a composite of the vector compression and storage features in this preview.
+Here's a composite example of a search index that specifies narrow data types, reduced storage, and vector compression.
+
++ "HotelNameVector" provides a narrow data type example, recasting the original `Float32` values to `Float16`, expressed as `Collection(Edm.Half)` in the search index.
++ "HotelNameVector" also has `stored` set to false. Extra embeddings used in a query response are not stored.
++ "DescriptionVector" provides an example of vector compression. Vector compression is defined in the index, referenced in a profile, and then assigned to a vector field. "DescriptionVector" also has `stored` set to false.

 ```json
 ### Create a new index
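The `Float16` recast described in the first bullet above can be simulated in plain Python with the `struct` module's half-precision `'e'` format, which shows both the halved storage and the small precision loss involved. A sketch under those assumptions (no Azure SDK involved; the function name is made up):

```python
import struct

def recast_to_float16(values):
    """Round-trip floats through IEEE 754 half precision; return values and packed size."""
    fmt = f"<{len(values)}e"            # 'e' = 2-byte half-precision float
    packed = struct.pack(fmt, *values)  # 2 bytes per component vs. 4 for Float32
    return list(struct.unpack(fmt, packed)), len(packed)

halves, nbytes = recast_to_float16([0.1, -0.25, 0.9])
```

A value like -0.25 survives the recast exactly, while 0.1 comes back as roughly 0.09998; that small precision loss is the trade-off a `Collection(Edm.Half)` field accepts in exchange for half the storage.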
@@ -245,15 +249,15 @@ POST {{baseUrl}}/indexes?api-version=2024-03-01-preview HTTP/1.1
 "filterable": false,
 "retrievable": false,
 "sortable": false,
-"facetable": false,
-"stored": false,
+"facetable": false
 },
 {
 "name": "DescriptionVector",
 "type": "Collection(Edm.Single)",
 "searchable": true,
 "retrievable": true,
 "dimensions": 1536,
+"stored": false,
 "vectorSearchProfile": "my-vector-profile-with-compression"
 },
 {
@@ -298,63 +302,63 @@ POST {{baseUrl}}/indexes?api-version=2024-03-01-preview HTTP/1.1
 "facetable": false
 }
 ],
-"vectorSearch": {
-    "compressions": [
-      {
-        "name": "my-scalar-quantization",
-        "kind": "scalarQuantization",
-        "rerankWithOriginalVectors": true,
-        "defaultOversampling": 10.0,
-        "scalarQuantizationParameters": {
-          "quantizedDataType": "int8",
-        }
-      }
-    ],
-"algorithms": [
-{
-"name": "my-hnsw-vector-config-1",
-"kind": "hnsw",
-"hnswParameters":
-{
-"m": 4,
-"efConstruction": 400,
-"efSearch": 500,
-"metric": "cosine"
-}
-},
-{
-"name": "my-hnsw-vector-config-2",
-"kind": "hnsw",
-"hnswParameters":
-{
-"m": 4,
-"metric": "euclidean"
-}
-},
-{
-"name": "my-eknn-vector-config",
-"kind": "exhaustiveKnn",
-"exhaustiveKnnParameters":
-{
-"metric": "cosine"
-}
-}
-],
-"profiles": [
-{
-"name": "my-vector-profile-with-compression",
-"compression": "my-scalar-quantization",
-"algorithm": "my-hnsw-vector-config-1",
-"vectorizer": null
-},
-{
-"name": "my-vector-profile-no-compression",
-"compression": null,
-"algorithm": "my-eknn-vector-config",
-"vectorizer": null
-}
-]
-},
+"vectorSearch": {
+  "compressions": [
+    {
+      "name": "my-scalar-quantization",
+      "kind": "scalarQuantization",
+      "rerankWithOriginalVectors": true,
+      "defaultOversampling": 10.0,
+      "scalarQuantizationParameters": {
+        "quantizedDataType": "int8"
+      }
+    }
+  ],
+  "algorithms": [
+    {
+      "name": "my-hnsw-vector-config-1",
+      "kind": "hnsw",
+      "hnswParameters":
+      {
+        "m": 4,
+        "efConstruction": 400,
+        "efSearch": 500,
+        "metric": "cosine"
+      }
+    },
+    {
+      "name": "my-hnsw-vector-config-2",
+      "kind": "hnsw",
+      "hnswParameters":
+      {
+        "m": 4,
+        "metric": "euclidean"
+      }
+    },
+    {
+      "name": "my-eknn-vector-config",
+      "kind": "exhaustiveKnn",
+      "exhaustiveKnnParameters":
+      {
+        "metric": "cosine"
+      }
+    }
+  ],
+  "profiles": [
+    {
+      "name": "my-vector-profile-with-compression",
+      "compression": "my-scalar-quantization",
+      "algorithm": "my-hnsw-vector-config-1",
+      "vectorizer": null
+    },
+    {
+      "name": "my-vector-profile-no-compression",
+      "compression": null,
+      "algorithm": "my-eknn-vector-config",
+      "vectorizer": null
+    }
+  ]
+},
 "semantic": {
 "configurations": [
 {
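The JSON bug this commit fixes is the trailing comma after `"quantizedDataType": "int8"` in the old example, which is invalid JSON. A quick Python check for that kind of snippet (the helper name is made up for illustration):

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON; the JSON grammar forbids trailing commas."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# minimal stand-ins for the old (broken) and new (fixed) fragments
before = '{"scalarQuantizationParameters": {"quantizedDataType": "int8",}}'
after = '{"scalarQuantizationParameters": {"quantizedDataType": "int8"}}'
```

Running such a validator over doc samples before publishing catches exactly this class of copy-paste breakage.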
