
Commit ff281b5

committed
More compression doc updates
1 parent d29c1fc commit ff281b5

File tree

3 files changed: +78, -6 lines changed

3 files changed

+78
-6
lines changed

articles/search/search-api-migration.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ ms.custom:
 - ignite-2023
 - build-2024
 ms.topic: conceptual
-ms.date: 11/05/2024
+ms.date: 11/19/2024
 ---
 
 # Upgrade to the latest REST API in Azure AI Search
@@ -77,7 +77,7 @@ See [Migrate from preview version](semantic-how-to-configure.md#migrate-from-pre
 
 [`2024-11-01-preview`](/rest/api/searchservice/search-service-api-versions#2024-11-01-preview) query rewrite, Document Layout skill, keyless billing for skills processing, Markdown parsing mode, and rescoring options for compressed vectors.
 
-If you're upgrading from `2024-09-01-preview`, you can use the new preview APIs with no change to existing code.
+If you're upgrading from `2024-09-01-preview`, you can use the new preview APIs with no change to existing code. However, the new version introduces changes to `vectorSearch.compressions`, replacing `rerankWithOriginalVectors` with `enableRescoring` and moving `defaultOversampling` to a new `rescoringOptions` property object. For a comparison of the syntax, see [Compress vectors using scalar or binary quantization](vector-search-how-to-quantization.md#add-compressions-to-a-search-index).
 
 ## Upgrade to 2024-09-01-preview
 
articles/search/vector-search-how-to-quantization.md

Lines changed: 74 additions & 2 deletions
@@ -44,7 +44,13 @@ Two types of quantization are supported:
 
 The following example shows a partial index definition with a fields collection that includes a vector field, and a `vectorSearch.compressions` section.
 
-This example includes both `scalarQuantization` or `binaryQuantization`. You can specify as many compression configurations as you need, and then assign the ones you want to a vector profile.
+It includes both `scalarQuantization` and `binaryQuantization`. You can specify as many compression configurations as you need, and then assign the ones you want to a vector profile.
+
+Syntax for `vectorSearch.compressions` varies between the stable and preview REST APIs, with the preview adding new options for storage optimization, plus changes to existing syntax. Backward compatibility is preserved through internal API mappings, but you should adopt the new syntax in code that targets 2024-11-01-preview and later versions.
+
+### [**2024-07-01**](#tab/2024-07-01)
+
+Use the [Create Index](/rest/api/searchservice/indexes/create) or [Create or Update Index](/rest/api/searchservice/indexes/create-or-update) REST API to configure compression settings.
 
 ```http
 POST https://[servicename].search.windows.net/indexes?api-version=2024-07-01
@@ -84,12 +90,78 @@ POST https://[servicename].search.windows.net/indexes?api-version=2024-07-01
 
 - `kind` must be set to `scalarQuantization` or `binaryQuantization`.
 
-- `rerankWithOriginalVectors` uses the original, uncompressed vectors to recalculate similarity and rerank the top results returned by the initial search query. The uncompressed vectors exist in the search index even if `stored` is false. This property is optional. Default is true.
+- `rerankWithOriginalVectors` uses the original uncompressed vectors to recalculate similarity and rerank the top results returned by the initial search query. The uncompressed vectors exist in the search index even if `stored` is false. This property is optional. Default is true.
 
 - `defaultOversampling` considers a broader set of potential results to offset the reduction in information from quantization. The formula for potential results consists of the `k` in the query, with an oversampling multiplier. For example, if the query specifies a `k` of 5, and oversampling is 20, then the query effectively requests 100 documents for use in reranking, using the original uncompressed vector for that purpose. Only the top `k` reranked results are returned. This property is optional. Default is 4.
 
 - `quantizedDataType` is optional and applies to scalar quantization only. If you add it, it must be set to `int8`. This is the only primitive data type supported for scalar quantization at this time. Default is `int8`.
 
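As a sketch of how the stable-API properties above fit together, the following Python snippet builds an illustrative `vectorSearch.compressions` entry and computes the oversampled candidate count from the formula in the key points. The helper names are hypothetical, not part of any Azure SDK; only the JSON property names come from the documentation.

```python
# Sketch of a 2024-07-01 vectorSearch.compressions entry, built as a dict.
# Helper names are illustrative; property names follow the key points above.
def scalar_compression(name: str, oversampling: float = 4) -> dict:
    return {
        "name": name,
        "kind": "scalarQuantization",
        "rerankWithOriginalVectors": True,    # optional; default is true
        "defaultOversampling": oversampling,  # optional; default is 4
        "scalarQuantizationParameters": {"quantizedDataType": "int8"},
    }

def candidates_for_rerank(k: int, oversampling: float) -> int:
    """Effective number of documents fetched for reranking: k * oversampling."""
    return int(k * oversampling)

config = scalar_compression("use-scalar", oversampling=20)
print(config["kind"])                # scalarQuantization
print(candidates_for_rerank(5, 20))  # 100, matching the example above
```

Only the top `k` of those oversampled candidates are returned to the caller after reranking.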
+### [**2024-11-01-preview**](#tab/2024-11-01-preview)
+
+Use the [Create Index (preview)](/rest/api/searchservice/indexes/create?view=rest-searchservice-2024-11-01-preview&preserve-view=true) or [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-11-01-preview&preserve-view=true) REST API to configure compression settings.
+
+Changes in this version include new `rescoringOptions` that replace `rerankWithOriginalVectors`, and extend the API with more storage options. Notice that `defaultOversampling` is now a property of `rescoringOptions`.
+
+Rescoring options are used to mitigate the effects of lossy compression. You can set `rescoringOptions` for scalar or binary quantization.
+
+```http
+POST https://[servicename].search.windows.net/indexes?api-version=2024-11-01-preview
+
+{
+  "name": "my-index",
+  "fields": [
+    { "name": "Id", "type": "Edm.String", "key": true, "retrievable": true, "searchable": true, "filterable": true },
+    { "name": "content", "type": "Edm.String", "retrievable": true, "searchable": true },
+    { "name": "vectorContent", "type": "Collection(Edm.Single)", "retrievable": false, "searchable": true, "dimensions": 1536, "vectorSearchProfile": "vector-profile-1" }
+  ],
+  "vectorSearch": {
+    "profiles": [ ],
+    "algorithms": [ ],
+    "compressions": [
+      {
+        "name": "use-scalar",
+        "kind": "scalarQuantization",
+        "rescoringOptions": {
+          "enableRescoring": true,
+          "defaultOversampling": 10,
+          "rescoreStorageMethod": "preserveOriginals"
+        },
+        "scalarQuantizationParameters": {
+          "quantizedDataType": "int8"
+        },
+        "truncationDimension": 1024
+      },
+      {
+        "name": "use-binary",
+        "kind": "binaryQuantization",
+        "rescoringOptions": {
+          "enableRescoring": true,
+          "defaultOversampling": 10,
+          "rescoreStorageMethod": "preserveOriginals"
+        },
+        "truncationDimension": 1024
+      }
+    ]
+  }
+}
+```
+
+**Key points**:
+
+- `kind` must be set to `scalarQuantization` or `binaryQuantization`.
+
+- `rescoringOptions` is a collection of properties used to offset lossy compression by rescoring query results using the original full-precision vectors that exist prior to quantization. For rescoring to work, the original vectors must be retained in storage. Setting `rescoreStorageMethod` to `discardOriginals` prevents you from using `enableRescoring` or `defaultOversampling`. For more information about vector storage, see [Eliminate optional vector instances from storage](vector-search-how-to-storage-options.md).
+
+- `"enableRescoring": true` is the API equivalent of `"rerankWithOriginalVectors": true`. Rescoring vector search results with the original full-precision vectors can result in adjustments to search scores and rankings, promoting the more relevant matches as determined by the rescoring step.
+
+- `defaultOversampling` considers a broader set of potential results to offset the reduction in information from quantization. The formula for potential results consists of the `k` in the query, with an oversampling multiplier. For example, if the query specifies a `k` of 5, and oversampling is 20, then the query effectively requests 100 documents for use in reranking, using the original uncompressed vector for that purpose. Only the top `k` reranked results are returned. This property is optional. Default is 4.
+
+- `quantizedDataType` is optional and applies to scalar quantization only. If you add it, it must be set to `int8`. This is the only primitive data type supported for scalar quantization at this time. Default is `int8`.
+
+- `truncationDimension` is a preview feature that taps inherent capabilities of the text-embedding-3 models to "encode information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks" (see [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147)). You can use truncated dimensions with or without rescoring options. For more information about how this feature is implemented in Azure AI Search, see [Truncate dimensions using MRL compression](vector-search-how-to-truncate-dimensions.md).
+
+---
+
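The property renames called out above, from the stable syntax to the preview `rescoringOptions` shape, can be sketched as a small client-side transform. The helper below is a hypothetical illustration of the mapping, not an Azure SDK function or a service feature.

```python
# Illustrative sketch: map a 2024-07-01 compression entry to the
# 2024-11-01-preview shape. Helper name and behavior are assumptions.
def to_preview_syntax(compression: dict) -> dict:
    updated = dict(compression)
    # rerankWithOriginalVectors becomes rescoringOptions.enableRescoring
    enable = updated.pop("rerankWithOriginalVectors", True)
    # defaultOversampling moves under rescoringOptions
    oversampling = updated.pop("defaultOversampling", None)
    rescoring = {"enableRescoring": enable}
    if oversampling is not None:
        rescoring["defaultOversampling"] = oversampling
    # preserveOriginals keeps full-precision vectors so rescoring can work
    rescoring["rescoreStorageMethod"] = "preserveOriginals"
    updated["rescoringOptions"] = rescoring
    return updated

old = {
    "name": "use-scalar",
    "kind": "scalarQuantization",
    "rerankWithOriginalVectors": True,
    "defaultOversampling": 10,
}
new = to_preview_syntax(old)
print(new["rescoringOptions"]["enableRescoring"])  # True
```

Because the service preserves backward compatibility through internal mappings, a transform like this is only needed when you want your stored index definitions to use the newer syntax explicitly.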
 ## Add the HNSW algorithm
 
 Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorithm. Built-in quantization isn't supported with exhaustive KNN.

articles/search/vector-search-how-to-truncate-dimensions.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ ms.date: 11/19/2024
 
 Exercise the ability to use fewer dimensions on text-embedding-3 models. On Azure OpenAI, text-embedding-3 models are retrained on the [Matryoshka Representation Learning (MRL)](https://arxiv.org/abs/2205.13147) technique that produces multiple vector representations at different levels of compression. This approach produces faster searches and reduced storage costs, with minimal loss of semantic information.
 
-In Azure AI Search, MRL support supplements [scalar and binary quantization](vector-search-how-to-quantization.md). When you use either quantization method, you can also specify a `truncateDimension` property on your vector fields to reduce the dimensionality of text embeddings.
+In Azure AI Search, MRL support supplements [scalar and binary quantization](vector-search-how-to-quantization.md). When you use either quantization method, you can also specify a `truncationDimension` property on your vector fields to reduce the dimensionality of text embeddings.
 
 MRL multilevel compression saves on vector storage and improves query response times for vector queries based on text embeddings. In Azure AI Search, MRL support is only offered together with another method of quantization. Using binary quantization with MRL provides the maximum vector index size reduction. To achieve maximum storage reduction, use binary quantization with MRL, and `stored` set to false.
 
@@ -31,7 +31,7 @@ This feature is in preview. It's available in `2024-09-01-preview` and in beta S
 
 - [Hierarchical Navigable Small World (HNSW) algorithm](vector-search-ranking.md) (no support for exhaustive KNN in this preview).
 
-- [Scalar or binary quantization](vector-search-how-to-quantization.md). We recommend binary quantization for MRL compression.
+- [Scalar or binary quantization](vector-search-how-to-quantization.md). Truncated dimensions can be set only when scalar or binary quantization is configured. We recommend binary quantization for MRL compression.
 
 ## Supported clients
 
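To make the MRL idea concrete: truncation keeps the leading dimensions of an embedding and renormalizes the result. Azure AI Search applies `truncationDimension` server-side during indexing; the Python sketch below only illustrates the underlying concept on a toy vector and is not how you invoke the feature.

```python
import math

# Conceptual sketch of MRL-style truncation (illustrative only):
# keep the leading dimensions of an embedding, then renormalize to unit length.
def truncate_embedding(vector: list[float], dim: int) -> list[float]:
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

embedding = [0.5, 0.5, 0.5, 0.5]  # toy 4-dimension "embedding"
truncated = truncate_embedding(embedding, 2)
print(len(truncated))  # 2
```

In the service, you get this behavior by setting `truncationDimension` on the compression configuration, as shown in the quantization article.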