articles/search/search-agentic-retrieval-how-to-index.md (1 addition, 1 deletion)
@@ -169,7 +169,7 @@ All `searchable` fields are included in query execution. There's no support for
 ## Add a description

-An index `description` field is exposed programmatically, which means you can pass this description to LLMs and Model Context Protocol (MCP) servers as an input when deciding to use a specific index for a query. This human-readable text is invaluable when a system must access several indexes and make a decision based on the description.
+An index `description` field is a user-defined string that you can use to provide guidance to LLMs and Model Context Protocol (MCP) servers when deciding to use a specific index for a query. This human-readable text is invaluable when a system must access several indexes and make a decision based on the description.

 An index description is a schema update, and you can add it without having to rebuild the entire index.
articles/search/vector-search-how-to-quantization.md (18 additions, 18 deletions)
@@ -48,18 +48,30 @@ Two types of quantization are supported:
 > [!Note]
 > While free services support quantization, they don't demonstrate the full storage savings due to the limited storage quota.

+### How scalar quantization works in Azure AI Search
+
+Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically the 99th percentile minimum and maximum) and divides them into a finite number of levels or bins, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.
+
+Each component of the vector is mapped to the closest representative value within this set of quantization levels in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require far fewer bits to store compared to the original vector, reducing storage requirements and memory footprint.
+
+### How binary quantization works in Azure AI Search
+
+Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.
+
+It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar quantization instead. Additionally, we've found that binary quantization performs very well when embeddings are centered around zero. Most popular embedding models offered by OpenAI, Cohere, and Mistral are centered around zero.
+
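The scalar and binary quantization steps described above can be sketched in a few lines of Python. This is a simplified illustration of the technique, not the service's actual implementation; the clipping range and rounding behavior are assumptions made for the sketch.

```python
import numpy as np

def scalar_quantize(x, low, high, bits=8):
    """Map each vector component to one of 2**bits bins spanning [low, high]
    (in practice, roughly the 99th-percentile min/max of observed values)."""
    levels = 2 ** bits - 1                      # 255 -> bin ids 0..255
    x = np.clip(np.asarray(x, dtype=np.float32), low, high)
    return np.round((x - low) / (high - low) * levels).astype(np.uint8)

def scalar_dequantize(ids, low, high, bits=8):
    """Recover the representative value for each bin identifier."""
    levels = 2 ** bits - 1
    return low + ids.astype(np.float32) / levels * (high - low)

def binary_quantize(x):
    """One bit per component: 1 if the component is >= 0, else 0.
    Works best when embeddings are centered around zero."""
    return (np.asarray(x) >= 0).astype(np.uint8)

v = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
ids = scalar_quantize(v, -1.0, 1.0)   # array([  0, 128, 191, 255], dtype=uint8)
bits = binary_quantize(v)             # array([0, 1, 1, 1], dtype=uint8)
```

Dequantizing `ids` recovers values close to the originals, which is why rescoring with full-precision vectors (described below in the article) recovers most of the lost accuracy.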
 ## Supported rescoring techniques

-Rescoring is an optional technique used to offset information loss due to vector compression. It uses oversampling to pick up extra vectors, and supplemental information to rescore initial results found by the query. Supplemental information is either uncompressed original full-precision vectors - or for binary quantization only - you have the option of rescoring using the binary quantized document candidates against the query vector.
+Rescoring is an optional technique used to offset information loss due to vector quantization. During query execution, it uses oversampling to pick up extra vectors and supplemental information to rescore the initial results found by the query. The supplemental information is either the uncompressed original full-precision vectors, or, for binary quantization only, the binary quantized document candidates rescored against the query vector.

-Only HNSW graphs allow rescoring. Exhaustive KNN doesn't support rescoring.
+Only HNSW graphs allow rescoring. Exhaustive KNN doesn't support rescoring because, by definition, all vectors are scanned at query time, which makes oversampling irrelevant.

-Rescoring options are specified in the index, but you can invoke rescoring at query time if the index supports it.
+Rescoring options are specified in the index, but you invoke rescoring at query time by adding the `oversampling` query parameter.

 | Object | Properties |
 |--------|------------|
-| Index | [`RescoringOptions`](/rest/api/searchservice/indexes/create-or-update#rescoringoptions) with these properties: `rescoringOptions.enableRescoring`, `rescoringOptions.defaultOversampling`, `rescoringOptions.rescoreStorageMethod` |
+| Index | Add [`RescoringOptions`](/rest/api/searchservice/indexes/create-or-update#rescoringoptions) to the vector compressions section: `rescoringOptions.enableRescoring` (true or false), `rescoringOptions.defaultOversampling` (an integer), `rescoringOptions.rescoreStorageMethod` (`preserveOriginals` or `discardOriginals`). We recommend `preserveOriginals` for scalar quantization and `discardOriginals` for binary quantization. |
-| Query | `oversampling` on [`RawVectorQuery`](/rest/api/searchservice/documents/search-post#rawvectorquery) and [`VectorizableTextQuery`](/rest/api/searchservice/documents/search-post#vectorizabletextquery) |
+| Query | Add `oversampling` on [`RawVectorQuery`](/rest/api/searchservice/documents/search-post#rawvectorquery) or [`VectorizableTextQuery`](/rest/api/searchservice/documents/search-post#vectorizabletextquery) definitions. |

 > [!NOTE]
 > Rescoring parameter names have changed over the last several releases. If you're using an older preview API, review the [upgrade instructions](search-api-migration.md#upgrade-to-2024-11-01-preview) for addressing breaking changes.
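As a concrete illustration of the table above, one entry in an index's vector compressions section might look like the following fragment (a hypothetical sketch of the JSON shape, expressed as a Python dict; the compression name is made up for illustration):

```python
# Hypothetical sketch of one vector compression entry combining the
# rescoringOptions properties from the table above. The name
# "my-binary-compression" is invented for this example.
compression = {
    "name": "my-binary-compression",
    "kind": "binaryQuantization",
    "rescoringOptions": {
        "enableRescoring": True,       # true or false
        "defaultOversampling": 4,      # an integer; 4 is the documented default
        # preserveOriginals is recommended for scalar quantization,
        # discardOriginals for binary quantization
        "rescoreStorageMethod": "discardOriginals",
    },
}
```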
@@ -143,7 +155,7 @@ POST https://[servicename].search.windows.net/indexes?api-version=2025-09-01
 - `rescoringOptions` are a collection of properties used to offset lossy compression by rescoring query results using the original full-precision vectors that exist prior to quantization. For rescoring to work, you must have the vector instance that provides this content. Setting `rescoreStorageMethod` to `discardOriginals` prevents you from using `enableRescoring` or `defaultOversampling`. For more information about vector storage, see [Eliminate optional vector instances from storage](vector-search-how-to-storage-options.md).
-- `"rescoreStorageMethod": "preserveOriginals"` rescores vector search results with the original full-precision vectors can result in adjustments to search score and rankings, promoting the more relevant matches as determined by the rescoring step. For binary quantization, you can set `rescoreStorageMethod` to `discardOriginals` to further reduce storage, without reducing quality. These aren't needed for binary quantization.
+- `"rescoreStorageMethod": "preserveOriginals"` rescores vector search results with the original full-precision vectors, which can result in adjustments to search scores and rankings, promoting the more relevant matches as determined by the rescoring step. For binary quantization, you can set `rescoreStorageMethod` to `discardOriginals` to further reduce storage without reducing quality. Original vectors aren't needed for binary quantization.
 - `defaultOversampling` considers a broader set of potential results to offset the reduction in information from quantization. The formula for potential results consists of the `k` in the query, with an oversampling multiplier. For example, if the query specifies a `k` of 5, and oversampling is 20, then the query effectively requests 100 documents for use in reranking, using the original uncompressed vector for that purpose. Only the top `k` reranked results are returned. This property is optional. Default is 4.
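The oversampling arithmetic in the `defaultOversampling` bullet can be sketched as follows (a simplified illustration; the helper name is made up):

```python
def candidate_count(k: int, oversampling: int) -> int:
    """Number of documents effectively requested for rescoring:
    the query's k multiplied by the oversampling factor."""
    return k * oversampling

# A query with k=5 and oversampling=20 rescores 100 candidates
# using full-precision vectors, then returns only the top 5.
print(candidate_count(5, 20))  # 100
```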
@@ -218,18 +230,6 @@ To use a new quantization configuration, you must create a *new* vector profile.
 1. [Load the index](search-what-is-data-import.md) using indexers for pull model indexing, or APIs for push model indexing.

-## How scalar quantization works in Azure AI Search
-
-Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically 99th percentile minimum and maximum) and divides them into a finite number of levels or bin, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.
-
-Each component of the vector is mapped to the closest representative value within this set of quantization levels in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require much fewer bits to store compared to the original vector, thus reducing storage requirements and memory footprint.
-
-## How binary quantization works in Azure AI Search
-
-Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.
-
-It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we've found binary quantization performs very well when embeddings are centered around zero. Most popular embedding models such as OpenAI, Cohere, and Mistral are centered around zero.
-
 ## Query a quantized vector field using oversampling

 Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling and rescoring. You can add an `oversampling` parameter to invoke oversampling and rescoring at query time.
articles/search/whats-new.md (1 addition, 1 deletion)
@@ -30,7 +30,7 @@ Learn about the latest updates to Azure AI Search functionality, docs, and sampl
 | [Normalizers](search-normalizers.md) | Keyword search | Generally available. |
 | [Index description](search-agentic-retrieval-how-to-index.md#add-a-description) | Agentic search | Generally available. |
 | [Rescoring of binary quantized vectors](vector-search-how-to-quantization.md#supported-rescoring-techniques) | Vector search | Generally available. |
-| [Rescoring options for HNSW compressed vectors](vector-search-how-to-quantization.md#add-compressions-to-a-search-index) | Vector search | Generally available. |
+| [Rescoring options for scalar compressed vectors](vector-search-how-to-quantization.md#supported-rescoring-techniques) | Vector search | Generally available. |
 | [Scoring profiles for semantically ranked results](semantic-how-to-enable-scoring-profiles.md) | Relevance | Generally available. |
 | [Truncate dimensions](vector-search-how-to-truncate-dimensions.md) | Vector search | Generally available. |
 | [Unpack `@search.score` to view subscores in hybrid search results](hybrid-search-ranking.md#unpack-a-search-score-into-subscores) | Hybrid search | Generally available. |