
Commit 8d0104c

Fixed broken links and warnings
1 parent 4ffe340 commit 8d0104c

2 files changed (+76 -13 lines)

articles/search/vector-search-how-to-configure-compression-storage.md

Lines changed: 4 additions & 6 deletions
@@ -21,11 +21,9 @@ This article enumerates all of optimization techniques in Azure AI Search that c

 Vector optimization settings are specified in vector field definitions in a search index. Most of the features described in this article are generally available in the [2024-07-01 REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-07-01&preserve-view=true) and in the Azure SDK packages targeting that version. The [latest preview version](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-09-01-preview&preserve-view=true) adds support for truncated dimensions if you're using text-embedding-3-large or text-embedding-3-small for vectorization.

-[An example](#example-vector-compression-techniques) at the end of this article shows the variations in vector size for each of the approaches described in this article.
-
 ## Evaluate the options

-Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](#example-vector-compression-techniques).
+Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](vector-search-how-to-quantization.md#example-vector-compression-techniques).

 We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort, which tends to provide the greatest benefit in most scenarios. In contrast, narrow types (except for float16) require special effort to create, and `stored` saves on disk storage, which isn't as expensive as memory.


@@ -385,7 +383,7 @@ The following example shows the fields collection of a search index. Set `stored

 - Defaults are `stored` set to true and `retrievable` set to false. In a default configuration, a retrievable copy is stored, but it's not automatically returned in results. When `stored` is true, you can toggle `retrievable` between true and false at any time without having to rebuild an index. When `stored` is false, `retrievable` must be false and can't be changed. -->

-## Example: vector compression techniques
+<!-- ## Example: vector compression techniques

 Here's Python code that demonstrates quantization, narrow data types, and use of the stored property: [Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md).

@@ -420,7 +418,7 @@ Search APIs report storage and vector size at the index level, so indexes and no

 Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or reranking with original vectors.

-Recall that the [vector compression definition](#add-compressions-to-a-search-index) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
+Recall that the [vector compression definition](vector-search-how-to-quantization.md) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.

 You can set the oversampling parameter even if the index doesn't explicitly have a `rerankWithOriginalVectors` or `defaultOversampling` definition. Providing `oversampling` at query time overrides the index settings for that query and executes the query with an effective `rerankWithOriginalVectors` as true.

@@ -446,7 +444,7 @@ POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?ap

 - Applies to vector fields that undergo vector compression, per the vector profile assignment.

-- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
+- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options. -->

 ## See also


articles/search/vector-search-how-to-quantization.md

Lines changed: 72 additions & 7 deletions
@@ -10,7 +10,7 @@ ms.topic: how-to
 ms.date: 11/04/2024
 ---

-## Use scalar or binary quantization to compress vector size
+# Use scalar or binary quantization to compress vector size

 Quantization is recommended for reducing vector size because it lowers both memory and disk storage requirements for float16 and float32 embeddings. To offset the effects of a smaller index, you can add oversampling and reranking over uncompressed vectors.

@@ -32,9 +32,9 @@ To use built-in quantization, follow these steps:
 > - Create a new vector profile that uses the named configuration
 > - Create a new vector field having the new vector profile
 > - Load the index with float32 or float16 data that's quantized during indexing with the configuration you defined
-> - Optionally, [query quantized data](#query-a-quantized-vector-field-using-oversampling) using the oversampling parameter if you want to override the default
+> - Optionally, [query quantized data](#) using the oversampling parameter if you want to override the default

-### Add "compressions" to a search index
+## Add "compressions" to a search index

 The following example shows a partial index definition with a fields collection that includes a vector field, and a `vectorSearch.compressions` section.

@@ -84,7 +84,7 @@ POST https://[servicename].search.windows.net/indexes?api-version=2024-07-01

 - `quantizedDataType` is optional and applies to scalar quantization only. If you add it, it must be set to `int8`. This is the only primitive data type supported for scalar quantization at this time. Default is `int8`.

-### Add the HNSW algorithm
+## Add the HNSW algorithm

 Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorithm. Built-in quantization isn't supported with exhaustive KNN.

@@ -107,7 +107,7 @@ Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorith
 }
 ```

-### Create and assign a new vector profile
+## Create and assign a new vector profile

 To use a new quantization configuration, you must create a *new* vector profile. Creation of a new vector profile is necessary for building compressed indexes in memory. Your new profile uses HNSW.

@@ -151,14 +151,79 @@ To use a new quantization configuration, you must create a *new* vector profile.

 1. [Load the index](search-what-is-data-import.md) using indexers for pull model indexing, or APIs for push model indexing.

-### How scalar quantization works in Azure AI Search
+## How scalar quantization works in Azure AI Search

 Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically 99th percentile minimum and maximum) and divides them into a finite number of levels, or bins, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.

 Each component of the vector is mapped to the closest representative value within this set of quantization levels in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require far fewer bits to store than the original vectors, which reduces storage requirements and memory footprint.

-### How binary quantization works in Azure AI Search
+## How binary quantization works in Azure AI Search

 Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.

 It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we've found that binary quantization performs very well when embeddings are centered around zero. Most popular embedding models, such as those from OpenAI, Cohere, and Mistral, are centered around zero.
+
+## Example: vector compression techniques
+
+The following output comes from Python code that demonstrates quantization, [narrow data types](vector-search-how-to-assign-narrow-data-types.md), and use of the [stored property](vector-search-how-to-storage-options.md).
+
+The code is borrowed from [Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md).
+
+It creates an index for each option and compares their storage size and vector index size.
+
+```text
+****************************************
+Index Name: compressiontest-baseline
+Storage Size: 21.3613MB
+Vector Size: 4.8277MB
+****************************************
+Index Name: compressiontest-compression
+Storage Size: 17.7604MB
+Vector Size: 1.2242MB
+****************************************
+Index Name: compressiontest-narrow
+Storage Size: 16.5567MB
+Vector Size: 2.4254MB
+****************************************
+Index Name: compressiontest-no-stored
+Storage Size: 10.9224MB
+Vector Size: 4.8277MB
+****************************************
+Index Name: compressiontest-all-options
+Storage Size: 4.9192MB
+Vector Size: 1.2242MB
+```
+
+Search APIs report storage and vector size at the index level, so comparisons must be made between indexes, not fields. Use [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) or an equivalent API in the Azure SDKs to obtain vector size.
+
+## Query a quantized vector field using oversampling
+
+Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or reranking with original vectors.
+
+Recall that the [vector compression definition](#add-compressions-to-a-search-index) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
+
+You can set the oversampling parameter even if the index doesn't explicitly have a `rerankWithOriginalVectors` or `defaultOversampling` definition. Providing `oversampling` at query time overrides the index settings for that query and executes the query with an effective `rerankWithOriginalVectors` as true.
+
+```http
+POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?api-version=2024-07-01
+Content-Type: application/json
+api-key: [admin key]
+
+{
+  "vectorQueries": [
+    {
+      "kind": "vector",
+      "vector": [8, 2, 3, 4, 3, 5, 2, 1],
+      "fields": "myvector",
+      "oversampling": 12.0,
+      "k": 5
+    }
+  ]
+}
+```
+
+**Key points**:
+
+- Applies to vector fields that undergo vector compression, per the vector profile assignment.
+
+- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
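
The section "How scalar quantization works in Azure AI Search" in the diff above describes picking a value range, dividing it into 256 bins, and storing each component as its bin identifier. The following Python is a minimal sketch of that idea under stated assumptions: the range is read as the 1st to 99th percentile of all component values, the 256 levels are evenly spaced across that range, and identifiers run from 0 to 255. The helper names (`fit_range`, `quantize`, `dequantize`) are hypothetical, and this isn't the service's internal implementation.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list of floats (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def fit_range(vectors, pct=99.0):
    """Choose [lo, hi] from all components, trimming outliers at both ends.

    Assumption: "99th percentile minimum and maximum" is read as the
    1st and 99th percentiles of the component values.
    """
    components = [x for v in vectors for x in v]
    return percentile(components, 100 - pct), percentile(components, pct)


def quantize(vector, lo, hi, levels=256):
    """Replace each float with the identifier (0..levels-1) of its nearest level."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = (hi - lo) / (levels - 1)
    codes = []
    for x in vector:
        clipped = min(max(x, lo), hi)  # values outside the range saturate
        codes.append(round((clipped - lo) / step))
    return codes


def dequantize(codes, lo, hi, levels=256):
    """Approximate reconstruction: each identifier stands in for its level's value."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    vectors = [[0.12, -0.53, 0.91, 0.04], [-0.22, 0.35, -0.87, 0.66]]
    lo, hi = fit_range(vectors)
    codes = quantize(vectors[0], lo, hi)
    print(codes)                      # one 8-bit identifier per component
    print(dequantize(codes, lo, hi))  # close to, but not exactly, vectors[0]
```

Clipping is what makes the percentile range useful in this sketch: a few extreme components saturate at the ends instead of stretching every bin.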

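The "How binary quantization works in Azure AI Search" section says each component is stored as a single bit. As a minimal sketch, assuming the common convention that a component becomes 1 when it's greater than zero (which fits the section's note that zero-centered embeddings work best), the following Python packs the bits into bytes and compares codes with Hamming distance. The thresholding rule, the bit order, and the helper names are assumptions for illustration, not details confirmed by the source.

```python
def binary_quantize(vector):
    """Pack one bit per component into bytes, most significant bit first.

    Assumption: a component maps to 1 when it's greater than zero, else 0.
    """
    packed = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x > 0:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def hamming_distance(a, b):
    """Count the bits that differ between two packed codes of equal length."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def raw_bytes(dimensions):
    """Bytes per vector as float32 versus as a packed 1-bit code."""
    return 4 * dimensions, (dimensions + 7) // 8


if __name__ == "__main__":
    a = binary_quantize([0.5, -0.1, 0.0, 2.0, -3.0, 0.2, 0.3, -0.4, 1.0])
    b = binary_quantize([0.4, 0.2, -0.1, 1.5, -2.0, -0.3, 0.1, -0.2, 0.9])
    print(a.hex(), b.hex(), hamming_distance(a, b))
    float_bytes, bit_bytes = raw_bytes(1536)  # hypothetical 1,536-dimension embedding
    print(f"{float_bytes} -> {bit_bytes} bytes ({1 - bit_bytes / float_bytes:.1%} smaller)")
```

For the hypothetical 1,536-dimension float32 vector, the arithmetic works out to roughly 97% less raw storage per vector. The up-to-96% figure in the section comes from benchmark tests of vector index size, so the two numbers measure different things.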

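The oversampling settings in this commit (`defaultOversampling`, `rerankWithOriginalVectors`, and the query-time `oversampling` parameter) trade extra candidates for better recall. The following Python sketch shows the general oversample-then-rerank pattern, assuming that oversampling means keeping roughly `k * oversampling` candidates scored on compressed vectors and then reranking them with the original full-precision vectors. The 1-bit compression, the scoring functions, and all names are hypothetical stand-ins; the source doesn't describe how Azure AI Search implements this internally.

```python
import math
import random


def dot(a, b):
    """Full-precision similarity: dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def to_bits(vector):
    """Hypothetical compressed form: one bit per component (1 if positive)."""
    return tuple(1 if x > 0 else 0 for x in vector)


def bit_similarity(a_bits, b_bits):
    """Number of matching bits; a cheap stand-in for compressed-vector scoring."""
    return sum(1 for x, y in zip(a_bits, b_bits) if x == y)


def search_with_oversampling(query, docs, k, oversampling=1.0):
    """Return the ids of the top-k documents for `query`.

    docs maps a document id to its original float vector.
    1. Score every document with the compressed vectors and keep
       ceil(k * oversampling) candidates.
    2. Rerank only those candidates with the original vectors.
    """
    query_bits = to_bits(query)
    compressed = {doc_id: to_bits(vec) for doc_id, vec in docs.items()}
    n_candidates = min(len(docs), math.ceil(k * oversampling))
    candidates = sorted(
        docs,
        key=lambda doc_id: bit_similarity(query_bits, compressed[doc_id]),
        reverse=True,
    )[:n_candidates]
    reranked = sorted(candidates, key=lambda doc_id: dot(query, docs[doc_id]), reverse=True)
    return reranked[:k]


if __name__ == "__main__":
    random.seed(0)
    docs = {f"doc{i}": [random.uniform(-1, 1) for _ in range(8)] for i in range(100)}
    query = [8, 2, 3, 4, 3, 5, 2, 1]  # the query vector from the REST example
    print(search_with_oversampling(query, docs, k=5, oversampling=12.0))
```

In this sketch, `oversampling=1.0` means the final top-k comes entirely from compressed scores; larger values give the full-precision rerank more candidates to choose from at the cost of more scoring work. A real index would compress vectors once at indexing time rather than on every query.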
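The sample output in the added "Example: vector compression techniques" section reports storage size and vector size per index. This Python snippet only does arithmetic on those reported values, expressing each configuration's savings as a percentage of `compressiontest-baseline`.

```python
# Sizes in MB (storage size, vector size), copied from the sample output.
REPORTED = {
    "compressiontest-baseline": (21.3613, 4.8277),
    "compressiontest-compression": (17.7604, 1.2242),
    "compressiontest-narrow": (16.5567, 2.4254),
    "compressiontest-no-stored": (10.9224, 4.8277),
    "compressiontest-all-options": (4.9192, 1.2242),
}


def reduction_vs_baseline(sizes, baseline="compressiontest-baseline"):
    """Return {index: (storage % saved, vector % saved)} relative to the baseline index."""
    base_storage, base_vector = sizes[baseline]
    return {
        name: (
            round(100 * (1 - storage / base_storage), 1),
            round(100 * (1 - vector / base_vector), 1),
        )
        for name, (storage, vector) in sizes.items()
    }


if __name__ == "__main__":
    for name, (storage_pct, vector_pct) in reduction_vs_baseline(REPORTED).items():
        print(f"{name}: storage -{storage_pct}%, vector size -{vector_pct}%")
```

By this arithmetic, quantization alone cuts the reported vector size by about 75%, dropping the stored copy cuts storage size by about 49% without changing vector size, and combining all options cuts storage size by about 77%.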