articles/search/vector-search-how-to-assign-narrow-data-types.md
Lines changed: 8 additions & 8 deletions
@@ -9,18 +9,18 @@ ms.service: azure-ai-search
9
9
ms.custom:
10
10
- ignite-2024
11
11
ms.topic: how-to
12
-
ms.date: 11/19/2024
12
+
ms.date: 06/12/2025
13
13
---
14
14
15
15
# Assign narrow data types to vector fields in Azure AI Search
16
16
17
-
An easy way to reduce vector size is to store embeddings in a smaller data format. Most embedding models output 32-bit floating point numbers, but if you quantize your vectors, or if your embedding model supports it natively, output might be float16, int16, or int8, which is significantly smaller than float32. You can accommodate these smaller vector sizes by assigning a narrow data type to a vector field. In the vector index, narrow data types consume less storage.
17
+
An easy way to reduce vector size is to store embeddings in a smaller data format. Most embedding models output 32-bit floating point numbers. However, if you quantize your vectors or use an embedding model that natively supports quantization, the output might be float16, int16, or int8, which are significantly smaller than float32. You can accommodate these smaller vector sizes by assigning a narrow data type to a vector field. In the vector index, narrow data types consume less storage.
18
18
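As a rough illustration of the savings described above, the raw size of one vector is its dimension count multiplied by the bytes per component. The following sketch assumes a 1,536-dimension embedding (an illustrative figure, not one stated in this article) and ignores index overhead. The `BYTES_PER_COMPONENT` table and `raw_vector_bytes` helper are hypothetical names introduced only for this example.

```python
# Bytes per vector component for each numeric format mentioned above.
BYTES_PER_COMPONENT = {"float32": 4, "float16": 2, "int16": 2, "int8": 1}


def raw_vector_bytes(dimensions: int, data_format: str) -> int:
    """Return the raw payload size of one vector, excluding index overhead."""
    return dimensions * BYTES_PER_COMPONENT[data_format]


if __name__ == "__main__":
    dims = 1536  # assumed dimension count, for illustration only
    for fmt in BYTES_PER_COMPONENT:
        print(f"{fmt}: {raw_vector_bytes(dims, fmt)} bytes per vector")
```

Under these assumptions, float16 halves the raw payload of float32, and int8 reduces it to a quarter.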
19
19
Data types are assigned to fields in an index definition. You can use the Azure portal, the [Search REST APIs](/rest/api/searchservice/indexes/create), or an Azure SDK package that provides the feature.
20
20
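To make the index definition concrete, here's a minimal sketch of a vector field that uses a narrow type. It assumes the field properties of the 2024-07-01 REST API (`type`, `dimensions`, `vectorSearchProfile`, and so on), which you should verify against the API reference. The field names, the dimension count, and the profile name `my-vector-profile` are placeholders, and the `vectorSearch` section that would define that profile is omitted.

```python
import json

# Hypothetical vector field that stores float16 embeddings.
# "contentVector" and "my-vector-profile" are placeholder names.
vector_field = {
    "name": "contentVector",
    "type": "Collection(Edm.Half)",  # narrow type instead of Collection(Edm.Single)
    "searchable": True,
    "retrievable": True,
    "dimensions": 1536,  # assumed; must match your embedding model's output
    "vectorSearchProfile": "my-vector-profile",
}


def build_index_body(index_name: str, fields: list) -> str:
    """Serialize a partial index definition (name and fields only) as JSON."""
    return json.dumps({"name": index_name, "fields": fields}, indent=2)


if __name__ == "__main__":
    key_field = {"name": "id", "type": "Edm.String", "key": True}
    print(build_index_body("narrow-types-demo", [key_field, vector_field]))
```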
21
21
## Prerequisites
22
22
23
-
- An embedding model that output small data formats, such as text-embedding-3 or Cohere V3 embedding models.
23
+
- An embedding model that outputs small data formats, such as text-embedding-3 or Cohere V3 embedding models.
24
24
25
25
## Supported narrow data types
26
26
@@ -32,18 +32,18 @@ Data types are assigned to fields in an index definition. You can use the Azure
32
32
- `Collection(Edm.SByte)` 8-bit signed integer (narrow)
33
33
- `Collection(Edm.Byte)` 8-bit unsigned integer (only allowed with packed binary data types)
34
34
35
-
1. From that list, determine which data type is valid for your embedding model's output, or for vectors that undergo custom quantization.
35
+
1. From that list, determine which data type is valid for your embedding model's output or for vectors that undergo custom quantization.
36
36
37
37
The following table provides links to several embedding models that can use a narrow data type (`Collection(Edm.Half)`) without extra quantization. You can cast from float32 to float16 (using `Collection(Edm.Half)`) with no extra work.
38
38
39
-
| Embedding model | Native output | Assign this type in Azure AI Search |
39
+
| Embedding model | Native output | Assign this type in Azure AI Search |
|[text-embedding-ada-002](/azure/ai-services/openai/concepts/models#embeddings)|`Float32`|`Collection(Edm.Single)` or `Collection(Edm.Half)`|
42
42
|[text-embedding-3-small](/azure/ai-services/openai/concepts/models#embeddings)|`Float32`|`Collection(Edm.Single)` or `Collection(Edm.Half)`|
43
43
|[text-embedding-3-large](/azure/ai-services/openai/concepts/models#embeddings)|`Float32`|`Collection(Edm.Single)` or `Collection(Edm.Half)`|
44
44
|[Cohere V3 embedding models with int8 embedding_type](https://docs.cohere.com/reference/embed)|`Int8`|`Collection(Edm.SByte)`|
45
45
46
-
Other narrow data types can be used if your model emits embeddings in the smaller data format, or if you have custom quantization that converts vectors to a smaller format.
46
+
You can use other narrow data types if your model emits embeddings in the smaller data format or if you have custom quantization that converts vectors to a smaller format.
47
47
48
48
1. Make sure you understand the tradeoffs of a narrow data type. `Collection(Edm.Half)` has less information, which results in lower resolution. If your data is homogeneous or dense, losing extra detail or nuance could lead to unacceptable results at query time because there's less detail available to distinguish nearby vectors.
49
49
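The resolution tradeoff of `Collection(Edm.Half)` can be seen with a small experiment. This sketch uses only the Python standard library (`struct` format codes `f` and `e`) to round-trip two nearby values through float32 and float16. The sample values and helper names are made up for illustration.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def to_float16(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    a, b = 0.12345, 0.12346  # two nearby component values (illustrative)
    print(to_float32(a) == to_float32(b))  # False: still distinct as float32
    print(to_float16(a) == to_float16(b))  # True: both collapse to one float16
```

When many components collapse this way, vectors that were close but distinct in float32 can become harder to tell apart, which is the loss of nuance described above.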
@@ -84,7 +84,7 @@ Data types are assigned on new fields when they're created. You can't change the
84
84
85
85
1. Verify the field content matches the data type. Assuming the vector field is marked as `retrievable`, use [Search explorer](search-explorer.md) or [Search - POST](/rest/api/searchservice/documents/search-post?) to return vector field content.
86
86
87
-
1. To check vector index size, refer to the vector index size column on the **Search management > Indexes** page in the [Azure portal](https://portal.azure.com) or use the [GET Statistics (REST API)](/rest/api/searchservice/indexes/get-statistics) or equivalent Azure SDK method to get the size.
87
+
1. To check vector index size, refer to the vector index size column on the **Search management > Indexes** page in the [Azure portal](https://portal.azure.com). Alternatively, you can use [GET Index Statistics (REST API)](/rest/api/searchservice/indexes/get-statistics) or an equivalent Azure SDK method.
88
88
89
89
> [!NOTE]
90
-
> The field's data type is used to create the physical data structure. If you want to change a data type later, either [drop and rebuild the index](search-howto-reindex.md), or create a second field with the new definition.
90
+
> The field's data type is used to create the physical data structure. If you want to change a data type later, either [drop and rebuild the index](search-howto-reindex.md) or create a second field with the new definition.
articles/search/vector-search-how-to-configure-compression-storage.md
Lines changed: 17 additions & 17 deletions
@@ -8,52 +8,52 @@ ms.author: haileytapia
8
8
ms.service: azure-ai-search
9
9
ms.custom:
10
10
- ignite-2024
11
-
ms.topic: concept-article
12
-
ms.date: 11/19/2024
11
+
ms.topic: how-to
12
+
ms.date: 06/12/2025
13
13
---
14
14
15
15
# Choose an approach for optimizing vector storage and processing
16
16
17
17
Embeddings, or the numerical representation of heterogeneous content, are the basis of vector search workloads, but the sizes of embeddings make them hard to scale and expensive to process. Significant research and productization have produced multiple solutions for improving scale and reducing processing times. Azure AI Search taps into a number of these capabilities for faster and cheaper vector workloads.
18
18
19
-
This article enumerates all of optimization techniques in Azure AI Search that can help you reduce vector size and query processing times.
19
+
This article covers all of the optimization techniques in Azure AI Search that can help you reduce vector size and query processing times.
20
20
21
-
Vector optimization settings are specified in vector field definitions in a search index. Most of the features described in this article are generally available in [2024-07-01 REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-07-01&preserve-view=true) and in the Azure SDK packages targeting that version. The [latest preview version](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-09-01-preview&preserve-view=true) adds support for truncated dimensions if you're using text-embedding-3-large or text-embedding-3-small for vectorization.
21
+
Vector optimization settings are specified in vector field definitions in a search index. Most of the features described in this article are generally available in the [2024-07-01 REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-07-01&preserve-view=true) and Azure SDK packages targeting that version. The [latest preview version](/rest/api/searchservice/search-service-api-versions#preview-versions) adds support for truncated dimensions if you're using text-embedding-3-large or text-embedding-3-small for vectorization.
22
22
23
23
## Evaluate the options
24
24
25
25
Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](#example-vector-size-by-vector-compression-technique).
26
26
27
-
We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort, and that tends to provide the most benefit in most scenarios. In contrast, narrow types (except for float16) require a special effort into making them, and `stored` saves on disk storage, which isn't as expensive as memory.
27
+
We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort, which tends to provide the most benefit in most scenarios. In contrast, narrow types (except for float16) require special effort to create them, and `stored` saves on disk storage, which isn't as expensive as memory.
28
28
29
-
| Approach | Why use this option|
29
+
| Approach | Why use this approach|
30
30
|----------|---------------------|
31
-
|[Add scalar or binary quantization](vector-search-how-to-quantization.md)|Use quantization to compress native float32 or float16 embeddings to int8 (scalar) or Byte (binary). This option reduces storage in memory and on disk with no degradation of query performance. Smaller data types like int8 or Byte produce vector indexes that are less content-rich than those with larger embeddings. To offset information loss, built-in compression includes options for post-query processing using uncompressed embeddings and oversampling to return more relevant results. Reranking and oversampling are specific features of built-in quantization of float32 or float16 fields and can't be used on embeddings that undergo custom quantization. |
32
-
|[Truncate dimensions for MRL-capable text-embedding-3 models (preview)](vector-search-how-to-truncate-dimensions.md)|Exercise the option to use fewer dimensions on text-embedding-3 models. On Azure OpenAI, these models have been retrained on the [Matryoshka Representation Learning (MRL)](https://arxiv.org/abs/2205.13147) technique that produces multiple vector representations at different levels of compression. This approach produces faster searches and reduced storage costs, with minimal loss of semantic information. In Azure AI Search, MRL support supplements scalar and binary quantization. When you use either quantization method, you can also specify a `truncateDimension` property on your vector fields to reduce the dimensionality of text embeddings. |
33
-
|[Assign smaller primitive data types to vector fields](vector-search-how-to-assign-narrow-data-types.md)| Narrow data types, such as float16, int16, int8, and Byte (binary) consume less space in memory and on disk, but you must have an embedding model that outputs vectors in a narrow data format. Or, you must have custom quantization logic that outputs small data. A third use case that requires less effort is recasting native float32 embeddings produced by most models to float16. See [Index binary vectors](vector-search-how-to-index-binary-data.md) for details about binary vectors. |
31
+
|[Add scalar or binary quantization](vector-search-how-to-quantization.md)|Compress native float32 or float16 embeddings to int8 (scalar) or byte (binary). This option reduces storage in memory and on disk with no degradation of query performance. Smaller data types, such as int8 or byte, produce vector indexes that are less content-rich than those with larger embeddings. To offset information loss, built-in compression includes options for post-query processing using uncompressed embeddings and oversampling to return more relevant results. Reranking and oversampling are specific features of built-in quantization of float32 or float16 fields and can't be used on embeddings that undergo custom quantization. |
32
+
|[Truncate dimensions for MRL-capable text-embedding-3 models (preview)](vector-search-how-to-truncate-dimensions.md)|Use fewer dimensions on text-embedding-3 models. On Azure OpenAI, these models are retrained on the [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) (MRL) technique that produces multiple vector representations at different levels of compression. This approach produces faster searches and reduced storage costs with minimal loss of semantic information. In Azure AI Search, MRL support supplements scalar and binary quantization. When you use either quantization method, you can also specify a `truncateDimension` property on your vector fields to reduce the dimensionality of text embeddings. |
33
+
|[Assign smaller primitive data types to vector fields](vector-search-how-to-assign-narrow-data-types.md)| Narrow data types, such as float16, int16, int8, and byte (binary), consume less space in memory and on disk, but you must have an embedding model that outputs vectors in a narrow data format. Alternatively, you must have custom quantization logic that outputs small data. A third use case that requires less effort is recasting native float32 embeddings produced by most models to float16. For information about binary vectors, see [Index binary vectors](vector-search-how-to-index-binary-data.md). |
34
34
|[Eliminate optional storage of retrievable vectors](vector-search-how-to-storage-options.md)| Vectors returned in a query response are stored separately from vectors used during query execution. If you don't need to return vectors, you can turn off retrievable storage, reducing overall per-field disk storage by up to 50 percent. |
35
35
36
36
All of these options are defined on an empty index. To implement any of them, use the Azure portal, REST APIs, or an Azure SDK package targeting that API version.
37
37
38
38
After the index is defined, you can load and index documents as a separate step.
39
39
40
-
## Example: vector size by vector compression technique
40
+
## Example: Vector size by vector compression technique
41
41
42
42
[Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md) is a Python code sample that creates multiple search indexes that vary by their use of vector storage quantization, [narrow data types](vector-search-how-to-assign-narrow-data-types.md), and [storage properties](vector-search-how-to-storage-options.md).
43
43
44
44
This code creates and compares storage and vector index size for each vector storage optimization option. From these results, you can see that [quantization](vector-search-how-to-quantization.md) reduces vector size the most, but the greatest storage savings are achieved if you use multiple options.
Search APIs report storage and vector size at the index level, so indexes and not fields must be the basis of comparison. Use the [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) or an equivalent API in the Azure SDKs to obtain vector size.
54
+
Search APIs report storage and vector size at the index level, so indexes and not fields must be the basis of comparison. Use [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) or an equivalent API in the Azure SDKs to obtain vector size.
55
55
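Because sizes are reported per index, a comparison comes down to reading the statistics for two indexes and computing the difference. The following sketch assumes the statistics response exposes `storageSize` and `vectorIndexSize` properties in bytes, which you should confirm against the REST reference. The sample numbers are invented, and the helper names are hypothetical.

```python
def stats_url(service: str, index: str, api_version: str = "2024-07-01") -> str:
    """Build the GET Index Statistics URL for a search service and index."""
    return (
        f"https://{service}.search.windows.net/indexes/{index}/stats"
        f"?api-version={api_version}"
    )


def percent_reduction(baseline_bytes: int, optimized_bytes: int) -> float:
    """Return how much smaller the optimized size is, as a percentage."""
    return 100.0 * (baseline_bytes - optimized_bytes) / baseline_bytes


if __name__ == "__main__":
    # Invented sample responses; real values come from the statistics call.
    baseline = {"storageSize": 40_000_000, "vectorIndexSize": 12_000_000}
    quantized = {"storageSize": 25_000_000, "vectorIndexSize": 3_000_000}

    print(stats_url("my-service", "baseline-index"))
    for key in ("storageSize", "vectorIndexSize"):
        saved = percent_reduction(baseline[key], quantized[key])
        print(f"{key}: {saved:.1f}% smaller")
```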
56
-
## See also
56
+
## Related content
57
57
58
58
- [Get started with REST](search-get-started-rest.md)
59
59
- [Supported data types](/rest/api/searchservice/supported-data-types)