Commit a1d2917

Merge pull request #1245 from HeidiSteen/heidist-hnsw
[azure search] Refactor vector compression content
2 parents 4f68641 + 1df100b commit a1d2917

16 files changed

+581
-49
lines changed

articles/search/index.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -45,7 +45,7 @@ landingContent:
4545
- text: Built-in vectorization
4646
url: vector-search-integrated-vectorization.md
4747
- text: Built-in compression
48-
url: vector-search-how-to-configure-compression-storage.md
48+
url: vector-search-how-to-quantization.md
4949
- text: Retrieval Augmented Generation (RAG)
5050
url: retrieval-augmented-generation-overview.md
5151
- linkListType: quickstart

articles/search/search-api-preview.md

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
1010
ms.custom:
1111
- build-2024
1212
ms.topic: conceptual
13-
ms.date: 10/01/2024
13+
ms.date: 11/01/2024
1414
---
1515

1616
# Preview features in Azure AI Search
@@ -25,7 +25,7 @@ Preview features are removed from this list if they're retired or transition to
2525

2626
|Feature                         | Category | Description | Availability |
2727
|---------|------------------|-------------|---------------|
28-
| [**Lower the dimension requirements for MRL-trained text embedding models on Azure OpenAI**](vector-search-how-to-configure-compression-storage.md#use-mrl-compression-and-truncated-dimensions-preview) | Feature | Text-embedding-3-small and Text-embedding-3-large are trained using Matryoshka Representation Learning (MRL). This allows you to truncate the embedding vectors to fewer dimensions, and adjust the balance between vector index size usage and retrieval quality. A new `truncationDimension` provides the MRL behaviors as an additional parameter in a vector compression configuration. This can only be configured for new vector fields. | [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-09-01-preview&preserve-view=true). |
28+
| [**Lower the dimension requirements for MRL-trained text embedding models on Azure OpenAI**](vector-search-how-to-truncate-dimensions.md) | Feature | Text-embedding-3-small and Text-embedding-3-large are trained using Matryoshka Representation Learning (MRL). This allows you to truncate the embedding vectors to fewer dimensions, and adjust the balance between vector index size usage and retrieval quality. A new `truncationDimension` provides the MRL behaviors as an additional parameter in a vector compression configuration. This can only be configured for new vector fields. | [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-09-01-preview&preserve-view=true). |
2929
| [**Unpack `@search.score` to view subscores in hybrid search results**](hybrid-search-ranking.md#unpack-a-search-score-into-subscores-preview) | Feature | You can investigate Reciprocal Rank Fusion (RRF) ranked results by viewing the individual query subscores of the final merged and scored result. A new `debug` property unpacks the search score. `QueryResultDocumentSubscores`, `QueryResultDocumentRerankerInput`, and `QueryResultDocumentSemanticField` provide the extra detail. | [Search Documents (preview)](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-09-01-preview&preserve-view=true). |
3030
| [**Target filters in a hybrid search to just the vector queries**](hybrid-search-how-to-query.md#hybrid-search-with-filters-targeting-vector-subqueries-preview) | Feature | A filter on a hybrid query involves all subqueries on the request, regardless of type. You can override the global filter to scope the filter to a specific subquery. A new `filterOverride` parameter provides the behaviors. | [Search Documents (preview)](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-09-01-preview&preserve-view=true). |
3131
| [**Text Split skill (token chunking)**](cognitive-search-skill-textsplit.md) | Applied AI (skills) | This skill has new parameters that improve data chunking for embedding models. A new `unit` parameter lets you specify token chunking. You can now chunk by token length, setting the length to a value that makes sense for your embedding model. You can also specify the tokenizer and any tokens that shouldn't be split during data chunking. | [Create or Update Skillset (preview)](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2024-09-01-preview&preserve-view=true). |
@@ -51,7 +51,7 @@ Preview features are removed from this list if they're retired or transition to
5151

5252
|Feature                         | Category | Description | Availability |
5353
|---------|------------------|-------------|---------------|
54-
| [**Add Azure AI Search to a network security perimiter**](search-security-network-security-perimiter.md) | Service | Join a search service to a [network security perimeter](/azure/private-link/network-security-perimeter-concepts) to control network access to your search service. | The Azure portal and the [Network Security Perimiter APIs 2024-06-01-preview](/rest/api/searchmanagement/network-security-perimeter-configurations?view=rest-searchmanagement-2024-06-01-preview&preserve-view=true). |
54+
| [**Add Azure AI Search to a network security perimeter**](search-security-network-security-perimiter.md) | Service | Join a search service to a [network security perimeter](/azure/private-link/network-security-perimeter-concepts) to control network access to your search service. | The Azure portal and the [Network Security Perimeter APIs 2024-06-01-preview](/rest/api/searchmanagement/network-security-perimeter-configurations?view=rest-searchmanagement-2024-06-01-preview&preserve-view=true). |
5555
| [**Search service under a user-assigned managed identity**](search-howto-managed-identities-data-sources.md) | Service | Configures a search service to use a previously created user-assigned managed identity. | [Services - Update](/rest/api/searchmanagement/services/update?view=rest-searchmanagement-2024-06-01-preview&preserve-view=true#identity), 2021-04-01-preview or the latest preview version. We recommend using the latest preview version. |
5656

5757
## Preview features in Azure SDKs

articles/search/search-capacity-planning.md

Lines changed: 2 additions & 2 deletions
@@ -152,7 +152,7 @@ For billing rates per tier and currency, see the [Azure AI Search pricing page](
152152

153153
## Estimate capacity using a billable tier
154154

155-
Storage needs are determined by the size of the indexes you expect to build. There are no solid heuristics or generalities that help with estimates. The only way to determine the size of an index is [build one](search-what-is-an-index.md). Its size is based on tokenization and embeddings, and whether you enable suggesters, filtering, and sorting, or can take advantage of [vector compression](vector-search-how-to-configure-compression-storage.md).
155+
Storage needs are determined by the size of the indexes you expect to build. There are no solid heuristics or generalities that help with estimates. The only way to determine the size of an index is [build one](search-what-is-an-index.md). Its size is based on tokenization and embeddings, and whether you enable suggesters, filtering, and sorting, or can take advantage of [vector compression](vector-search-how-to-quantization.md).
156156

157157
We recommend estimating on a billable tier, Basic or above. The Free tier runs on physical resources shared by multiple customers and is subject to factors beyond your control. Only the dedicated resources of a billable search service can accommodate larger sampling and processing times for more realistic estimates of index quantity, size, and query volumes during development.
158158

@@ -172,7 +172,7 @@ We recommend estimating on a billable tier, Basic or above. The Free tier runs o
172172

173173
+ For keyword search, marking fields as filterable and sortable [increases index size](search-what-is-an-index.md#example-demonstrating-the-storage-implications-of-attributes-and-suggesters).
174174

175-
+ For vector search, you can [set parameters to reduce storage](vector-search-how-to-configure-compression-storage.md).
175+
+ For vector search, you can [set parameters to reduce vector size](vector-search-how-to-configure-compression-storage.md).
176176

177177
1. [Monitor storage, service limits, query volume, and latency](monitor-azure-cognitive-search.md) in the portal. The portal shows you queries per second, throttled queries, and search latency. All of these values can help you decide if you selected the right tier.
178178

articles/search/search-features-list.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ There's feature parity in all Azure public, private, and sovereign clouds, but s
4040
| Vector filters | [Apply filters before or after query execution](vector-search-filters.md) for greater precision during information retrieval. |
4141
| Hybrid information retrieval | Search for concepts and keywords in a single [hybrid query request](hybrid-search-how-to-query.md). </p>[**Hybrid search**](hybrid-search-overview.md) consolidates vector and text search, with optional semantic ranking and relevance tuning for best results.|
4242
| Integrated data chunking and vectorization | Native data chunking through [Text Split skill](cognitive-search-skill-textsplit.md). Native vectorization through [vectorizers](vector-search-how-to-configure-vectorizer.md) and embedding skills such as [AzureOpenAIEmbeddingModel](cognitive-search-skill-azure-openai-embedding.md), [Azure AI Vision multimodal](cognitive-search-skill-vision-vectorize.md), and the [AML skill](cognitive-search-aml-skill.md) that you can use to connect to endpoints in the Azure AI Studio model catalog. </p>[**Integrated vectorization**](vector-search-integrated-vectorization.md) provides an end-to-end indexing pipeline from source files to queries.|
43-
| Integrated vector compression and quantization | Use [built-in scalar and binary quantization](vector-search-how-to-configure-compression-storage.md) to reduce vector index size in memory and on disk. You can also forego storage of vectors you don't need, or assign narrow data types to vector fields for reduced storage requirements. |
43+
| Integrated vector compression and quantization | Use [built-in scalar and binary quantization](vector-search-how-to-quantization.md) to reduce vector index size in memory and on disk. You can also forego storage of vectors you don't need, or assign narrow data types to vector fields for reduced storage requirements. |
4444

4545
## Applied AI and knowledge mining
4646

articles/search/search-howto-reindex.md

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ Queries continue to run, but if you're updating or removing existing fields, you
4747

4848
+ The payload must include the keys or identifiers of every document you want to add, update, or delete.
4949

50-
+ If your index includes vector fields and you set the [`stored` property to false](vector-search-how-to-configure-compression-storage.md#option-3-set-the-stored-property-to-remove-retrievable-storage), make sure you provide the vector in your partial document update, even if the value is unchanged. A side effect of setting `stored` to false is that vectors are dropped on a reindexing operation. Providing the vector in the documents payload prevents this from happening.
50+
+ If your index includes vector fields and you set the [`stored` property to false](vector-search-how-to-storage-options.md), make sure you provide the vector in your partial document update, even if the value is unchanged. A side effect of setting `stored` to false is that vectors are dropped on a reindexing operation. Providing the vector in the documents payload prevents this from happening.
5151

5252
+ To update the contents of simple fields and subfields in complex types, list only the fields you want to change. For example, if you only need to update a description field, the payload should consist of the document key and the modified description. Omitting other fields retains their existing values.
5353

articles/search/search-what-is-azure-search.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ When you create a search service, you work with the following capabilities:
2424
+ A search engine for [vector search](vector-search-overview.md) and [full text](search-lucene-query-architecture.md) and [hybrid search](hybrid-search-overview.md) over a search index
2525
+ Rich indexing with [integrated data chunking and vectorization](vector-search-integrated-vectorization.md), [lexical analysis](search-analyzers.md) for text, and [optional applied AI](cognitive-search-concept-intro.md) for content extraction and transformation
2626
+ Rich query syntax for [vector queries](vector-search-how-to-query.md), text search, [hybrid queries](hybrid-search-how-to-query.md), fuzzy search, autocomplete, geo-search and others
27-
+ Relevance and query performance tuning with [semantic ranking](semantic-search-overview.md), [scoring profiles](index-add-scoring-profiles.md), [quantization for vector queries](vector-search-how-to-configure-compression-storage.md), and parameters for controlling query behaviors at runtime
27+
+ Relevance and query performance tuning with [semantic ranking](semantic-search-overview.md), [scoring profiles](index-add-scoring-profiles.md), [quantization for vector queries](vector-search-how-to-quantization.md), and parameters for controlling query behaviors at runtime
2828
+ Azure scale, security, and reach
2929
+ Azure integration at the data layer, machine learning layer, Azure AI services and Azure OpenAI
3030

articles/search/toc.yml

Lines changed: 9 additions & 1 deletion
@@ -349,10 +349,18 @@ items:
349349
items:
350350
- name: Understand vector quotas and limits
351351
href: vector-search-index-size.md
352-
- name: Compress vector index size
352+
- name: Choose a vector optimization strategy
353353
href: vector-search-how-to-configure-compression-storage.md
354+
- name: Use binary or scalar quantization
355+
href: vector-search-how-to-quantization.md
354356
- name: Index binary data for vector search
355357
href: vector-search-how-to-index-binary-data.md
358+
- name: Assign narrow data types
359+
href: vector-search-how-to-assign-narrow-data-types.md
360+
- name: Eliminate redundant storage
361+
href: vector-search-how-to-storage-options.md
362+
- name: Truncate dimensions (preview)
363+
href: vector-search-how-to-truncate-dimensions.md
356364
- name: Query vectors
357365
href: vector-search-how-to-query.md
358366
- name: Add a vectorizer for text-to-vector queries

articles/search/vector-search-how-to-assign-narrow-data-types.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
---
2+
title: Assign narrow data types
3+
titleSuffix: Azure AI Search
4+
description: In vector search, assign narrow data types to vector fields to reduce the storage requirements of vector indexes.
5+
6+
author: heidisteen
7+
ms.author: heidist
8+
ms.service: azure-ai-search
9+
ms.topic: how-to
10+
ms.date: 11/04/2024
11+
---
12+
13+
# Assign narrow data types
14+
15+
An easy way to reduce vector size is to store embeddings in a smaller data format. Most embedding models output 32-bit floating point numbers, but if you quantize your vectors, or if your embedding model supports it natively, output might be float16, int16, or int8, each of which is significantly smaller than float32. You can accommodate these smaller vector sizes by assigning a narrow data type to a vector field. In the vector index, narrow data types consume less storage.
16+
17+
Data types are assigned to fields in an index definition. You can use the Azure portal, the [Search REST APIs](/rest/api/searchservice/indexes/create), or an Azure SDK package that provides the feature.
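
To make the size difference concrete, the following Python sketch estimates the raw vector payload for each field type named in this article. It's a back-of-the-envelope calculation only: the function name and the document count are hypothetical, and the result excludes any overhead the vector index adds on top of the raw values.

```python
# Bytes per vector component, derived from the bit widths of each type.
BYTES_PER_COMPONENT = {
    "Collection(Edm.Single)": 4,  # 32-bit floating point
    "Collection(Edm.Half)": 2,    # 16-bit floating point
    "Collection(Edm.Int16)": 2,   # 16-bit signed integer
    "Collection(Edm.SByte)": 1,   # 8-bit signed integer
}


def raw_vector_bytes(field_type: str, dimensions: int, doc_count: int) -> int:
    """Return the raw size of the vector values only (no index overhead)."""
    return BYTES_PER_COMPONENT[field_type] * dimensions * doc_count


if __name__ == "__main__":
    # Hypothetical workload: 100,000 documents with 1,536-dimension embeddings.
    for field_type in BYTES_PER_COMPONENT:
        size = raw_vector_bytes(field_type, dimensions=1536, doc_count=100_000)
        print(f"{field_type:<24} {size / 1024**2:8.1f} MiB")
```

Under these assumptions, moving from `Collection(Edm.Single)` to `Collection(Edm.Half)` halves the raw vector payload, and `Collection(Edm.SByte)` reduces it to a quarter.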
18+
19+
## Prerequisites
20+
21+
- An embedding model that outputs small data formats.
22+
23+
## Supported narrow data types
24+
25+
1. Review the [data types used for vector fields](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields) for recommended usage:
26+
27+
- `Collection(Edm.Single)` 32-bit floating point (default)
28+
- `Collection(Edm.Half)` 16-bit floating point (narrow)
29+
- `Collection(Edm.Int16)` 16-bit signed integer (narrow)
30+
- `Collection(Edm.SByte)` 8-bit signed integer (narrow)
31+
- `Collection(Edm.Byte)` 8-bit unsigned integer (only allowed with packed binary data types)
32+
33+
1. From that list, determine which data type is valid for your embedding model's output, or for vectors that undergo custom quantization.
34+
35+
The following table provides links to several embedding models that can use a narrow data type (`Collection(Edm.Half)`) without extra quantization. You can cast from float32 to float16 (using `Collection(Edm.Half)`) with no extra work.
36+
37+
| Embedding model | Native output | Assign this type in Azure AI Search |
38+
|------------------------|---------------|--------------------------------|
39+
| [text-embedding-ada-002](/azure/ai-services/openai/concepts/models#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
40+
| [text-embedding-3-small](/azure/ai-services/openai/concepts/models#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
41+
| [text-embedding-3-large](/azure/ai-services/openai/concepts/models#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
42+
| [Cohere V3 embedding models with int8 embedding_type](https://docs.cohere.com/reference/embed) | `Int8` | `Collection(Edm.SByte)` |
43+
44+
Other narrow data types can be used if your model emits embeddings in the smaller data format, or if you have custom quantization that converts vectors to a smaller format.
45+
46+
1. Make sure you understand the tradeoffs of a narrow data type. `Collection(Edm.Half)` has less information, which results in lower resolution. If your data is homogeneous or dense, losing extra detail or nuance could lead to unacceptable results at query time because there's less detail that can be used to tell nearby vectors apart.
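
The resolution tradeoff can be seen with a small experiment. The following Python sketch round-trips values through IEEE 754 half precision using the standard library's `struct` module (format code `e`). The sample values and function names are hypothetical and chosen only for illustration. The sketch doesn't reproduce how Azure AI Search stores `Collection(Edm.Half)` internally.

```python
import struct


def to_float16_and_back(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def max_roundtrip_error(vector: list[float]) -> float:
    """Largest absolute difference introduced by storing values as float16."""
    return max(abs(x - to_float16_and_back(x)) for x in vector)


if __name__ == "__main__":
    # Two components that differ only in the seventh decimal place.
    a, b = 0.0123456, 0.0123460
    print("a as float16:", to_float16_and_back(a))
    print("b as float16:", to_float16_and_back(b))
    print("still distinguishable:", to_float16_and_back(a) != to_float16_and_back(b))
    print("max error:", max_roundtrip_error([a, b, 0.5]))
```

If the differences that separate neighboring vectors in your data are this small, they can disappear after the cast, which is the situation the tradeoff step describes.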
47+
48+
## Assign the data type
49+
50+
[Define and build the index](vector-search-how-to-create-index.md). You can use the Azure portal, [Create or Update Index (REST API)](/rest/api/searchservice/indexes/create-or-update), or an Azure SDK package for this step.
51+
52+
This field definition uses a narrow data type, `Collection(Edm.Half)`, that can accept a float32 embedding stored as a float16 value. As is true for all vector fields, `dimensions` and `vectorSearchProfile` are set. The specifics of the `vectorSearchProfile` are immaterial to the data type.
53+
54+
We recommend that you set `retrievable` and `stored` to true if you want to visually check the values of the field. On a subsequent rebuild, you can change these properties to false for reduced storage requirements.
55+
56+
```json
57+
{
58+
"name": "nameEmbedding",
59+
"type": "Collection(Edm.Half)",
60+
"searchable": true,
61+
"filterable": false,
62+
"retrievable": true,
63+
"sortable": false,
64+
"facetable": false,
65+
"key": false,
66+
"indexAnalyzer": null,
67+
"searchAnalyzer": null,
68+
"analyzer": null,
69+
"synonymMaps": [],
70+
"dimensions": 1536,
71+
"vectorSearchProfile": "myHnswProfile"
72+
}
73+
```
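
For context, here's a minimal Python sketch of how a field like this might sit inside a complete index definition. The index name, the `id` key field, and the `myHnsw` algorithm name are hypothetical. The sketch only builds and prints the request body and doesn't call the service. Sending it with Create or Update Index is left to the tool of your choice.

```python
import json


def build_index_definition(index_name: str, vector_type: str, dimensions: int) -> dict:
    """Build an index body with one key field and one narrow-type vector field."""
    return {
        "name": index_name,
        "fields": [
            # Hypothetical key field; every index needs exactly one key.
            {"name": "id", "type": "Edm.String", "key": True},
            {
                "name": "nameEmbedding",
                "type": vector_type,
                "searchable": True,
                "retrievable": True,  # lets you inspect values while testing
                "stored": True,
                "dimensions": dimensions,
                "vectorSearchProfile": "myHnswProfile",
            },
        ],
        "vectorSearch": {
            # The profile referenced by the field points at a named algorithm.
            "algorithms": [{"name": "myHnsw", "kind": "hnsw"}],
            "profiles": [{"name": "myHnswProfile", "algorithm": "myHnsw"}],
        },
    }


if __name__ == "__main__":
    body = build_index_definition("my-narrow-index", "Collection(Edm.Half)", 1536)
    print(json.dumps(body, indent=2))
```

Keeping the vector type as a parameter makes it easy to generate a second definition with `Collection(Edm.Single)` and compare index sizes side by side.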
74+
75+
Recall that vector fields aren't filterable, sortable, or facetable. They can't be used as keys and don't use analyzers or synonym maps.
76+
77+
### Working with a production index
78+
79+
Data types are assigned on new fields when they're created. You can't change the data type of an existing field, and you can't drop a field without [rebuilding the index](search-howto-reindex.md). For established indexes already in production, it's common to work around this issue by creating new fields with the desired revisions and then removing obsolete fields during a planned index rebuild.
80+
81+
## Check results
82+
83+
1. Verify the field content matches the data type. Assuming the vector field is marked as retrievable, use [Search explorer](search-explorer.md) or [Search - POST](/rest/api/searchservice/documents/search-post?) to return vector field content.
84+
85+
1. To check vector index size, refer to the vector index size column on the Indexes page in the Azure portal or use the [GET Statistics (REST API)](/rest/api/searchservice/indexes/get-statistics) or equivalent Azure SDK method to get the size.
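
As a rough illustration of the size check, the following Python sketch builds the statistics request URL and summarizes a response. The service name, index name, API version, and the numbers in the sample response are all placeholders, and the sketch assumes the response carries `documentCount`, `storageSize`, and `vectorIndexSize` properties. Verify the property names against the REST reference for the API version you use.

```python
def stats_url(service_name: str, index_name: str, api_version: str) -> str:
    """Build the index statistics URL for a search service."""
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/stats?api-version={api_version}"
    )


def summarize_stats(stats: dict) -> str:
    """Format the sizes (reported in bytes) as a one-line summary."""
    vector_mib = stats["vectorIndexSize"] / 1024**2
    storage_mib = stats["storageSize"] / 1024**2
    return (
        f"{stats['documentCount']} documents, "
        f"{storage_mib:.1f} MiB storage, {vector_mib:.1f} MiB vector index"
    )


if __name__ == "__main__":
    print(stats_url("my-service", "my-narrow-index", "2024-07-01"))
    # Placeholder values standing in for a real response body.
    sample = {"documentCount": 1000, "storageSize": 8388608, "vectorIndexSize": 3145728}
    print(summarize_stats(sample))
```

Comparing the summary before and after switching a field to a narrow data type gives a quick read on how much vector index size you recovered.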
86+
87+
<!--
88+
Evidence of choosing the wrong data type, for example choosing `int8` for a `float32` embedding, is a field that's indexed as an array of zeros. If you encounter this problem, start over. -->
89+
90+
> [!NOTE]
91+
> The field's data type is used to create the physical data structure. If you want to change a data type later, either drop and rebuild the index, or create a second field with the new definition.
