articles/search/vector-search-how-to-index-binary-data.md
+7 -7: 7 additions & 7 deletions
@@ -1,7 +1,7 @@
---
-title: Index binary data for vector search
+title: Index binary vectors for vector search
titleSuffix: Azure AI Search
-description: Explains how to configure fields for binary data and the vector search configuration for querying the fields.
+description: Explains how to configure fields for binary vectors and the vector search configuration for querying the fields.
author: HeidiSteen
ms.author: heidist
@@ -12,11 +12,11 @@ ms.topic: how-to
ms.date: 05/21/2024
---
-# Index binary data for vector search
+# Index binary vectors for vector search
-Beginning with the 2024-05-01-preview REST API, Azure AI Search supports a packed binary type of `Collection(Edm.Binary)` for further reducing the storage and memory footprint of vector data. You can use this data type for output from models such as [Cohere's Embed v3 binary embedding models](https://cohere.com/blog/introducing-embed-v3).
+Beginning with the 2024-05-01-preview REST API, Azure AI Search supports a packed binary type of `Collection(Edm.Byte)` for further reducing the storage and memory footprint of vector data. You can use this data type for output from models such as [Cohere's Embed v3 binary embedding models](https://cohere.com/blog/introducing-embed-v3).
-There are three steps to configuring an index for binary data:
+There are three steps to configuring an index for binary vectors:
> [!div class="checklist"]
> + Add a vector search algorithm that specifies Hamming distance for binary vector comparison
@@ -27,7 +27,7 @@ This article assumes you're familiar with [creating an index in Azure AI Search]
## Prerequisites
-+An embedding model that outputs embeddings in a packed form, where each 8-bit binary value is packed into one uint8 unit.
++Binary vectors, with 1 bit per dimension, packaged in uint8 values with 8 bits per value. These can be obtained by using models that directly generate "packaged binary" vectors, or by quantizing vectors into binary vectors client-side during indexing and searching.
## Limitations
@@ -37,7 +37,7 @@ This article assumes you're familiar with [creating an index in Azure AI Search]
## Add a vector search algorithm and vector profile
-Vector search algorithms are used to create the query navigation structures during indexing. For binary data fields, vector comparisons are performed using the Hamming distance metric.
+Vector search algorithms are used to create the query navigation structures during indexing. For binary vector fields, vector comparisons are performed using the Hamming distance metric.
1. To add a binary field to an index, set up a [`Create or Update Index`](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true) request using the **2024-05-01-preview REST API** or the Azure portal.
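
Note on the changes above: the new prerequisite text mentions quantizing vectors client-side into 1 bit per dimension packed into uint8 values, and the algorithm section names Hamming distance as the comparison metric. A minimal Python sketch of both ideas follows. The function names, the threshold at zero, and the most-significant-bit-first packing order are assumptions made for illustration only; none of them comes from this diff or from the Azure AI Search API.

```python
from typing import Sequence


def pack_binary_vector(embedding: Sequence[float]) -> bytes:
    """Quantize floats to 1 bit per dimension and pack 8 bits per byte.

    Assumptions (not from the diff): a value > 0 maps to bit 1, and the
    first dimension of each group of 8 becomes the most significant bit.
    """
    if len(embedding) % 8 != 0:
        raise ValueError("dimension count must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(embedding), 8):
        byte = 0
        for value in embedding[start:start + 8]:
            # Shift previous bits left and append the new bit.
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same packed length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


query = pack_binary_vector([0.3, -0.1, 0.7, -0.5, 0.2, -0.9, 0.4, -0.6])
doc = pack_binary_vector([0.1] * 8)
print(list(query), list(doc), hamming_distance(query, doc))  # [170] [255] 4
```

An 8-dimension float vector becomes a single byte here, which is where the storage reduction described in the article comes from. A model that already emits packed binary output would skip `pack_binary_vector` entirely.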
articles/search/whats-new.md
+2 -1: 2 additions & 1 deletion
@@ -26,13 +26,14 @@ ms.custom:
|[Higher capacity and more vector quota at every tier (same billing rate)](search-limits-quotas-capacity.md#service-limits)| Infrastructure | Partition sizes are now even larger for Standard 2 (S2), Standard 3 (S3), and Standard 3 High Density (S3 HD) for all services created after April 3, 2024. If you create a new service now, you get the larger partitions. If you created a new service between April 3 and May 17, you get the larger partitions automatically. <br><br>Storage Optimized tiers (L1 and L2) also have more capacity. L1 and L2 customers must create a new service to benefit from the higher capacity. There's no in-place upgrade at this time. <br><br>Extra capacity is now available in [more regions](search-limits-quotas-capacity.md#supported-regions-with-higher-storage-limits): South Africa North, Germany North, Germany West Central, Switzerland West, East US 2 EUAP/PPE, and Azure Government (Texas, Arizona, and Virginia).|
|[OneLake files and shortcuts integration (preview)](search-how-to-index-onelake-files.md)| Feature | New indexer for OneLake files and OneLake shortcuts. If you use Microsoft Fabric and OneLake for data access to Amazon Web Services (AWS) and Google data sources, use this indexer to import external data into a search index. This indexer is available through the Azure portal, the [2024-05-01-preview REST API](/rest/api/searchservice/data-sources/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true), and Azure SDK beta packages. |
|[Relevance tuning and search results customization](vector-search-how-to-query.md)| Feature | Three enhancements improve vector search relevance. <br><br>First, you can now set thresholds on vector search results to exclude low-scoring results. <br><br>Second, you can set `MaxSizeTextRecall` and `countAndFacetMode` in hybrid queries to specify the maximum number of documents that can be recalled using text query in hybrid (text and vector) search. Previously, the maximum was fixed at 1,000. If you have more matches, you can now specify a higher limit to get more results back. <br><br>Third, for hybrid queries, you can set a weight on vector queries to have more or less importance than the nonvector query. |
-|[Binary data support](/rest/api/searchservice/supported-data-types)| Feature |`Collection(Edm.Byte)` is a new supported data type. This data type opens up integration with the [Cohere v3 binary embedding models](https://cohere.com/blog/int8-binary-embeddings) and custom binary quantization. Narrow data types lower the cost of large vector datasets. See [Index binary data for vector search](vector-search-how-to-index-binary-data.md) for more information.|
+|[Binary vectors support](/rest/api/searchservice/supported-data-types)| Feature |`Collection(Edm.Byte)` is a new supported data type. This data type opens up integration with the [Cohere v3 binary embedding models](https://cohere.com/blog/int8-binary-embeddings) and custom binary quantization. Narrow data types lower the cost of large vector datasets. See [Index binary data for vector search](vector-search-how-to-index-binary-data.md) for more information.|
|[Azure AI Vision multimodal embeddings skill (preview)](cognitive-search-skill-vision-vectorize.md)| Skill | New skill that's bound to the [multimodal embeddings API of Azure AI Vision](../ai-services/computer-vision/concept-image-retrieval.md). You can generate embeddings for text or images during indexing. This skill is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true).|
|[Azure AI Vision vectorizer (preview)](vector-search-vectorizer-ai-services-vision.md)| Vectorizer | New vectorizer connects to an Azure AI Vision resource using the [multimodal embeddings API](../ai-services/computer-vision/concept-image-retrieval.md) to generate embeddings at query time. This vectorizer is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true). |
|[Azure AI Studio model catalog vectorizer (preview)](vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md)| Vectorizer | New vectorizer connects to an embedding model deployed from the [Azure AI Studio model catalog](../ai-studio/how-to/model-catalog.md). This vectorizer is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true). <br><br>[**How to implement integrated vectorization using models from Azure AI Studio**](vector-search-integrated-vectorization-ai-studio.md).|
|[AzureOpenAIEmbedding skill (preview) supports more models on Azure OpenAI](cognitive-search-skill-azure-openai-embedding.md)| Skill | Updates to this skill add support for more embedding models on Azure OpenAI. New `dimensions` and `modelName` properties are used for specifying models. Previously, the dimensions limits were fixed at 1,536 dimensions. It's now configurable. This update is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true).|
|[2024-05-01-preview Search REST API](/rest/api/searchservice/search-service-api-versions#2024-05-01-preview)| API | New preview version of the Search REST APIs provides new skills and vectorizers, new binary data type, OneLake files indexer, and new query parameters for more relevant results. See [Upgrade REST APIs](search-api-migration.md) if you have existing code written against the 2023-07-01-preview and need to migrate to this version.|
| Azure SDK beta packages for new features | API | Review the changelogs of the following Azure SDK beta packages for new feature support: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/Azure.Search.Documents_11.6.0-beta.4/sdk/search/Azure.Search.Documents/CHANGELOG.md), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md)|
+|[Python code samples](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/readme.md)| Samples | New end-to-end samples demonstrate [integration with Cohere Embed v3](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/community-integration/cohere/azure-search-cohere-embed-v3-sample.ipynb), [integration with OneLake and cloud data platforms on Google and AWS](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb), and [integration with Azure AI Vision multimodal APIs](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/embeddings/multimodal-embeddings/multimodal-embeddings.ipynb). |
<!-- | Network security perimeter support (preview) | Feature | A network security perimeter is a new service that provides a secure perimeter for communication, and controlled access to resources outside of the perimeter. Azure AI Search is one of the eight Azure services that can run within a network security perimeter. This feature is provided by the [2024-03-01-preview Management REST API](/rest/api/searchmanagement/operation-groups?view=rest-searchmanagement-2024-03-01-preview&preserve-view=true) and the Azure portal. | -->