Commit c88cd42

Merge pull request #275868 from HeidiSteen/heidist-quik
what's new, link to samples
2 parents 60b9aba + 83845b8 commit c88cd42

2 files changed: +9 -8 lines changed

articles/search/vector-search-how-to-index-binary-data.md

Lines changed: 7 additions & 7 deletions
@@ -1,7 +1,7 @@
 ---
-title: Index binary data for vector search
+title: Index binary vectors for vector search
 titleSuffix: Azure AI Search
-description: Explains how to configure fields for binary data and the vector search configuration for querying the fields.
+description: Explains how to configure fields for binary vectors and the vector search configuration for querying the fields.
 
 author: HeidiSteen
 ms.author: heidist
@@ -12,11 +12,11 @@ ms.topic: how-to
 ms.date: 05/21/2024
 ---
 
-# Index binary data for vector search
+# Index binary vectors for vector search
 
-Beginning with the 2024-05-01-preview REST API, Azure AI Search supports a packed binary type of `Collection(Edm.Binary)` for further reducing the storage and memory footprint of vector data. You can use this data type for output from models such as [Cohere's Embed v3 binary embedding models](https://cohere.com/blog/introducing-embed-v3).
+Beginning with the 2024-05-01-preview REST API, Azure AI Search supports a packed binary type of `Collection(Edm.Byte)` for further reducing the storage and memory footprint of vector data. You can use this data type for output from models such as [Cohere's Embed v3 binary embedding models](https://cohere.com/blog/introducing-embed-v3).
 
-There are three steps to configuring an index for binary data:
+There are three steps to configuring an index for binary vectors:
 
 > [!div class="checklist"]
 > + Add a vector search algorithm that specifies Hamming distance for binary vector comparison
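The first checklist step relies on Hamming distance, which counts the bit positions where two binary vectors differ. Here is a minimal Python sketch of that metric over packed uint8 values. It assumes both vectors were packed with the same bit order and have the same length. `hamming_distance` and `nearest` are hypothetical helpers for illustration, not part of any Azure SDK, and the brute-force ranking does not reflect how the service builds or scores its index internally.

```python
from typing import List, Mapping, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count differing bits between two packed binary vectors (uint8 values)."""
    if len(a) != len(b):
        raise ValueError("packed vectors must have the same number of bytes")
    distance = 0
    for x, y in zip(a, b):
        # XOR sets a bit wherever the two bytes disagree; count those set bits.
        distance += bin((x ^ y) & 0xFF).count("1")
    return distance


def nearest(query: Sequence[int], candidates: Mapping[str, Sequence[int]]) -> List[str]:
    """Brute-force ranking of candidate IDs by ascending Hamming distance."""
    return sorted(candidates, key=lambda doc_id: hamming_distance(query, candidates[doc_id]))


if __name__ == "__main__":
    query = [166, 91]  # hypothetical packed query vector (16 dimensions)
    docs = {
        "doc-a": [166, 91],  # identical to the query
        "doc-b": [166, 90],  # differs in one bit
        "doc-c": [89, 164],  # every bit flipped
    }
    print(hamming_distance(query, docs["doc-b"]))  # 1
    print(nearest(query, docs))  # ['doc-a', 'doc-b', 'doc-c']
```

Because each uint8 holds 8 dimensions, a 1,024-dimension binary vector is compared with 128 XOR-and-count steps, which is much cheaper than a floating-point dot product over 1,024 values.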
@@ -27,7 +27,7 @@ This article assumes you're familiar with [creating an index in Azure AI Search]
 
 ## Prerequisites
 
-+ An embedding model that outputs embeddings in a packed form, where each 8-bit binary value is packed into one uint8 unit.
++ Binary vectors, with 1 bit per dimension, packaged in uint8 values with 8 bits per value. These can be obtained by using models that directly generate "packaged binary" vectors, or by quantizing vectors into binary vectors client-side during indexing and searching.
 
 ## Limitations
 
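The revised prerequisite allows binary vectors that are quantized client-side. The article doesn't prescribe a quantization method, so the sketch below assumes simple sign thresholding (a component above 0 becomes bit 1) and packs 8 dimensions into each uint8, most significant bit first, which matches the default bit order of `numpy.packbits`. `binarize` and `pack_bits` are hypothetical names. Whichever bit order you choose, apply it identically to documents and queries, because Hamming comparisons are only meaningful between vectors packed the same way.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Assumed sign-style quantization: 1 if a component exceeds the threshold, else 0."""
    return [1 if x > threshold else 0 for x in embedding]


def pack_bits(bits: Sequence[int]) -> List[int]:
    """Pack 1 bit per dimension into uint8 values (8 bits each), most significant bit first.

    A short final chunk is padded with zero bits on the right.
    """
    packed = []
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        byte <<= 8 - len(chunk)  # no-op for a full 8-bit chunk
        packed.append(byte)
    return packed


if __name__ == "__main__":
    # Hypothetical 16-dimension float embedding standing in for model output.
    embedding = [0.3, -0.1, 0.8, 0.0, -0.5, 0.2, 0.9, -0.7,
                 -0.2, 0.4, -0.3, 0.6, 0.1, -0.9, 0.5, 0.05]
    packed = pack_bits(binarize(embedding))
    print(packed)  # [166, 91]
```

With this scheme, a 1,024-dimension embedding packs into 128 values, each in the range 0 to 255.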
@@ -37,7 +37,7 @@ This article assumes you're familiar with [creating an index in Azure AI Search]
 
 ## Add a vector search algorithm and vector profile
 
-Vector search algorithms are used to create the query navigation structures during indexing. For binary data fields, vector comparisons are performed using the Hamming distance metric.
+Vector search algorithms are used to create the query navigation structures during indexing. For binary vector fields, vector comparisons are performed using the Hamming distance metric.
 
 1. To add a binary field to an index, set up a [`Create or Update Index`](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true) request using the **2024-05-01-preview REST API** or the Azure portal.
 
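The step above sets up a `Create or Update Index` request. The following Python sketch builds a request body with one packed binary vector field and an HNSW algorithm that uses Hamming distance. The shapes used here (`"type": "Collection(Edm.Byte)"`, `"vectorEncoding": "packedBit"`, `dimensions` counting the unpacked bits, and `"hnswParameters": {"metric": "hamming"}`) are assumed from the 2024-05-01-preview documentation and should be checked against the REST reference before use. The index, field, algorithm, and profile names are hypothetical.

```python
import json


def build_binary_vector_index(index_name: str, dimensions: int) -> dict:
    """Build a hypothetical Create or Update Index body with one packed binary vector field.

    Property names and values are assumed from the 2024-05-01-preview REST API docs.
    """
    # Guard added for this sketch only: packed bits fill whole bytes only when
    # dimensions is a multiple of 8.
    if dimensions <= 0 or dimensions % 8 != 0:
        raise ValueError("this sketch expects a positive multiple of 8 dimensions")
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {
                "name": "embedding_binary",        # hypothetical field name
                "type": "Collection(Edm.Byte)",    # packed binary vector type
                "vectorEncoding": "packedBit",     # assumed: marks bytes as packed bits
                "dimensions": dimensions,          # assumed: unpacked bit count, e.g. 1024
                "searchable": True,
                "retrievable": True,
                "vectorSearchProfile": "binary-hnsw-profile",
            },
        ],
        "vectorSearch": {
            "algorithms": [
                {
                    "name": "binary-hnsw",
                    "kind": "hnsw",
                    "hnswParameters": {"metric": "hamming"},  # Hamming for binary vectors
                }
            ],
            "profiles": [
                {"name": "binary-hnsw-profile", "algorithm": "binary-hnsw"}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_binary_vector_index("my-binary-index", 1024), indent=2))
```

The body would be sent as a `PUT` to `/indexes/{index-name}?api-version=2024-05-01-preview` on your search service endpoint; authentication headers are omitted here.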
articles/search/whats-new.md

Lines changed: 2 additions & 1 deletion
@@ -26,13 +26,14 @@ ms.custom:
 | [Higher capacity and more vector quota at every tier (same billing rate)](search-limits-quotas-capacity.md#service-limits) | Infrastructure | Partition sizes are now even larger for Standard 2 (S2), Standard 3 (S3), and Standard 3 High Density (S3 HD) for all services created after April 3, 2024. If you create a new service now, you get the larger partitions. If you created a new service between April 3 and May 17, you get the larger partitions automatically. <br><br>Storage Optimized tiers (L1 and L2) also have more capacity. L1 and L2 customers must create a new service to benefit from the higher capacity. There's no in-place upgrade at this time. <br><br>Extra capacity is now available in [more regions](search-limits-quotas-capacity.md#supported-regions-with-higher-storage-limits): South Africa North, Germany North, Germany West Central, Switzerland West, East US 2 EUAP/PPE, and Azure Government (Texas, Arizona, and Virginia).|
 | [OneLake files and shortcuts integration (preview)](search-how-to-index-onelake-files.md) | Feature | New indexer for OneLake files and OneLake shortcuts. If you use Microsoft Fabric and OneLake for data access to Amazon Web Services (AWS) and Google data sources, use this indexer to import external data into a search index. This indexer is available through the Azure portal, the [2024-05-01-preview REST API](/rest/api/searchservice/data-sources/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true), and Azure SDK beta packages. |
 | [Relevance tuning and search results customization](vector-search-how-to-query.md) | Feature | Three enhancements improve vector search relevance. <br><br>First, you can now set thresholds on vector search results to exclude low-scoring results. <br><br>Second, you can set `MaxSizeTextRecall` and `countAndFacetMode` in hybrid queries to specify the maximum number of documents that can be recalled using text query in hybrid (text and vector) search. Previously, the maximum was fixed at 1,000. If you have more matches, you can now specify a higher limit to get more results back. <br><br>Third, for hybrid queries, you can set a weight on vector queries to have more or less importance than the nonvector query. |
-| [Binary data support](/rest/api/searchservice/supported-data-types) | Feature | `Collection(Edm.Byte)` is a new supported data type. This data type opens up integration with the [Cohere v3 binary embedding models](https://cohere.com/blog/int8-binary-embeddings) and custom binary quantization. Narrow data types lower the cost of large vector datasets. See [Index binary data for vector search](vector-search-how-to-index-binary-data.md) for more information.|
+| [Binary vectors support](/rest/api/searchservice/supported-data-types) | Feature | `Collection(Edm.Byte)` is a new supported data type. This data type opens up integration with the [Cohere v3 binary embedding models](https://cohere.com/blog/int8-binary-embeddings) and custom binary quantization. Narrow data types lower the cost of large vector datasets. See [Index binary data for vector search](vector-search-how-to-index-binary-data.md) for more information.|
 | [Azure AI Vision multimodal embeddings skill (preview)](cognitive-search-skill-vision-vectorize.md) | Skill | New skill that's bound to the [multimodal embeddings API of Azure AI Vision](../ai-services/computer-vision/concept-image-retrieval.md). You can generate embeddings for text or images during indexing. This skill is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true).|
 | [Azure AI Vision vectorizer (preview)](vector-search-vectorizer-ai-services-vision.md) | Vectorizer | New vectorizer connects to an Azure AI Vision resource using the [multimodal embeddings API](../ai-services/computer-vision/concept-image-retrieval.md) to generate embeddings at query time. This vectorizer is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true). |
 | [Azure AI Studio model catalog vectorizer (preview)](vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md) | Vectorizer | New vectorizer connects to an embedding model deployed from the [Azure AI Studio model catalog](../ai-studio/how-to/model-catalog.md). This vectorizer is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true). <br><br>[**How to implement integrated vectorization using models from Azure AI Studio**](vector-search-integrated-vectorization-ai-studio.md).|
 | [AzureOpenAIEmbedding skill (preview) supports more models on Azure OpenAI](cognitive-search-skill-azure-openai-embedding.md) | Skill | Updates to this skill add support for more embedding models on Azure OpenAI. New `dimensions` and `modelName` properties are used for specifying models. Previously, the dimensions limits were fixed at 1,536 dimensions. It's now configurable. This update is available through the Azure portal and the [2024-05-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-05-01-preview&preserve-view=true).|
 | [2024-05-01-preview Search REST API](/rest/api/searchservice/search-service-api-versions#2024-05-01-preview) | API | New preview version of the Search REST APIs provides new skills and vectorizers, new binary data type, OneLake files indexer, and new query parameters for more relevant results. See [Upgrade REST APIs](search-api-migration.md) if you have existing code written against the 2023-07-01-preview and need to migrate to this version.|
 | Azure SDK beta packages for new features | API | Review the changelogs of the following Azure SDK beta packages for new feature support: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/Azure.Search.Documents_11.6.0-beta.4/sdk/search/Azure.Search.Documents/CHANGELOG.md), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md) |
+| [Python code samples](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/readme.md) | Samples | New end-to-end samples demonstrate [integration with Cohere Embed v3](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/community-integration/cohere/azure-search-cohere-embed-v3-sample.ipynb), [integration with OneLake and cloud data platforms on Google and AWS](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb), and [integration with Azure AI Vision multimodal APIs](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/embeddings/multimodal-embeddings/multimodal-embeddings.ipynb). |
 <!-- | Network security perimeter support (preview) | Feature | A network security perimeter is a new service that provides a secure perimeter for communication, and controlled access to resources outside of the perimeter. Azure AI Search is one of the eight Azure services that can run within a network security perimeter. This feature is provided by the [2024-03-01-preview Management REST API](/rest/api/searchmanagement/operation-groups?view=rest-searchmanagement-2024-03-01-preview&preserve-view=true) and the Azure portal. | -->
 
 ## April 2024
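The binary vectors entry in what's new says narrow data types lower the cost of large vector datasets. Raw vector payload is straightforward to estimate: a float32 vector (for example `Collection(Edm.Single)`) stores 4 bytes per dimension, while a packed binary vector stores 1 bit per dimension. The sketch below assumes a hypothetical corpus of 1 million 1,024-dimension embeddings and compares only raw vector bytes. It ignores index structures such as HNSW graphs, so it is not an estimate of billed storage; `compare_raw_storage` and the other helpers are hypothetical.

```python
import math

FLOAT32_BYTES_PER_DIMENSION = 4  # float32 vectors, e.g. Collection(Edm.Single)
DIMENSIONS_PER_BYTE = 8          # packed binary: 1 bit per dimension


def float32_vector_bytes(dimensions: int) -> int:
    """Raw bytes for one float32 vector, ignoring index overhead."""
    return dimensions * FLOAT32_BYTES_PER_DIMENSION


def packed_binary_vector_bytes(dimensions: int) -> int:
    """Raw bytes for one packed binary vector, rounded up to whole bytes."""
    return math.ceil(dimensions / DIMENSIONS_PER_BYTE)


def compare_raw_storage(num_vectors: int, dimensions: int) -> dict:
    """Compare raw vector payload for a corpus (hypothetical helper)."""
    f32 = num_vectors * float32_vector_bytes(dimensions)
    packed = num_vectors * packed_binary_vector_bytes(dimensions)
    return {"float32_bytes": f32, "packed_binary_bytes": packed, "reduction": f32 / packed}


if __name__ == "__main__":
    # Hypothetical corpus: 1 million 1,024-dimension embeddings.
    print(compare_raw_storage(num_vectors=1_000_000, dimensions=1024))
```

For that corpus, the raw vector payload drops from 4,096,000,000 bytes to 128,000,000 bytes, a 32x reduction.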
