description: Concepts related to image vectorization using the Image Analysis 4.0 API.
#services: cognitive-services
manager: nitinme
ms.service: azure-ai-vision
ms.topic: conceptual
ms.date: 02/20/2024
ms.author: pafarley
---
# Multimodal embeddings (version 4.0)
Multimodal embedding is the process of generating a numerical representation of an image that captures its features and characteristics in a vector format. These vectors encode the content and context of an image in a way that is compatible with text search over the same vector space.
Image retrieval systems have traditionally used features extracted from the images, such as content labels, tags, and image descriptors, to compare images and rank them by similarity. However, vector similarity search is gaining more popularity due to a number of benefits over traditional keyword-based search and is becoming a vital component in popular content search services.
Vector search searches large collections of vectors in high-dimensional space to find vectors that are similar to a given query.
## Business applications
Multimodal embedding has a variety of applications in different fields, including:
- **Digital asset management**: Multimodal embedding can be used to manage large collections of digital images, such as in museums, archives, or online galleries. Users can search for images based on visual features and retrieve the images that match their criteria.
- **Security and surveillance**: Vectorization can be used in security and surveillance systems to search for images based on specific features or patterns, such as people and object tracking or threat detection.
- **Forensic image retrieval**: Vectorization can be used in forensic investigations to search for images based on their visual content or metadata, such as in cases of cybercrime.
- **E-commerce**: Vectorization can be used in online shopping applications to search for similar products based on their features or descriptions, or to provide recommendations based on previous purchases.
- **Fashion and design**: Vectorization can be used in fashion and design to search for images based on their visual features, such as color, pattern, or texture. This can help designers or retailers to identify similar products or trends.
> [!CAUTION]
> Multimodal embedding is not designed to analyze medical images for diagnostic features or disease patterns. Please do not use Multimodal embedding for medical purposes.
## What are vector embeddings?
Vector embeddings are a way of representing content—text or images—as vectors of real numbers in a high-dimensional space. Vector embeddings are often learned from large amounts of textual and visual data using machine learning algorithms, such as neural networks.
Each dimension of the vector corresponds to a different feature or attribute of the content, such as its semantic meaning, syntactic role, or context in which it commonly appears. In Azure AI Vision, image and text vector embeddings have 1024 dimensions.
> [!IMPORTANT]
> Vector embeddings can only be compared and matched if they're from the same model type. Images vectorized by one model won't be searchable through a different model. The latest Image Analysis API offers two models: version `2023-04-15`, which supports text search in many languages, and the legacy `2022-04-11` model, which supports only English.
## How does it work?
The following are the main steps of the image retrieval process using Multimodal embeddings.
:::image type="content" source="media/image-retrieval.png" alt-text="Diagram of image retrieval process.":::
1. Vectorize Images and Text: the Multimodal embeddings APIs, **VectorizeImage** and **VectorizeText**, can be used to extract feature vectors from an image or text, respectively. The APIs return a single feature vector representing the entire input.
> [!NOTE]
> Multimodal embedding does not do any biometric processing of human faces. For face detection and identification, see the [Azure AI Face service](./overview-identity.md).
1. Measure similarity: Vector search systems typically use distance metrics, such as cosine distance or Euclidean distance, to compare vectors and rank them by similarity. The [Vision Studio](https://portal.vision.cognitive.azure.com/) demo uses [cosine distance](./how-to/image-retrieval.md#calculate-vector-similarity) to measure similarity.
1. Retrieve Images: Use the top _N_ vectors similar to the search query and retrieve images corresponding to those vectors from your photo library to provide as the final result.
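Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical helper names; it assumes the 1024-dimensional vectors have already been obtained from the vectorize APIs.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_n(query_vec, image_vecs, n=3):
    # Rank stored image vectors by similarity to the query vector
    # and return the identifiers of the n closest images.
    scored = sorted(image_vecs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:n]]
```

Cosine similarity ranges from -1 to 1; a higher value means the image is semantically closer to the query.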
The image and video retrieval services return a field called "relevance."
## Next steps
Enable Multimodal embeddings for your search service and follow the steps to generate vector embeddings for text and images.
* [Call the Multimodal embeddings APIs](./how-to/image-retrieval.md)
articles/ai-services/computer-vision/how-to/image-retrieval.md
---
title: Do image retrieval using multimodal embeddings - Image Analysis 4.0
titleSuffix: Azure AI services
description: Learn how to call the image retrieval API to vectorize image and search terms.
#services: cognitive-services
manager: nitinme
ms.service: azure-ai-vision
ms.topic: how-to
ms.date: 02/20/2024
ms.author: pafarley
ms.custom: references_regions
---
# Do image retrieval using multimodal embeddings (version 4.0)
The Multimodal embeddings APIs enable the _vectorization_ of images and text queries. They convert images to coordinates in a multi-dimensional vector space. Then, incoming text queries can also be converted to vectors, and images can be matched to the text based on semantic closeness. This allows the user to search a set of images using text, without the need to use image tags or other metadata. Semantic closeness often produces better results in search.
> [!IMPORTANT]
> These APIs are only available in the following geographic regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
* Once you have your Azure subscription, <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision" title="Create a Computer Vision resource" target="_blank">create a Computer Vision resource</a> in the Azure portal to get your key and endpoint. Be sure to create it in one of the permitted geographic regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
* After it deploys, select **Go to resource**. Copy the key and endpoint to a temporary location to use later on.
## Try out Multimodal embeddings
You can try out the Multimodal embeddings feature quickly and easily in your browser using Vision Studio.
> [!IMPORTANT]
> The Vision Studio experience is limited to 500 images. To use a larger image set, create your own search application using the APIs in this guide.
The `retrieval:vectorizeImage` API lets you convert an image's data to a vector.
1. Replace `<endpoint>` with your Azure AI Vision endpoint.
1. Replace `<subscription-key>` with your Azure AI Vision key.
1. In the request body, set `"url"` to the URL of a remote image you want to use.
1. Optionally, change the `model-version` parameter to an older version. `2022-04-11` is the legacy model that supports only English text. Images and text that are vectorized with a certain model aren't compatible with other models, so be sure to use the same model for both.
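As an illustration, the steps above can be assembled into a request using only the Python standard library. This is a sketch, not the documented cURL command: the helper name is hypothetical, and the `api-version` value shown is an assumption; check the current API reference for the exact version string.

```python
import json
import urllib.request

def build_vectorize_image_request(endpoint, key, image_url,
                                  api_version="2024-02-01",
                                  model_version="2023-04-15"):
    # Assemble the retrieval:vectorizeImage POST request.
    # api_version is an assumed value; verify against the API reference.
    url = (f"{endpoint}/computervision/retrieval:vectorizeImage"
           f"?api-version={api_version}&model-version={model_version}")
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request requires a real endpoint and key:
# with urllib.request.urlopen(build_vectorize_image_request(ep, key, img)) as r:
#     vector = json.loads(r.read())["vector"]
```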
The `retrieval:vectorizeText` API lets you convert a text string to a vector.
1. Replace `<endpoint>` with your Azure AI Vision endpoint.
1. Replace `<subscription-key>` with your Azure AI Vision key.
1. In the request body, set `"text"` to the example search term you want to use.
1. Optionally, change the `model-version` parameter to an older version. `2022-04-11` is the legacy model that supports only English text. Images and text that are vectorized with a certain model aren't compatible with other models, so be sure to use the same model for both.
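A matching sketch for the text side, again with a hypothetical helper name and an assumed `api-version` value. The `model-version` default matches the image request above, since vectors produced by different models can't be compared.

```python
import json
import urllib.request

def build_vectorize_text_request(endpoint, key, text,
                                 api_version="2024-02-01",
                                 model_version="2023-04-15"):
    # Assemble the retrieval:vectorizeText POST request. Keep model_version
    # identical to the one used for images so the vectors are comparable.
    url = (f"{endpoint}/computervision/retrieval:vectorizeText"
           f"?api-version={api_version}&model-version={model_version}")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```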
articles/ai-services/computer-vision/language-support.md
The following table lists the OCR supported languages for print text by the most recent GA version.
|Kazakh (Latin) |`kk-latn`|Zhuang |`za`|
|Khaling |`klr`|Zulu |`zu`|
## Analyze image
Some features of the [Analyze - Image](https://westcentralus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-1-ga/operations/56f91f2e778daf14a499f21b) API can return results in other languages, specified with the `language` query parameter. Other actions return results in English regardless of what language is specified, and others throw an exception for unsupported languages. Actions are specified with the `visualFeatures` and `details` query parameters; see the [Overview](overview-image-analysis.md) for a list of all the actions you can do with image analysis. Languages for tagging are only available in API version 3.2 or later.
|Chinese Simplified |`zh`|✅ | ✅| ✅|||||||✅|✅||
|Chinese Simplified |`zh-Hans`|| ✅|||||||||||
|Chinese Traditional |`zh-Hant`|| ✅|||||||||||
## Multimodal embeddings
The latest [Multimodal embeddings](./concept-image-retrieval.md) model supports vector search in many languages. The original model supports English only. Images that are vectorized in the English-only model are not compatible with text searches in the multilingual model.
| Language | Language code | `2023-04-15` model | `2022-04-11` model |
|---|---|---|---|