Commit 3ec278d

Merge pull request #266788 from PatrickFarley/comvis-updates
Vectorization new model updates
2 parents c2cc9ad + e650e17 commit 3ec278d

9 files changed

+166
-44
lines changed

articles/ai-services/computer-vision/concept-image-retrieval.md

Lines changed: 14 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Multi-modal embeddings concepts - Image Analysis 4.0
2+
title: Multimodal embeddings concepts - Image Analysis 4.0
33
titleSuffix: Azure AI services
44
description: Concepts related to image vectorization using the Image Analysis 4.0 API.
55
#services: cognitive-services
@@ -8,13 +8,13 @@ manager: nitinme
88

99
ms.service: azure-ai-vision
1010
ms.topic: conceptual
11-
ms.date: 01/19/2024
11+
ms.date: 02/20/2024
1212
ms.author: pafarley
1313
---
1414

15-
# Multi-modal embeddings (version 4.0 preview)
15+
# Multimodal embeddings (version 4.0)
1616

17-
Multi-modal embedding is the process of generating a numerical representation of an image that captures its features and characteristics in a vector format. These vectors encode the content and context of an image in a way that is compatible with text search over the same vector space.
17+
Multimodal embedding is the process of generating a numerical representation of an image that captures its features and characteristics in a vector format. These vectors encode the content and context of an image in a way that is compatible with text search over the same vector space.
1818

1919
Image retrieval systems have traditionally used features extracted from the images, such as content labels, tags, and image descriptors, to compare images and rank them by similarity. However, vector similarity search is gaining more popularity due to a number of benefits over traditional keyword-based search and is becoming a vital component in popular content search services.
2020

@@ -26,35 +26,35 @@ Vector search searches large collections of vectors in high-dimensional space to
2626

2727
## Business applications
2828

29-
Multi-modal embedding has a variety of applications in different fields, including:
29+
Multimodal embedding has a variety of applications in different fields, including:
3030

31-
- **Digital asset management**: Multi-modal embedding can be used to manage large collections of digital images, such as in museums, archives, or online galleries. Users can search for images based on visual features and retrieve the images that match their criteria.
31+
- **Digital asset management**: Multimodal embedding can be used to manage large collections of digital images, such as in museums, archives, or online galleries. Users can search for images based on visual features and retrieve the images that match their criteria.
3232
- **Security and surveillance**: Vectorization can be used in security and surveillance systems to search for images based on specific features or patterns, such as in people and object tracking or threat detection.
3333
- **Forensic image retrieval**: Vectorization can be used in forensic investigations to search for images based on their visual content or metadata, such as in cases of cyber-crime.
3434
- **E-commerce**: Vectorization can be used in online shopping applications to search for similar products based on their features or descriptions or provide recommendations based on previous purchases.
3535
- **Fashion and design**: Vectorization can be used in fashion and design to search for images based on their visual features, such as color, pattern, or texture. This can help designers or retailers to identify similar products or trends.
3636

3737
> [!CAUTION]
38-
> Multi-modal embedding is not designed analyze medical images for diagnostic features or disease patterns. Please do not use Multi-modal embedding for medical purposes.
38+
> Multimodal embedding is not designed to analyze medical images for diagnostic features or disease patterns. Please do not use Multimodal embedding for medical purposes.
3939
4040
## What are vector embeddings?
4141

4242
Vector embeddings are a way of representing content—text or images—as vectors of real numbers in a high-dimensional space. Vector embeddings are often learned from large amounts of textual and visual data using machine learning algorithms, such as neural networks.
4343

4444
Each dimension of the vector corresponds to a different feature or attribute of the content, such as its semantic meaning, syntactic role, or context in which it commonly appears. In Azure AI Vision, image and text vector embeddings have 1024 dimensions.
4545

46-
> [!NOTE]
47-
> Vector embeddings can only be meaningfully compared if they are from the same model type.
46+
> [!IMPORTANT]
47+
> Vector embeddings can only be compared and matched if they're from the same model type. Images vectorized by one model won't be searchable through a different model. The latest Image Analysis API offers two models: version `2023-04-15`, which supports text search in many languages, and the legacy `2022-04-11` model, which supports only English.
4848
4949
## How does it work?
5050

51-
The following are the main steps of the image retrieval process using Multi-modal embeddings.
51+
The following are the main steps of the image retrieval process using Multimodal embeddings.
5252

5353
:::image type="content" source="media/image-retrieval.png" alt-text="Diagram of image retrieval process.":::
5454

55-
1. Vectorize Images and Text: the Multi-modal embeddings APIs, **VectorizeImage** and **VectorizeText**, can be used to extract feature vectors out of an image or text respectively. The APIs return a single feature vector representing the entire input.
55+
1. Vectorize Images and Text: the Multimodal embeddings APIs, **VectorizeImage** and **VectorizeText**, can be used to extract feature vectors out of an image or text respectively. The APIs return a single feature vector representing the entire input.
5656
> [!NOTE]
57-
> Multi-modal embedding does not do any biometric processing of human faces. For face detection and identification, see the [Azure AI Face service](./overview-identity.md).
57+
> Multimodal embedding does not do any biometric processing of human faces. For face detection and identification, see the [Azure AI Face service](./overview-identity.md).
5858
5959
1. Measure similarity: Vector search systems typically use distance metrics, such as cosine distance or Euclidean distance, to compare vectors and rank them by similarity. The [Vision studio](https://portal.vision.cognitive.azure.com/) demo uses [cosine distance](./how-to/image-retrieval.md#calculate-vector-similarity) to measure similarity.
6060
1. Retrieve Images: Use the top _N_ vectors similar to the search query and retrieve images corresponding to those vectors from your photo library to provide as the final result.
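
The "Measure similarity" and "Retrieve Images" steps above can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the function names, the file names, and the three-dimensional toy vectors are all assumptions made for the example (real vectors from the API have 1024 dimensions). It ranks by cosine similarity, which is one minus the cosine distance mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_images(query_vector, image_vectors, n=3):
    """Rank stored image vectors against a query vector and keep the best n.

    image_vectors maps an image name to its vector (hypothetical storage layout).
    """
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in image_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data: stand-ins for vectors returned by VectorizeImage and VectorizeText.
library = {
    "beach.png": [1.0, 0.0, 0.0],
    "forest.png": [0.0, 1.0, 0.0],
    "sunset.png": [1.0, 1.0, 0.0],
}
results = top_n_images([1.0, 0.0, 0.0], library, n=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

For a large photo library, a dedicated vector index would normally replace this linear scan; the scan is used here only to keep the example self-contained.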
@@ -79,6 +79,6 @@ The image and video retrieval services return a field called "relevance." The te
7979

8080
## Next steps
8181

82-
Enable Multi-modal embeddings for your search service and follow the steps to generate vector embeddings for text and images.
83-
* [Call the Multi-modal embeddings APIs](./how-to/image-retrieval.md)
82+
Enable Multimodal embeddings for your search service and follow the steps to generate vector embeddings for text and images.
83+
* [Call the Multimodal embeddings APIs](./how-to/image-retrieval.md)
8484

articles/ai-services/computer-vision/how-to/image-retrieval.md

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Do image retrieval using multi-modal embeddings - Image Analysis 4.0
2+
title: Do image retrieval using multimodal embeddings - Image Analysis 4.0
33
titleSuffix: Azure AI services
44
description: Learn how to call the image retrieval API to vectorize image and search terms.
55
#services: cognitive-services
@@ -8,14 +8,14 @@ manager: nitinme
88

99
ms.service: azure-ai-vision
1010
ms.topic: how-to
11-
ms.date: 01/30/2024
11+
ms.date: 02/20/2024
1212
ms.author: pafarley
1313
ms.custom: references_regions
1414
---
1515

16-
# Do image retrieval using multi-modal embeddings (version 4.0 preview)
16+
# Do image retrieval using multimodal embeddings (version 4.0)
1717

18-
The Multi-modal embeddings APIs enable the _vectorization_ of images and text queries. They convert images to coordinates in a multi-dimensional vector space. Then, incoming text queries can also be converted to vectors, and images can be matched to the text based on semantic closeness. This allows the user to search a set of images using text, without the need to use image tags or other metadata. Semantic closeness often produces better results in search.
18+
The Multimodal embeddings APIs enable the _vectorization_ of images and text queries. They convert images to coordinates in a multi-dimensional vector space. Then, incoming text queries can also be converted to vectors, and images can be matched to the text based on semantic closeness. This allows the user to search a set of images using text, without the need to use image tags or other metadata. Semantic closeness often produces better results in search.
1919

2020
> [!IMPORTANT]
2121
> These APIs are only available in the following geographic regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
@@ -26,9 +26,9 @@ The Multi-modal embeddings APIs enable the _vectorization_ of images and text qu
2626
* Once you have your Azure subscription, <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision" title="Create a Computer Vision resource" target="_blank">create a Computer Vision resource </a> in the Azure portal to get your key and endpoint. Be sure to create it in one of the permitted geographic regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
2727
* After it deploys, select **Go to resource**. Copy the key and endpoint to a temporary location to use later on.
2828

29-
## Try out Multi-modal embeddings
29+
## Try out Multimodal embeddings
3030

31-
You can try out the Multi-modal embeddings feature quickly and easily in your browser using Vision Studio.
31+
You can try out the Multimodal embeddings feature quickly and easily in your browser using Vision Studio.
3232

3333
> [!IMPORTANT]
3434
> The Vision Studio experience is limited to 500 images. To use a larger image set, create your own search application using the APIs in this guide.
@@ -43,9 +43,10 @@ The `retrieval:vectorizeImage` API lets you convert an image's data to a vector.
4343
1. Replace `<endpoint>` with your Azure AI Vision endpoint.
4444
1. Replace `<subscription-key>` with your Azure AI Vision key.
4545
1. In the request body, set `"url"` to the URL of a remote image you want to use.
46+
1. Optionally, change the `model-version` parameter to an older version. `2022-04-11` is the legacy model that supports only English text. Images and text that are vectorized with a certain model aren't compatible with other models, so be sure to use the same model for both.
4647

4748
```bash
48-
curl.exe -v -X POST "https://<endpoint>/computervision/retrieval:vectorizeImage?api-version=2023-02-01-preview&modelVersion=latest" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription-key>" --data-ascii "
49+
curl.exe -v -X POST "https://<endpoint>/computervision/retrieval:vectorizeImage?api-version=2024-02-01-preview&model-version=2023-04-15" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription-key>" --data-ascii "
4950
{
5051
'url':'https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png'
5152
}"
@@ -69,9 +70,10 @@ The `retrieval:vectorizeText` API lets you convert a text string to a vector. To
6970
1. Replace `<endpoint>` with your Azure AI Vision endpoint.
7071
1. Replace `<subscription-key>` with your Azure AI Vision key.
7172
1. In the request body, set `"text"` to the example search term you want to use.
73+
1. Optionally, change the `model-version` parameter to an older version. `2022-04-11` is the legacy model that supports only English text. Images and text that are vectorized with a certain model aren't compatible with other models, so be sure to use the same model for both.
7274

7375
```bash
74-
curl.exe -v -X POST "https://<endpoint>/computervision/retrieval:vectorizeText?api-version=2023-02-01-preview&modelVersion=latest" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription-key>" --data-ascii "
76+
curl.exe -v -X POST "https://<endpoint>/computervision/retrieval:vectorizeText?api-version=2023-02-01-preview&model-version=2023-04-15" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription-key>" --data-ascii "
7577
{
7678
'text':'cat jumping'
7779
}"
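
Because image and text vectors are only comparable when they come from the same model, it can help to build both request URLs from one shared `model-version` value. The following Python sketch only assembles the URL and headers shown in the curl commands above and sends nothing; the helper name `build_vectorize_request`, the placeholder endpoint, and the placeholder key are assumptions for illustration, and the `api-version` is passed in rather than fixed.

```python
from urllib.parse import urlencode

# One shared value so images and text queries are vectorized by the same model.
MODEL_VERSION = "2023-04-15"


def build_vectorize_request(endpoint, operation, api_version, key,
                            model_version=MODEL_VERSION):
    """Return (url, headers) for a retrieval:vectorizeImage or vectorizeText call.

    Hypothetical helper: it mirrors the curl commands in this guide and
    performs no network access.
    """
    if operation not in ("vectorizeImage", "vectorizeText"):
        raise ValueError("operation must be vectorizeImage or vectorizeText")
    query = urlencode({"api-version": api_version, "model-version": model_version})
    url = f"https://{endpoint}/computervision/retrieval:{operation}?{query}"
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return url, headers


# Placeholder endpoint and key; substitute your own resource values.
image_url, image_headers = build_vectorize_request(
    "example.cognitiveservices.azure.com", "vectorizeImage",
    "2024-02-01-preview", "<subscription-key>")
text_url, text_headers = build_vectorize_request(
    "example.cognitiveservices.azure.com", "vectorizeText",
    "2024-02-01-preview", "<subscription-key>")
print(image_url)
print(text_url)
```

The request bodies stay the same as in the curl examples: a JSON object with `url` for images or `text` for search terms.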

articles/ai-services/computer-vision/how-to/video-retrieval.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -338,4 +338,4 @@ Connection: close
338338

339339
## Next steps
340340

341-
[Multi-modal embeddings concepts](../concept-image-retrieval.md)
341+
[Multimodal embeddings concepts](../concept-image-retrieval.md)

articles/ai-services/computer-vision/language-support.md

Lines changed: 110 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -127,7 +127,7 @@ The following table lists the OCR supported languages for print text by the most
127127
|Kazakh (Latin) | `kk-latn`|Zhuang | `za` |
128128
|Khaling | `klr`|Zulu | `zu` |
129129

130-
## Image analysis
130+
## Analyze image
131131

132132
Some features of the [Analyze - Image](https://westcentralus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-1-ga/operations/56f91f2e778daf14a499f21b) API can return results in other languages, specified with the `language` query parameter. Other actions return results in English regardless of what language is specified, and others throw an exception for unsupported languages. Actions are specified with the `visualFeatures` and `details` query parameters; see the [Overview](overview-image-analysis.md) for a list of all the actions you can do with image analysis. Languages for tagging are only available in API version 3.2 or later.
133133

@@ -185,3 +185,112 @@ Some features of the [Analyze - Image](https://westcentralus.dev.cognitive.micro
185185
|Chinese Simplified |`zh`||||||||| ||||
186186
|Chinese Simplified |`zh-Hans`| || |||||| ||||
187187
|Chinese Traditional |`zh-Hant`| || |||||| ||||
188+
189+
## Multimodal embeddings
190+
191+
The latest [Multimodal embeddings](./concept-image-retrieval.md) model supports vector search in many languages. The original model supports English only. Images that are vectorized in the English-only model are not compatible with text searches in the multilingual model.
192+
193+
| Language | Language code | `2023-04-15` model | `2022-04-11` model|
194+
|-----------------------|---------------| -- |-- |
195+
| Afrikaans | `af` || |
196+
| Amharic | `am` || |
197+
| Arabic | `ar` || |
198+
| Armenian | `hy` || |
199+
| Assamese | `as` || |
200+
| Asturian | `ast` || |
201+
| Azerbaijani | `az` || |
202+
| Belarusian | `be` || |
203+
| Bengali | `bn` || |
204+
| Bosnian | `bs` || |
205+
| Bulgarian | `bg` || |
206+
| Burmese | `my` || |
207+
| Catalan | `ca` || |
208+
| Cebuano | `ceb` || |
209+
| Chinese Simplified | `zho` || |
210+
| Chinese Traditional | `zho` || |
211+
| Croatian | `hr` || |
212+
| Czech | `cs` || |
213+
| Danish | `da` || |
214+
| Dutch | `nl` || |
215+
| English | `en` |||
216+
| Estonian | `et` || |
217+
| Filipino (Tagalog) | `tl` || |
218+
| Finnish | `fi` || |
219+
| French | `fr` || |
220+
| Fulah | `ff` || |
221+
| Galician | `gl` || |
222+
| Ganda | `lg` || |
223+
| Georgian | `ka` || |
224+
| German | `de` || |
225+
| Greek | `el` || |
226+
| Gujarati | `gu` || |
227+
| Hausa | `ha` || |
228+
| Hebrew | `he` || |
229+
| Hindi | `hi` || |
230+
| Hungarian | `hu` || |
231+
| Icelandic | `is` || |
232+
| Igbo | `ig` || |
233+
| Indonesian | `id` || |
234+
| Irish | `ga` || |
235+
| Italian | `it` || |
236+
| Japanese | `ja` || |
237+
| Javanese | `jv` || |
238+
| Kabuverdianu | `kea` || |
239+
| Kamba | `kam` || |
240+
| Kannada | `kn` || |
241+
| Kazakh | `kk` || |
242+
| Khmer | `km` || |
243+
| Korean | `ko` || |
244+
| Kyrgyz | `ky` || |
245+
| Lao | `lo` || |
246+
| Latvian | `lv` || |
247+
| Lingala | `ln` || |
248+
| Lithuanian | `lt` || |
249+
| Luo | `luo` || |
250+
| Luxembourgish | `lb` || |
251+
| Macedonian | `mk` || |
252+
| Malay | `ms` || |
253+
| Malayalam | `ml` || |
254+
| Maltese | `mt` || |
255+
| Maori | `mi` || |
256+
| Marathi | `mr` || |
257+
| Mongolian | `mn` || |
258+
| Nepali | `ne` || |
259+
| Northern Sotho | `ns` || |
260+
| Norwegian | `no` || |
261+
| Nyanja | `ny` || |
262+
| Occitan | `oc` || |
263+
| Oriya | `or` || |
264+
| Oromo | `om` || |
265+
| Pashto | `ps` || |
266+
| Persian | `fa` || |
267+
| Polish | `pl` || |
268+
| Portuguese (Brazil) | `pt` || |
269+
| Punjabi | `pa` || |
270+
| Romanian | `ro` || |
271+
| Russian | `ru` || |
272+
| Serbian | `sr` || |
273+
| Shona | `sn` || |
274+
| Sindhi | `sd` || |
275+
| Slovak | `sk` || |
276+
| Slovenian | `sl` || |
277+
| Somali | `so` || |
278+
| Sorani Kurdish | `ku` || |
279+
| Spanish (Latin American) | `es` || |
280+
| Swahili | `sw` || |
281+
| Swedish | `sv` || |
282+
| Tajik | `tg` || |
283+
| Tamil | `ta` || |
284+
| Telugu | `te` || |
285+
| Thai | `th` || |
286+
| Turkish | `tr` || |
287+
| Ukrainian | `uk` || |
288+
| Umbundu | `umb` || |
289+
| Urdu | `ur` || |
290+
| Uzbek | `uz` || |
291+
| Vietnamese | `vi` || |
292+
| Welsh | `cy` || |
293+
| Wolof | `wo` || |
294+
| Xhosa | `xh` || |
295+
| Yoruba | `yo` || |
296+
| Zulu | `zu` || |
