File: articles/search/tutorial-multimodal-index-embeddings-skill.md (13 additions, 12 deletions)
@@ -1,15 +1,15 @@
 ---
-title: 'Tutorial: Index multimodal content using multimodal embedding and document layout skill'
+title: 'Tutorial: Index multimodal content using multimodal embedding and Document Layout skill'
 titleSuffix: Azure AI Search
 description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and Azure AI Vision for embeddings.
 
 manager: arjagann
-author: rawan
+author: rawan
 ms.author: rawan
 ms.service: azure-ai-search
 ms.custom:
 ms.topic: tutorial
-ms.date: 05/05/2025
+ms.date: 05/28/2025
 
 ---
 
@@ -20,11 +20,11 @@ In this Azure AI Search tutorial, learn how to build a multimodal indexing pipel
 
 In this tutorial, you use:
 
-+ A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.
++ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
 
-+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its `locationMetadata` from various documents, such as page numbers or bounding regions.
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
 
-The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution that indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-multimodal-indexing-with-image-verbalization-and-doc-extraction.md).
 
 + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
 
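For context on that last bullet: in a skillset, the Azure AI Vision vectorization skill is declared as a JSON entry roughly like the sketch below. This is not taken from the tutorial being edited; the `@odata.type` string, input/output names, context path, and `modelVersion` are recalled approximations and should be verified against the linked skill reference page.

```json
{
  "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
  "name": "image-embedding-skill",
  "description": "Embed each normalized image; a second instance with a 'text' input embeds the text chunks",
  "context": "/document/normalized_images/*",
  "modelVersion": "2023-04-15",
  "inputs": [
    { "name": "image", "source": "/document/normalized_images/*" }
  ],
  "outputs": [
    { "name": "vector", "targetName": "image_vector" }
  ]
}
```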
@@ -47,13 +47,13 @@ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you wi
 
 + An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) for image vectorization. Image vectorization requires Azure AI Vision multimodal embeddings. For an updated list of regions, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
 
-+[Azure AI Search](search-what-is-azure-search.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial is not supported on the Free tier. Additionally, it must be in the [same region as Azure AI services multi-service](search-create-service-portal.md#regions-with-the-most-overlap).
++[Azure AI Search](search-what-is-azure-search.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial isn't supported on the Free tier. Additionally, it must be in the [same region as Azure AI services multi-service](search-create-service-portal.md#regions-with-the-most-overlap).
 
 +[Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
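Since the tutorial is driven from Visual Studio Code with the REST client extension, the working file typically defines a couple of variables and then issues requests like the ones in the hunks below. A minimal, hypothetical setup (placeholder service name and key; the api-version is the one used by the requests in this PR):

```http
@baseUrl = https://<your-search-service>.search.windows.net
@apiKey = <your-search-admin-api-key>

### Sanity check: list existing indexes
GET {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
    Content-Type: application/json
    api-key: {{apiKey}}
```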
@@ -288,9 +288,9 @@ POST {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
 
 Key points:
 
-+ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions (e.g., 1024) and a vector search profile.
++ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions, such as 1024, and a vector search profile.
 
-+`location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
++`location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
 
 + For more information on vector search, see [Vectors in Azure AI Search](vector-search-overview.md).
 
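To make the first key point concrete, a vector field tied to a vector search profile is declared in the index body roughly as follows. This is a sketch, not the tutorial's actual schema: the profile name and the `location_metadata` sub-fields are placeholders, while the 1024 dimensions match the value called out above.

```json
"fields": [
  {
    "name": "content_embedding",
    "type": "Collection(Edm.Single)",
    "searchable": true,
    "retrievable": true,
    "dimensions": 1024,
    "vectorSearchProfile": "hnsw-profile"
  },
  {
    "name": "location_metadata",
    "type": "Edm.ComplexType",
    "fields": [
      { "name": "pageNumber", "type": "Edm.Int32", "filterable": true, "retrievable": true },
      { "name": "boundingPolygons", "type": "Edm.String", "searchable": false, "retrievable": true }
    ]
  }
]
```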
@@ -607,9 +607,10 @@ You can use the Azure portal to delete indexes, indexers, and data sources.
 
 ## See also
 
-Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out
+Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out:
File: articles/search/tutorial-multimodal-index-image-verbalization-skill.md (14 additions, 16 deletions)
@@ -1,5 +1,5 @@
 ---
-title: 'Tutorial: Index multimodal content using image verbalization and document layout skill'
+title: 'Tutorial: Index multimodal content using image verbalization and Document Layout skill'
 titleSuffix: Azure AI Search
 description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and GenAI Prompt skill for image verbalizations.
 
@@ -9,27 +9,25 @@ ms.author: rawan
 ms.service: azure-ai-search
 ms.custom:
 ms.topic: tutorial
-ms.date: 05/05/2025
+ms.date: 05/28/2025
 
 ---
 
 # Tutorial: Index mixed content using image verbalizations and the Document Layout skill
 
-In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that that chunks data based on document structure, and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
+In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
 
-From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities—text and verbalized images.
+From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.
 
 In this tutorial, you use:
 
-+ A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.
++ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
 
-+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its `locationMetadata` from various documents, such as page numbers or bounding regions.
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
 
-The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution that indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-multimodal-indexing-with-image-verbalization-and-doc-extraction.md).
 
-+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions — text-based descriptions of visual content — for search and grounding.
-
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
++ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions, which are text-based descriptions of visual content, for search and grounding.
 
 + A search index configured to store text and image embeddings and support for vector-based similarity search.
 
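As a rough illustration of the embedding step described above (text chunks and image verbalizations both embedded with text-embedding-3-large), an Azure OpenAI embedding skill entry looks approximately like this. The resource URI, deployment name, and context/source paths are placeholders of mine, not values from the tutorial:

```json
{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "name": "text-embedding-skill",
  "description": "Embed each text chunk; image verbalizations flow through an equivalent instance",
  "context": "/document/text_sections/*",
  "resourceUri": "https://<your-azure-openai-resource>.openai.azure.com",
  "deploymentId": "text-embedding-3-large",
  "modelName": "text-embedding-3-large",
  "dimensions": 3072,
  "inputs": [
    { "name": "text", "source": "/document/text_sections/*/content" }
  ],
  "outputs": [
    { "name": "embedding", "targetName": "content_embedding" }
  ]
}
```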
@@ -51,13 +49,13 @@ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you wi
-+[Azure AI Search](search-what-is-azure-search.md). [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial is not supported on the Free tier. Additionally, ensure your service is deployed in a [supported region for AI Vision](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
++[Azure AI Search](search-what-is-azure-search.md). [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
 
 +[Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
@@ -293,9 +291,9 @@ POST {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
 
 Key points:
 
-+ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions (e.g., 3072) and a vector search profile.
++ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions, such as 3072, and a vector search profile.
 
-+`location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
++`location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
 
 + For more information on vector search, see [Vectors in Azure AI Search](vector-search-overview.md).
 
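Once documents are indexed, a hybrid keyword-plus-vector query against `content_embedding` might look like the request below. Treat this as a sketch: the index name is hypothetical, and the `"kind": "text"` form assumes a vectorizer is configured in the index's vector search settings (otherwise you'd pass a precomputed embedding with `"kind": "vector"`).

```http
POST {{baseUrl}}/indexes/doc-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
    Content-Type: application/json
    api-key: {{apiKey}}

{
  "search": "energy consumption by sector",
  "top": 5,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "energy consumption by sector",
      "fields": "content_embedding",
      "k": 5
    }
  ]
}
```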
@@ -660,10 +658,10 @@ You can use the Azure portal to delete indexes, indexers, and data sources.
 
 ## See also
 
-Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out
+Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out: