Commit 0b4e96c

Updated multimodal content
1 parent 3043676 commit 0b4e96c

6 files changed, +133 -97 lines changed, 14.9 MB

articles/search/multimodal-search-overview.md

Lines changed: 81 additions & 45 deletions
Large diffs are not rendered by default.

articles/search/tutorial-multimodal-index-embeddings-skill.md

Lines changed: 13 additions & 12 deletions
@@ -1,15 +1,15 @@
 ---
-title: 'Tutorial: Index multimodal content using multimodal embedding and document layout skill'
+title: 'Tutorial: Index multimodal content using multimodal embedding and Document Layout skill'
 titleSuffix: Azure AI Search
 description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and Azure AI Vision for embeddings.
 
 manager: arjagann
-author: rawan
+author: rawan
 ms.author: rawan
 ms.service: azure-ai-search
 ms.custom:
 ms.topic: tutorial
-ms.date: 05/05/2025
+ms.date: 05/28/2025
 
 ---
 
@@ -20,11 +20,11 @@ In this Azure AI Search tutorial, learn how to build a multimodal indexing pipel
 
 In this tutorial, you use:
 
-+ A 36-page PDF document that combines rich visual contentsuch as charts, infographics, and scanned pageswith traditional text.
++ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
 
-+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its `locationMetadata` from various documents, such as page numbers or bounding regions.
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
 
-The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution that indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-multimodal-indexing-with-image-verbalization-and-doc-extraction.md).
 
 + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
 
@@ -47,13 +47,13 @@ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you wi
 
 + An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) for image vectorization. Image vectorization requires Azure AI Vision multimodal embeddings. For an updated list of regions, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
 
-+ [Azure AI Search](search-what-is-azure-search.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial is not supported on the Free tier. Additionally, it must be in the [same region as Azure AI services multi-service](search-create-service-portal.md#regions-with-the-most-overlap).
++ [Azure AI Search](search-what-is-azure-search.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial isn't supported on the Free tier. Additionally, it must be in the [same region as Azure AI services multi-service](search-create-service-portal.md#regions-with-the-most-overlap).
 
 + [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
 
 ### Download files
 
-Download the sample PDF below:
+Download the following sample PDF:
 
 + [sustainable-ai-pdf](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/msc/documents/presentations/CSR/Accelerating-Sustainability-with-AI-2025.pdf)
 
@@ -288,9 +288,9 @@ POST {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
 
 Key points:
 
-+ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions (e.g., 1024) and a vector search profile.
++ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions, such as 1024, and a vector search profile.
 
-+ `location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
++ `location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
 
 + For more information on vector search, see [Vectors in Azure AI Search](vector-search-overview.md).
 
@@ -607,9 +607,10 @@ You can use the Azure portal to delete indexes, indexers, and data sources.
 
 ## See also
 
-Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out
+Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out:
+
 + [AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md)
 + [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)
 + [Vectors in Azure AI Search](vector-search-overview.md)
 + [Semantic ranking in Azure AI Search](semantic-search-overview.md)
-+ [Index multimodal content using embedding and document extraction skill](https://aka.ms/azs-multimodal)
++ [Index multimodal content using embeddings and Document Extraction skill](tutorial-multimodal-indexing-with-embedding-and-doc-extraction.md)

articles/search/tutorial-multimodal-index-image-verbalization-skill.md

Lines changed: 14 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
title: 'Tutorial: Index multimodal content using image verbalization and document layout skill'
2+
title: 'Tutorial: Index multimodal content using image verbalization and Document Layout skill'
33
titleSuffix: Azure AI Search
44
description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and GenAI Prompt skill for image verbalizations.
55

@@ -9,27 +9,25 @@ ms.author: rawan
 ms.service: azure-ai-search
 ms.custom:
 ms.topic: tutorial
-ms.date: 05/05/2025
+ms.date: 05/28/2025
 
 ---
 
 # Tutorial: Index mixed content using image verbalizations and the Document Layout skill
 
-In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that that chunks data based on document structure, and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
+In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
 
-From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalitiestext and verbalized images.
+From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.
 
 In this tutorial, you use:
 
-+ A 36-page PDF document that combines rich visual contentsuch as charts, infographics, and scanned pageswith traditional text.
++ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
 
-+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its `locationMetadata` from various documents, such as page numbers or bounding regions.
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
 
-The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution that indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-multimodal-indexing-with-image-verbalization-and-doc-extraction.md).
 
-+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions — text-based descriptions of visual content — for search and grounding.
-
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
++ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions, which are text-based descriptions of visual content, for search and grounding.
 
 + A search index configured to store text and image embeddings and support for vector-based similarity search.
 
@@ -51,13 +49,13 @@ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you wi
 
 + [Azure Storage](/azure/storage/common/storage-account-create).
 
-+ [Azure AI Search](search-what-is-azure-search.md). [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial is not supported on the Free tier. Additionally, ensure your service is deployed in a [supported region for AI Vision](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
++ [Azure AI Search](search-what-is-azure-search.md). [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
 
 + [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
 
 ### Download files
 
-Download the sample PDF below:
+Download the following sample PDF:
 
 + [sustainable-ai-pdf](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/msc/documents/presentations/CSR/Accelerating-Sustainability-with-AI-2025.pdf)
 
@@ -293,9 +291,9 @@ POST {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
 
 Key points:
 
-+ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions (e.g., 3072) and a vector search profile.
++ Text and image embeddings are stored in the `content_embedding` field and must be configured with appropriate dimensions, such as 3072, and a vector search profile.
 
-+ `location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
++ `location_metadata` captures bounding polygon and page number metadata for each text chunk and normalized image, enabling precise spatial search or UI overlays.
 
 + For more information on vector search, see [Vectors in Azure AI Search](vector-search-overview.md).
 
@@ -660,10 +658,10 @@ You can use the Azure portal to delete indexes, indexers, and data sources.
 
 ## See also
 
-Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out
+Now that you're familiar with a sample implementation of a multimodal indexing scenario, check out:
+
 + [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md)
 + [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)
 + [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md)
 + [Vectors in Azure AI Search](vector-search-overview.md)
 + [Semantic ranking in Azure AI Search](semantic-search-overview.md)
-+ [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and document layout skill](https://aka.ms/azs-multimodal)
