Commit 5c38d09

Update multimodal-search-overview.md
Updated with a few clarifications about support for table extraction and clarifying location metadata support.
1 parent c299362 commit 5c38d09

1 file changed

+4
-3
lines changed

articles/search/multimodal-search-overview.md

Lines changed: 4 additions & 3 deletions
@@ -4,7 +4,7 @@ titleSuffix: Azure AI Search
 description: Learn what multimodal search is, how Azure AI Search supports it for text + image content, and where to find detailed concepts, tutorials, and samples.
 ms.service: azure-ai-search
 ms.topic: conceptual
-ms.date: 05/12/2025
+ms.date: 05/19/2025
 author: gmndrg
 ms.author: gimondra
 ---
@@ -27,7 +27,7 @@ Azure AI Search simplifies the construction of a multimodal pipeline through a g
 
 The functionality behind the **Import and vectorize data** wizard's multimodality option is powered by managed, configurable AI skills and the Azure Search knowledge store:
 
-+ [Document Intelligence layout skill](cognitive-search-skill-document-intelligence-layout.md) and [document extraction skill](cognitive-search-skill-document-extraction.md) obtain page text, inline images, and structural metadata. The Document Extraction skill doesn't support polygon extraction or page number extraction. Also, the range of supported file types may vary. To ensure optimal alignment with your specific use case, check each skill documentation for detailed information on compatibility and capabilities.
++ [Document Intelligence layout skill](cognitive-search-skill-document-intelligence-layout.md) and [document extraction skill](cognitive-search-skill-document-extraction.md) obtain page text, inline images, and structural metadata. The Document Extraction skill doesn't support polygon extraction or page number extraction. Also, the range of supported file types may vary. To ensure optimal alignment with your specific use case, check each skill documentation for detailed information on compatibility and capabilities. The native document parsing mechanisms (document layout or document extraction skills) don't have support for table recognition or its structure preservation. If table extraction and its structure preservation support is required, it's recommended that a [Web API custom skill](cognitive-search-custom-skill-web-api.md) is built and call [Azure AI Content Understanding service](/azure/ai-services/content-understanding/tutorial/build-rag-solution) for content extraction (including tables).
 + [Split skill](cognitive-search-skill-textsplit.md) chunks the extracted text for utilization in the remaining pipeline functionality (such as embedding skills).
 + [Gen AI prompt skill](cognitive-search-skill-genai-prompt.md) verbalizes images, producing concise natural-language descriptions suitable for text search and embedding using a Large Language Model (LLM).
 + Text/image (or multimodal) embedding skills create embeddings for text and images, enabling similarity and hybrid retrieval. You can call [Azure OpenAI](cognitive-search-skill-azure-openai-embedding.md), [AI Foundry](cognitive-search-aml-skill.md), or [AI Vision](cognitive-search-skill-vision-vectorize.md) embedding models natively.
@@ -39,7 +39,8 @@ A multimodal pipeline begins by cracking each source document into chunks of tex
 
 | Characteristic | Document Intelligence layout skill | Document extraction skill |
 |----------------|------------------------------------|---------------------------|
-| Location metadata extraction (page, bounding polygon) | Yes | No |
+| Text location metadata extraction (page, bounding polygon) | Yes | No |
+| Image location metadata extraction (page, bounding polygon) | Yes | Yes |
 | Data-extraction billing | Billed according to [Document Intelligence layout-model pricing](https://azure.microsoft.com/pricing/details/ai-document-intelligence/). | Image extraction is billed as outlined in the [Azure AI Search pricing page](https://azure.microsoft.com/pricing/details/search/). |
 | Recommended scenarios | RAG pipelines and agent workflows that need precise page numbers, on-page highlights, or diagram overlays in client apps. | Rapid prototyping or production pipelines where the exact position or detailed layout information isn't required. |
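
The added sentence in the second hunk recommends building a Web API custom skill when table extraction is required. The following is a minimal sketch of the request/response handling such a skill might perform, not the implementation from the linked tutorial. It assumes the custom skill contract of a `values` array whose records carry `recordId`, `data`, `errors`, and `warnings`. The input field `content`, the output field `tables`, and the `extract_tables` helper are hypothetical names chosen for illustration; the helper is a local stand-in and does not call Azure AI Content Understanding.

```python
from typing import Any, Callable


def extract_tables(document_text: str) -> list[dict[str, Any]]:
    """Hypothetical stand-in for a call to a content-extraction service.

    A real skill would send the document to the service here. This stub only
    picks up pipe-delimited rows so the example stays self-contained.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in document_text.splitlines()
        if line.strip().startswith("|")
    ]
    return [{"rows": rows}] if rows else []


def run_skill(
    request: dict[str, Any],
    extractor: Callable[[str], list[dict[str, Any]]] = extract_tables,
) -> dict[str, Any]:
    """Map each input record to one output record with the same recordId."""
    results = []
    for record in request.get("values", []):
        result: dict[str, Any] = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            # "content" is an assumed input name; it must match the skill definition.
            text = record["data"]["content"]
            result["data"]["tables"] = extractor(text)
        except Exception as exc:
            # Report the failure on this record only, so other records still succeed.
            result["errors"].append({"message": f"Table extraction failed: {exc}"})
        results.append(result)
    return {"values": results}
```

In a deployed skill, `run_skill` would sit behind an HTTP endpoint that the indexer calls, and `extractor` would be swapped for a function that calls the extraction service. Keeping failures per record means one bad document doesn't fail the whole batch.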
