Commit c165085

Differentiate multimodal tutorials
1 parent fcbbcc0 commit c165085

10 files changed: 37 additions and 48 deletions

articles/search/cognitive-search-skill-document-extraction.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -19,9 +19,9 @@ The **Document Extraction** skill extracts content from a file within the enrich
1919

2020
For [vector](vector-search-overview.md) and [multimodal search](multimodal-search-overview.md), Document Extraction combined with the [Text Split skill](cognitive-search-skill-textsplit.md) is more affordable than other [data chunking approaches](vector-search-how-to-chunk-documents.md). The following tutorials demonstrate skill usage for different scenarios:
2121

22-
+ [Tutorial: Index mixed content using multimodal embeddings and the Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md)
22+
+ [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md)
2323

24-
+ [Tutorial: Index mixed content using image verbalizations and the Document Extraction skill](tutorial-document-extraction-image-verbalization.md)
24+
+ [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md)
2525

2626
> [!NOTE]
2727
> This skill isn't bound to Azure AI services and has no Azure AI services key requirement.

articles/search/cognitive-search-skill-document-intelligence-layout.md

Lines changed: 6 additions & 7 deletions
@@ -22,16 +22,15 @@ The **Document Layout** skill analyzes a document to detect structure and charac
2222

2323
This article is the reference documentation for the Document Layout skill. For usage information, see [How to chunk and vectorize by document layout](search-how-to-semantic-chunking.md).
2424

25-
It's common to use this skill on content such as PDFs that have structure and images. The following tutorials demonstrate several scenarios:
25+
This skill uses the [Document Intelligence layout model](/azure/ai-services/document-intelligence/concept-layout) provided in [Azure AI Document Intelligence](/azure/ai-services/document-intelligence/overview).
2626

27-
+ [Tutorial: Index mixed content using image verbalizations and the Document Layout skill](tutorial-document-layout-image-verbalization.md)
27+
This skill is bound to a [billable Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services Standard price](https://azure.microsoft.com/pricing/details/cognitive-services/).
2828

29-
+ [Tutorial: Index mixed content using multimodal embeddings and the Document Layout skill](tutorial-document-layout-multimodal-embeddings.md)
30-
31-
> [!NOTE]
32-
> This skill uses the [Document Intelligence layout model](/azure/ai-services/document-intelligence/concept-layout) provided in [Azure AI Document Intelligence](/azure/ai-services/document-intelligence/overview).
29+
> [!TIP]
30+
> It's common to use this skill on content such as PDFs that have structure and images. The following tutorials demonstrate image verbalization with two different data chunking techniques:
3331
>
34-
> This skill is bound to a [billable Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services Standard price](https://azure.microsoft.com/pricing/details/cognitive-services/).
32+
> - [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
33+
> - [Tutorial: Vectorize from a structured document layout](tutorial-document-layout-multimodal-embeddings.md)
3534
>
3635
3736
## Limitations

articles/search/cognitive-search-skill-genai-prompt.md

Lines changed: 4 additions & 4 deletions
@@ -17,7 +17,7 @@ ms.date: 07/28/2025
1717

1818
The **GenAI (Generative AI) Prompt** skill executes a *chat completion* request against a Large Language Model (LLM) deployed in Azure AI Foundry or Azure OpenAI in Azure AI Foundry Models. Use this capability to create new information that can be indexed and stored as searchable content.
1919

20-
Here are some examples of content generation:
20+
Here are some examples of how the GenAI prompt skill can help you create content:
2121

2222
- verbalize images
2323
- summarize large passages of text
@@ -27,10 +27,10 @@ Here are some examples of content generation:
2727
The GenAI Prompt skill is available in the [2025-05-01-preview REST API](/rest/api/searchservice/skillsets/create?view=rest-searchservice-2025-05-01-preview&preserve-view=true) only. The skill supports text, image, and multimodal content such as a PDF that contains text and images.
2828

2929
> [!TIP]
30-
> It's common to use this skill combined with a data chunking skill. The following tutorials demonstrate the image verbalization scenarios with two different data chunking techniques:
30+
> It's common to use this skill combined with a data chunking skill. The following tutorials demonstrate image verbalization with two different data chunking techniques:
3131
>
32-
> - [Tutorial: Index mixed content using image verbalizations and the Document Layout skill](tutorial-document-layout-image-verbalization.md)
33-
> - [Tutorial: Index mixed content using image verbalizations and the Document Extraction skill](tutorial-document-extraction-image-verbalization.md)
32+
> - [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md)
33+
> - [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
3434
>
3535
3636
## Supported models

articles/search/multimodal-search-overview.md

Lines changed: 4 additions & 4 deletions
@@ -116,8 +116,8 @@ To help you get started with multimodal search in Azure AI Search, here's a coll
116116
| Content | Description |
117117
|--|--|
118118
| [Quickstart: Multimodal search in the Azure portal](search-get-started-portal-image-search.md) | Create and test a multimodal index in the Azure portal using the wizard and Search Explorer. |
119-
| [Tutorial: Image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md) | Extract text and images, verbalize diagrams, and embed the resulting descriptions and text into a searchable index. |
120-
| [Tutorial: Multimodal embeddings and Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md) | Use a vision-text model to embed both text and images directly, enabling visual-similarity search over scanned PDFs. |
121-
| [Tutorial: Image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md) | Apply layout-aware chunking and diagram verbalization, capture location metadata, and store cropped images for precise citations and page highlights. |
122-
| [Tutorial: Multimodal embeddings and Document Layout skill](tutorial-document-layout-multimodal-embeddings.md) | Combine layout-aware chunking with unified embeddings for hybrid semantic and keyword search that returns exact hit locations. |
119+
| [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md) | Extract text and images, verbalize diagrams, and embed the resulting descriptions and text into a searchable index. |
120+
| [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md) | Use a vision-text model to embed both text and images directly, enabling visual-similarity search over scanned PDFs. |
121+
| [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md) | Apply layout-aware chunking and diagram verbalization, capture location metadata, and store cropped images for precise citations and page highlights. |
122+
| [Tutorial: Vectorize from a structured document layout](tutorial-document-layout-multimodal-embeddings.md) | Combine layout-aware chunking with unified embeddings for hybrid semantic and keyword search that returns exact hit locations. |
123123
| [Sample app: Multimodal RAG GitHub repository](https://aka.ms/azs-multimodal-sample-app-repo) | An end-to-end, code-ready RAG application with multimodal capabilities that surfaces both text snippets and image annotations. Ideal for jump-starting enterprise copilots. |

articles/search/search-get-started-portal-image-search.md

Lines changed: 4 additions & 4 deletions
@@ -465,7 +465,7 @@ This quickstart uses billable Azure resources. If you no longer need the resourc
465465

466466
This quickstart introduced you to the **Import and vectorize data** wizard, which creates all of the necessary objects for multimodal search. To explore each step in detail, see the following tutorials:
467467

468-
+ [Tutorial: Image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md)
469-
+ [Tutorial: Image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md)
470-
+ [Tutorial: Multimodal embeddings and Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md)
471-
+ [Tutorial: Multimodal embeddings and Document Layout skill](tutorial-document-layout-multimodal-embeddings.md)
468+
+ [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md)
469+
+ [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
470+
+ [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md)
471+
+ [Tutorial: Vectorize from a structured document layout](tutorial-document-layout-multimodal-embeddings.md)

articles/search/toc.yml

Lines changed: 4 additions & 14 deletions
@@ -138,24 +138,14 @@ items:
138138
href: tutorial-adls-gen2-indexer-acls.md
139139
- name: Multimodal indexing tutorials
140140
items:
141-
- name: Vectorize from any document
141+
- name: Vectorize images and text
142142
href: tutorial-document-extraction-multimodal-embeddings.md
143-
- name: Vectorize from a structured document
144-
href: tutorial-document-layout-multimodal-embeddings.md
145-
- name: Verbalize images from any document
143+
- name: Verbalize images using generative AI
146144
href: tutorial-document-extraction-image-verbalization.md
147-
- name: Verbalize images from a structured document
148-
href: tutorial-document-layout-image-verbalization.md
149-
- name: Multimodal indexing tutorials
150-
items:
151-
- name: Vectorize images and text
152-
href:
153-
- name: Verbalize images
154-
href:
155145
- name: Vectorize from a structured document layout
156-
href:
146+
href: tutorial-document-layout-multimodal-embeddings.md
157147
- name: Verbalize images from a structured document layout
158-
href:
148+
href: tutorial-document-layout-image-verbalization.md
159149
- name: RAG tutorials
160150
items:
161151
- name: Build a RAG solution
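
The lines removed in this hunk include a duplicate "Multimodal indexing tutorials" node whose entries had blank `href:` values. A minimal sketch of a check that could flag such entries, assuming a simple line-based scan is sufficient (it does not parse YAML, and `find_empty_hrefs` and the sample text are hypothetical, not part of the repository's tooling):

```python
import re


def find_empty_hrefs(toc_text: str) -> list[str]:
    """Return the names of TOC entries whose href has no value.

    Assumption: each entry lists `name:` before `href:`, as in the
    toc.yml lines shown in this diff.
    """
    missing = []
    last_name = None
    for line in toc_text.splitlines():
        # Remember the most recent entry name, with or without a list dash.
        name_match = re.match(r"\s*-?\s*name:\s*(.+?)\s*$", line)
        if name_match:
            last_name = name_match.group(1)
            continue
        # An href key followed only by whitespace counts as empty.
        if re.match(r"\s*href:\s*$", line) and last_name is not None:
            missing.append(last_name)
    return missing


# Hypothetical sample modeled on the entries in this hunk.
sample = """\
- name: Multimodal indexing tutorials
  items:
  - name: Vectorize images and text
    href:
  - name: Verbalize images using generative AI
    href: tutorial-document-extraction-image-verbalization.md
"""

print(find_empty_hrefs(sample))
```

A line-based scan keeps the sketch dependency-free. A real check would more likely load the file with a YAML parser and walk the nested `items` lists.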

articles/search/tutorial-document-extraction-image-verbalization.md

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: 'Tutorial: Use Image Verbalization and Document Extraction Skill for Multimodal Indexing'
2+
title: 'Tutorial: Verbalize images using generative AI'
33
titleSuffix: Azure AI Search
44
description: Learn how to extract, index, and search multimodal content using the Document Extraction skill for chunking and GenAI Prompt skill for image verbalizations.
55

@@ -14,7 +14,7 @@ ms.date: 05/29/2025
1414

1515
---
1616

17-
# Tutorial: Index mixed content using image verbalizations and the Document Extraction skill
17+
# Tutorial: Verbalize images using generative AI
1818

1919
Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline by describing visual content in natural language and embedding it alongside document text.
2020

@@ -32,7 +32,7 @@ In this tutorial, you use:
3232

3333
This tutorial demonstrates a lower-cost approach for indexing multimodal content using Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include locational metadata for text, such as page numbers or bounding regions.
3434

35-
For a more comprehensive solution that includes structured text layout and spatial metadata, see [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md).
35+
For a more comprehensive solution that includes structured text layout and spatial metadata, see [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md).
3636

3737
> [!NOTE]
3838
> Setting `imageAction` to `generateNormalizedImages` is required for this tutorial and incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
@@ -752,4 +752,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
752752
* [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md)
753753
* [Vectors in Azure AI Search](vector-search-overview.md)
754754
* [Semantic ranking in Azure AI Search](semantic-search-overview.md)
755-
* [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md)
755+
* [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)

articles/search/tutorial-document-extraction-multimodal-embeddings.md

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ In this tutorial, you use:
3030

3131
This tutorial demonstrates a lower-cost approach for indexing multimodal content using Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include locational metadata for text, such as page numbers or bounding regions.
3232

33-
For a more comprehensive solution that includes structured text layout and spatial metadata, see [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md).
33+
For a more comprehensive solution that includes structured text layout and spatial metadata, see [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md).
3434

3535
> [!NOTE]
3636
> Setting `imageAction` to `generateNormalizedImages` results in image extraction, which is an extra charge. For more information, see [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/) for image extraction pricing.
@@ -710,4 +710,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
710710
* [AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md)
711711
* [Vectors in Azure AI Search](vector-search-overview.md)
712712
* [Semantic ranking in Azure AI Search](semantic-search-overview.md)
713-
* [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md)
713+
* [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)

articles/search/tutorial-document-layout-image-verbalization.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: 'Tutorial: Use Image Verbalization and Document Layout Skill for Multimodal Indexing'
2+
title: 'Tutorial: Verbalize images from a structured document layout'
33
titleSuffix: Azure AI Search
44
description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and GenAI Prompt skill for image verbalizations.
55

@@ -14,7 +14,7 @@ ms.date: 05/29/2025
1414

1515
---
1616

17-
# Tutorial: Index mixed content using image verbalizations and the Document Layout skill
17+
# Tutorial: Verbalize images from a structured document layout
1818

1919
In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
2020

@@ -26,7 +26,7 @@ In this tutorial, you use:
2626

2727
+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
2828

29-
The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md).
29+
The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md).
3030

3131
+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions, which are text-based descriptions of visual content, for search and grounding.
3232

articles/search/tutorial-document-layout-multimodal-embeddings.md

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: 'Tutorial: Use Multimodal Embeddings and Document Layout Skill for Multimodal Indexing'
2+
title: 'Tutorial: Vectorize from a structured document layout'
33
titleSuffix: Azure AI Search
44
description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and Azure AI Vision for embeddings.
55

@@ -14,7 +14,7 @@ ms.date: 06/11/2025
1414

1515
---
1616

17-
# Tutorial: Index mixed content using multimodal embeddings and the Document Layout skill
17+
# Tutorial: Vectorize from a structured document layout
1818

1919
<!-- Multimodal plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. -->
2020
In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure, and uses a multimodal embedding model to vectorize text and images in a searchable index.
@@ -25,7 +25,7 @@ In this tutorial, you use:
2525

2626
+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
2727

28-
The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md).
28+
The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md).
2929

3030
+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
3131

@@ -614,4 +614,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
614614
+ [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)
615615
+ [Vectors in Azure AI Search](vector-search-overview.md)
616616
+ [Semantic ranking in Azure AI Search](semantic-search-overview.md)
617-
+ [Index multimodal content using embeddings and Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md)
617+
+ [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md)
