Commit 63459d5: Merge pull request #4738 from HeidiSteen/heidist-rb-rag

consistency and differentiation among the four multimodal tutorials

2 parents: 74ea1e4 + 18e117d

6 files changed (+52, −51 lines)

articles/search/search-indexer-access-control-lists-and-role-based-access.md (1 addition, 1 deletion)

@@ -51,7 +51,7 @@ This article supplements [**Index data from ADLS Gen2**](search-howto-index-azu
  + [Knowledge store](knowledge-store-concept-intro.md)
  + [Indexer enrichment cache](search-howto-incremental-index.md)
  + [Debug sessions](cognitive-search-debug-session.md)
- + One-to-many [parsing modes](/rest/api/searchservice/indexers/create?view=rest-searchservice-2025-05-01-preview&tabs=HTTP#blobindexerparsingmode), such as: `delimitedText`, `jsonArray`, `jaonLines`, and `markdown` with sub-mode `oneToMany`
+ + One-to-many [parsing modes](/rest/api/searchservice/indexers/create?view=rest-searchservice-2025-05-01-preview&preserve-view=true#blobindexerparsingmode), such as `delimitedText`, `jsonArray`, `jsonLines`, and `markdown` with sub-mode `oneToMany`

  ## About ACL hierarchical permissions
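The one-to-many parsing modes named in this change map to the indexer's `parsingMode` setting. A minimal sketch, assuming hypothetical names (`my-jsonl-indexer`, `my-blob-datasource`, and the `{{endpoint}}`/`{{apiKey}}` variables are placeholders, not values from the article):

```http
### Create an indexer that emits one search document per JSON Lines row
POST {{endpoint}}/indexers?api-version=2025-05-01-preview
Content-Type: application/json
api-key: {{apiKey}}

{
  "name": "my-jsonl-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "parameters": {
    "configuration": {
      "parsingMode": "jsonLines"
    }
  }
}
```

`delimitedText` and `jsonArray` slot into the same `configuration` object; the markdown `oneToMany` sub-mode is configured alongside the `markdown` parsing mode.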

articles/search/tutorial-multimodal-index-embeddings-skill.md (12 additions, 14 deletions)

@@ -1,7 +1,7 @@
  ---
  title: 'Tutorial: Index multimodal content using multimodal embedding and document layout skill'
  titleSuffix: Azure AI Search
- description: Learn how to extract, index, and search both text and images from Azure Blob Storage for multimodal scenarios using the Azure AI Search REST APIs.
+ description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and Azure AI Vision for embeddings.

  manager: arjagann
  author: rawan
@@ -13,26 +13,24 @@ ms.date: 05/05/2025

  ---

- # Tutorial: Index multimodal content using multimodal embedding and document layout skill
+ # Tutorial: Index mixed content using multimodal embeddings and the Document Layout skill

- Multimodal plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. In this Azure AI Search tutorial, learn how to build a multimodal retrieval pipeline that chunks data based on document structure, and uses a multimodal embedding model to vectorize text and images in a searchable index.
+ <!-- Multimodal plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. -->
+ In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses a multimodal embedding model to vectorize text and images in a searchable index.

- You’ll work with a 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text. Using the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)(currently in public preview), you’ll extract both text and normalized images with its locationMetadata. Each modality is then embedded using the same [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates dense vector representations suitable for semantic and hybrid search scenarios.
+ In this tutorial, you use:

- You'll use:
+ + A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.

- + The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images.
+ + The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images, along with location metadata such as page numbers or bounding regions.

- + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
+   The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability, is bound to Azure AI services, and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution for indexing multimodal content, see [Index multimodal content using embedding and document extraction skill](https://aka.ms/azs-multimodal).

- + A search index configured to store text and image embeddings and support vector-based similarity search.
+ + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.

- This tutorial demonstrates a solution for indexing multi-modal content using Document Layout skill. Document Layout skill
- enables extraction both text and image with its locational metadata from various documents, such as page numbers or bounding regions. However, [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day
+ + A search index configured to store text and image embeddings and to support vector-based similarity search.

- For a lower-cost solution that indexing multi-modal content, see [Index multi-modal content using embedding and document extraction skill](https://aka.ms/azs-multimodal).
-
- This tutorial shows you how to index such data, using a REST client and the [Search REST APIs](/rest/api/searchservice/) to:
+ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:

  > [!div class="checklist"]
  > + Set up sample data and configure an `azureblob` data source
@@ -614,4 +612,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
  + [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)
  + [Vectors in Azure AI Search](vector-search-overview.md)
  + [Semantic ranking in Azure AI Search](semantic-search-overview.md)
- + [Index multi-modal content using embedding and document extraction skill](https://aka.ms/azs-multimodal)
+ + [Index multimodal content using embedding and document extraction skill](https://aka.ms/azs-multimodal)
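Both halves of this tutorial's pipeline run through the Azure AI Vision multimodal embeddings skill. A rough sketch of the image branch of such a skillset, with illustrative (not tutorial-verbatim) skill and output names:

```json
{
  "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
  "name": "image-embedding-skill",
  "context": "/document/normalized_images/*",
  "modelVersion": "2023-04-15",
  "inputs": [
    { "name": "image", "source": "/document/normalized_images/*" }
  ],
  "outputs": [
    { "name": "vector", "targetName": "image_vector" }
  ]
}
```

The text branch uses the same skill type with a `text` input instead of `image`, so one embedding space serves both modalities.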

articles/search/tutorial-multimodal-index-image-verbalization-skill.md (16 additions, 14 deletions)

@@ -1,7 +1,7 @@
  ---
  title: 'Tutorial: Index multimodal content using image verbalization and document layout skill'
  titleSuffix: Azure AI Search
- description: Learn how to extract, describe, and index text and images from Azure Blob Storage using GenAI Prompt skill and Azure AI Search REST APIs to support multimodal scenarios.
+ description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and the GenAI Prompt skill for image verbalization.

  manager: arjagann
  author: rawan
@@ -13,28 +13,30 @@ ms.date: 05/05/2025

  ---

- # Tutorial: Index multimodal content using image verbalization and document layout skill
+ # Tutorial: Index mixed content using image verbalizations and the Document Layout skill

- Multi-modality plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. "In this Azure AI Search tutorial, learn how to build a multimodal retrieval pipeline that that chunks data based on document structure, and =uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
+ In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.

- You’ll work with a 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text. Using the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)(currently in public preview), you’ll extract both text and normalized images with its locationMetadata. Each image is passed to the [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md) (currently in public preview) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities—text and verbalized images.
+ From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.

- You'll use:
+ In this tutorial, you use:

- + The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images.
- + The [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md) to generate image captions — text-based descriptions of visual content — for search and grounding.
- + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
- + A search index configured to store text and image embeddings and support vector-based similarity search.
+ + A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.
+
+ + The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images, along with location metadata such as page numbers or bounding regions.
+
+   The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability, is bound to Azure AI services, and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution for indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).

- This tutorial demonstrates a solution for indexing multi-modal content using Document Layout skill. Document Layout skill
- enables extraction both text and image with its locational metadata from various documents, such as page numbers or bounding regions. However, [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day
+ + The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions (text-based descriptions of visual content) for search and grounding.
+
+ + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.

- For a lower-cost solution that indexing multi-modal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
+ + A search index configured to store text and image embeddings and to support vector-based similarity search.

  > [!NOTE]
- > Setting `imageAction` to `generateNormalizedImages` as is required for this tutorial will incur an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
+ > Setting `imageAction` to `generateNormalizedImages` is required for this tutorial and incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).

- Using a REST client and the [Search REST APIs](/rest/api/searchservice/) you will:
+ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:

  > [!div class="checklist"]
  > + Set up sample data and configure an `azureblob` data source
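The `imageAction` setting called out in the note above belongs in the indexer's `parameters.configuration`. A minimal sketch, with placeholder indexer, data source, index, and skillset names:

```json
{
  "name": "my-multimodal-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "skillsetName": "my-skillset",
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "imageAction": "generateNormalizedImages"
    }
  }
}
```

With `generateNormalizedImages` set, extracted images appear in the enrichment tree under `/document/normalized_images/*`, which is where downstream skills pick them up.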

articles/search/tutorial-multimodal-indexing-with-embedding-and-doc-extraction.md (8 additions, 8 deletions)

@@ -1,7 +1,7 @@
  ---
  title: 'Tutorial: Index multimodal content using embedding and document extraction skill'
  titleSuffix: Azure AI Search
- description: Learn how to extract, index, and search both text and images from Azure Blob Storage for multimodal scenarios using the Azure AI Search REST APIs.
+ description: Learn how to extract, index, and search multimodal content using the Document Extraction skill for chunking and Azure AI Vision for embeddings.

  manager: arjagann
  author: mdonovan
@@ -13,19 +13,19 @@ ms.date: 05/01/2025

  ---

- # Tutorial: Index multimodal content using embedding and document extraction skill
+ # Tutorial: Index mixed content using multimodal embeddings and the Document Extraction skill

- Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows how to build a multimodal retrieval pipeline by embedding both text and images into a unified semantic search index.
+ Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline by embedding both text and images into a unified semantic search index.

- You’ll work with a 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text. Using the [Document Extraction skill](cognitive-search-skill-document-extraction.md), you’ll extract both text and normalized images. Each modality is then embedded using the same [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates dense vector representations suitable for semantic and hybrid search scenarios.
+ In this tutorial, you use:

- You'll use:
+ + A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.

  + The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting text and normalized images.

- + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
+ + Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.

- + A search index configured to store text and image embeddings and support vector-based similarity search.
+ + A search index configured to store text and image embeddings and to support vector-based similarity search.

  This tutorial demonstrates a lower-cost approach for indexing multimodal content using the Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include location metadata for text, such as page numbers or bounding regions.

@@ -34,7 +34,7 @@ For a more comprehensive solution that includes structured text layout and spati
  > [!NOTE]
  > Setting `imageAction` to `generateNormalizedImages`, which is required for this tutorial, incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).

- This tutorial shows you how to index such data, using a REST client and the [Search REST APIs](/rest/api/searchservice/) to:
+ Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:

  > [!div class="checklist"]
  > + Set up sample data and configure an `azureblob` data source
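As a point of comparison with the Document Layout approach in the tutorials above, the Document Extraction skill this tutorial relies on can be sketched roughly as follows; the skill name and output target names are illustrative assumptions, not the tutorial's exact definitions:

```json
{
  "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
  "name": "document-extraction-skill",
  "parsingMode": "default",
  "dataToExtract": "contentAndMetadata",
  "configuration": {
    "imageAction": "generateNormalizedImages"
  },
  "inputs": [
    { "name": "file_data", "source": "/document/file_data" }
  ],
  "outputs": [
    { "name": "content", "targetName": "extracted_content" },
    { "name": "normalized_images", "targetName": "extracted_images" }
  ]
}
```

Unlike the Document Layout skill, this extraction path doesn't surface location metadata such as page numbers or bounding regions, which is the trade-off for its lower cost.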
