+ One-to-many [parsing modes](/rest/api/searchservice/indexers/create?view=rest-searchservice-2025-05-01-preview&tabs=HTTP#blobindexerparsingmode), such as `delimitedText`, `jsonArray`, `jsonLines`, and `markdown` with sub-mode `oneToMany`

+ One-to-many [parsing modes](/rest/api/searchservice/indexers/create?view=rest-searchservice-2025-05-01-preview&preserve-view=true#blobindexerparsingmode), such as `delimitedText`, `jsonArray`, `jsonLines`, and `markdown` with sub-mode `oneToMany`
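To make the one-to-many modes above concrete, here is a hedged sketch of where they sit in a blob indexer definition. Only `parsingMode`, `markdownParsingSubmode`, and the mode values come from the text; every other field value (data source and index names) is illustrative.

```python
def blob_indexer_body(name: str, parsing_mode: str) -> dict:
    """Minimal blob indexer body using a one-to-many parsing mode (sketch)."""
    allowed = {"delimitedText", "jsonArray", "jsonLines", "markdown"}
    if parsing_mode not in allowed:
        raise ValueError(f"unsupported one-to-many parsing mode: {parsing_mode}")
    return {
        "name": name,
        "dataSourceName": "my-blob-datasource",  # illustrative
        "targetIndexName": "my-index",           # illustrative
        "parameters": {
            "configuration": {
                "parsingMode": parsing_mode,
                # markdown additionally takes the oneToMany sub-mode:
                **({"markdownParsingSubmode": "oneToMany"}
                   if parsing_mode == "markdown" else {}),
            }
        },
    }
```

With `markdown`, each section becomes its own search document; with the other modes, each row, array element, or line does.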
articles/search/tutorial-multimodal-index-embeddings-skill.md (12 additions, 14 deletions)

@@ -1,7 +1,7 @@
 ---
 title: 'Tutorial: Index multimodal content using multimodal embedding and document layout skill'
 titleSuffix: Azure AI Search
-description: Learn how to extract, index, and search both text and images from Azure Blob Storage for multimodal scenarios using the Azure AI Search REST APIs.
+description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and Azure AI Vision for embeddings.
 
 manager: arjagann
 author: rawan
@@ -13,26 +13,24 @@ ms.date: 05/05/2025
 ---
 
-# Tutorial: Index multimodal content using multimodal embedding and document layout skill
+# Tutorial: Index mixed content using multimodal embeddings and the Document Layout skill
 
-Multimodal plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. In this Azure AI Search tutorial, learn how to build a multimodal retrieval pipeline that chunks data based on document structure, and uses a multimodal embedding model to vectorize text and images in a searchable index.
-
-You’ll work with a 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text. Using the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) (currently in public preview), you’ll extract both text and normalized images with its locationMetadata. Each modality is then embedded using the same [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates dense vector representations suitable for semantic and hybrid search scenarios.
-
-You'll use:
-
-+ The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images.
-
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
-
-This tutorial demonstrates a solution for indexing multi-modal content using Document Layout skill. Document Layout skill
-enables extraction both text and image with its locational metadata from various documents, such as page numbers or bounding regions. However, [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day
-
-For a lower-cost solution that indexing multi-modal content, see [Index multi-modal content using embedding and document extraction skill](https://aka.ms/azs-multimodal).
-
-This tutorial shows you how to index such data, using a REST client and the [Search REST APIs](/rest/api/searchservice/) to:
+<!-- Multimodal plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. -->
+In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses a multimodal embedding model to vectorize text and images in a searchable index.
+
+In this tutorial, you use:
+
++ A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.
+
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images, with locationMetadata such as page numbers or bounding regions, from various documents.
+
++ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
+
++ A search index configured to store text and image embeddings and support vector-based similarity search.
+
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability, is bound to Azure AI services, and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution for indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
+
+Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:
 
 > [!div class="checklist"]
 > + Set up sample data and configure an `azureblob` data source
@@ -614,4 +612,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
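The Document Layout plus Azure AI Vision pairing described in this file can be sketched as a skillset payload for the Create Skillset REST API. This is a hypothetical shape, not the tutorial's exact definition: the `@odata.type` values, context paths, and input/output names below are assumptions to check against the REST reference.

```python
import json

def build_skillset(name: str) -> dict:
    """Minimal skillset body chaining layout-based chunking into vectorization (sketch)."""
    return {
        "name": name,
        "skills": [
            {
                # Chunks documents by structure; emits text and normalized images
                # with locationMetadata (page numbers, bounding regions).
                # Type name assumed from the preview docs.
                "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
                "context": "/document",
                "inputs": [{"name": "file_data", "source": "/document/file_data"}],
                "outputs": [{"name": "text_sections", "targetName": "text_sections"}],
            },
            {
                # The same multimodal embedding skill handles both text and images;
                # a second instance over the image path would mirror this one.
                "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
                "context": "/document/text_sections/*",
                "inputs": [{"name": "text", "source": "/document/text_sections/*/content"}],
                "outputs": [{"name": "vector", "targetName": "text_vector"}],
            },
        ],
    }

payload = build_skillset("doc-layout-embeddings-skillset")
body = json.dumps(payload)  # PUT this to /skillsets('{name}') with your api-key
```

The point of the sketch is the chaining: the layout skill's `targetName` becomes the source path the embedding skill reads from.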
articles/search/tutorial-multimodal-index-image-verbalization-skill.md (16 additions, 14 deletions)

@@ -1,7 +1,7 @@
 ---
 title: 'Tutorial: Index multimodal content using image verbalization and document layout skill'
 titleSuffix: Azure AI Search
-description: Learn how to extract, describe, and index text and images from Azure Blob Storage using GenAI Prompt skill and Azure AI Search REST APIs to support multimodal scenarios.
+description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and the GenAI Prompt skill for image verbalizations.
 
 manager: arjagann
 author: rawan
@@ -13,28 +13,30 @@ ms.date: 05/05/2025
 ---
 
-# Tutorial: Index multimodal content using image verbalization and document layout skill
+# Tutorial: Index mixed content using image verbalizations and the Document Layout skill
 
-Multi-modality plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. "In this Azure AI Search tutorial, learn how to build a multimodal retrieval pipeline that that chunks data based on document structure, and =uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
-
-You’ll work with a 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text. Using the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)(currently in public preview), you’ll extract both text and normalized images with its locationMetadata. Each image is passed to the [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md) (currently in public preview) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities—text and verbalized images.
-
-You'll use:
-
-+ The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images.
-
-+ The [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md) to generate image captions — text-based descriptions of visual content — for search and grounding.
-
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
-
-+ A search index configured to store text and image embeddings and support vector-based similarity search.
-
-This tutorial demonstrates a solution for indexing multi-modal content using Document Layout skill. Document Layout skill
-enables extraction both text and image with its locational metadata from various documents, such as page numbers or bounding regions. However, [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability and is bound to Azure AI services and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day
+In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
+
+From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities—text and verbalized images.
+
+In this tutorial, you use:
+
++ A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.
+
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images, with locationMetadata such as page numbers or bounding regions, from various documents.
+
++ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions — text-based descriptions of visual content — for search and grounding.
+
++ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
+
++ A search index configured to store text and image embeddings and support vector-based similarity search.
+
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited region availability, is bound to Azure AI services, and requires [a billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution for indexing multimodal content, see [Index multimodal content using image verbalization and document extraction skill](https://aka.ms/azs-multimodal).
 
 > [!NOTE]
-> Setting `imageAction` to `generateNormalizedImages`as is required for this tutorial will incur an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
+> Setting `imageAction` to `generateNormalizedImages` is required for this tutorial and incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
 
-Using a REST client and the [Search REST APIs](/rest/api/searchservice/) you will:
+Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:
 
 > [!div class="checklist"]
 > + Set up sample data and configure an `azureblob` data source
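The image-verbalization step this file describes — each normalized image sent to a chat-completion model to produce a caption that is indexed as text — can be sketched as a skill definition. Everything here is a hypothetical shape: the `@odata.type`, input names, and prompt text are assumptions, not the tutorial's exact GenAI Prompt skill contract.

```python
def genai_prompt_skill(chat_completion_uri: str) -> dict:
    """A GenAI Prompt-style skill turning each normalized image into a caption (sketch)."""
    return {
        # Type name assumed; verify against the GenAI Prompt skill reference.
        "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
        "context": "/document/normalized_images/*",
        "uri": chat_completion_uri,  # caller-supplied chat-completion endpoint
        "inputs": [
            # A fixed instruction plus the image bytes for each normalized image.
            {"name": "systemMessage", "source": "='Describe this image for search.'"},
            {"name": "image", "source": "/document/normalized_images/*/data"},
        ],
        # The caption lands alongside the document text for embedding and indexing.
        "outputs": [{"name": "response", "targetName": "verbalizedImage"}],
    }
```

Downstream, `verbalizedImage` would be embedded with text-embedding-3-large just like any other text chunk.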
articles/search/tutorial-multimodal-indexing-with-embedding-and-doc-extraction.md (8 additions, 8 deletions)

@@ -1,7 +1,7 @@
 ---
 title: 'Tutorial: Index multimodal content using embedding and document extraction skill'
 titleSuffix: Azure AI Search
-description: Learn how to extract, index, and search both text and images from Azure Blob Storage for multimodal scenarios using the Azure AI Search REST APIs.
+description: Learn how to extract, index, and search multimodal content using the Document Extraction skill for chunking and Azure AI Vision for embeddings.
 
 manager: arjagann
 author: mdonovan
@@ -13,19 +13,19 @@ ms.date: 05/01/2025
 ---
 
-# Tutorial: Index multimodal content using embedding and document extraction skill
+# Tutorial: Index mixed content using multimodal embeddings and the Document Extraction skill
 
-Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows how to build a multimodal retrieval pipeline by embedding both text and images into a unified semantic search index.
+Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline by embedding both text and images into a unified semantic search index.
 
-You’ll work with a 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text. Using the [Document Extraction skill](cognitive-search-skill-document-extraction.md), you’ll extract both text and normalized images. Each modality is then embedded using the same [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates dense vector representations suitable for semantic and hybrid search scenarios.
+In this tutorial, you use:
 
-You'll use:
++ A 36-page PDF document that combines rich visual content—such as charts, infographics, and scanned pages—with traditional text.
 
 + The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting text and normalized images.
 
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings from both text and images. The same skill is used for both modalities, with text inputs processed into embeddings for semantic search, and images processed into vector representations using Azure AI Vision models.
++ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
 
 + A search index configured to store text and image embeddings and support vector-based similarity search.
 
 This tutorial demonstrates a lower-cost approach for indexing multimodal content using Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it does not include locational metadata for text, such as page numbers or bounding regions.
 
@@ -34,7 +34,7 @@ For a more comprehensive solution that includes structured text layout and spati
 
 > [!NOTE]
 > Setting `imageAction` to `generateNormalizedImages` as is required for this tutorial will incur an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
 
-This tutorial shows you how to index such data, using a REST client and the [Search REST APIs](/rest/api/searchservice/) to:
+Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:
 
 > [!div class="checklist"]
 > + Set up sample data and configure an `azureblob` data source
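The NOTE in this file says `imageAction` must be set to `generateNormalizedImages`. A minimal sketch of the indexer parameters carrying that setting follows; `imageAction` and its value come from the text, while the data source and index names are illustrative placeholders.

```python
def indexer_with_image_extraction(name: str) -> dict:
    """Minimal indexer body enabling normalized-image extraction (sketch)."""
    return {
        "name": name,
        "dataSourceName": "my-azureblob-datasource",  # illustrative
        "targetIndexName": "doc-extraction-index",    # illustrative
        "parameters": {
            "configuration": {
                "dataToExtract": "contentAndMetadata",
                # Required for this tutorial; image extraction is billed
                # per Azure AI Search pricing.
                "imageAction": "generateNormalizedImages",
            }
        },
    }
```

With this setting, extracted images appear under `/document/normalized_images/*` in the enrichment tree, which is where the embedding skill reads them.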