Commit 4870a10

Merge pull request #4731 from gmndrg/release-build-azure-search
Adding new document-level security and multimodal hub documentation
2 parents 5a4f7d8 + acf135c commit 4870a10

3 files changed

+151
-0
lines changed

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
---
2+
title: Multimodal search concepts and guidance in Azure AI Search
3+
titleSuffix: Azure AI Search
4+
description: Learn what multimodal search is, how Azure AI Search supports it for text + image content, and where to find detailed concepts, tutorials, and samples.
5+
ms.service: azure-ai-search
6+
ms.topic: conceptual
7+
ms.date: 05/11/2025
8+
author: gmndrg
9+
ms.author: gimondra
10+
---
11+
12+
# Multimodal search in Azure AI Search
13+
14+
Multimodal search refers to the ability to ingest, understand, and retrieve content across multiple data types, including text, images, and other modalities such as video and audio. In Azure AI Search, multimodal search natively supports the ingestion of documents containing text and images, and the retrieval of their content, enabling users to perform searches that combine these modalities. In practice, an application using multimodal search can answer a question such as, "What is the process to have an HR form approved?" even when the only authoritative description of the workflow lives inside a diagram embedded in a PDF file.
15+
16+
Diagrams, scanned forms, screenshots, and infographics often contain the decisive details that make or break an answer. Multimodal search helps close the gap by integrating visual content into the same retrieval pipeline as text. This approach reduces the likelihood that your AI agent or RAG application might overlook important images and enables your users to trace every provided answer back to its original source.
17+
18+
Building a robust multimodal pipeline typically involves several key steps. These steps include extracting inline images and page text, describing images in natural language, embedding both text and images into a shared vector space, and storing the images for later use as annotations. Multimodal search also requires preserving the order of information as it appears in the document and executing [hybrid queries](hybrid-search-overview.md) that combine [full text search](search-lucene-query-architecture.md) with [vector search](vector-search-overview.md) and [semantic ranking](semantic-search-overview.md).
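As a rough illustration of the hybrid query described above, the following Python sketch builds a search request body that combines a keyword query, a vector query, and semantic ranking. The field name `content_embedding` and the semantic configuration name `my-semantic-config` are hypothetical, and the `"kind": "text"` vector query assumes the index has a vectorizer configured. Treat it as a sketch under those assumptions rather than a definitive request.

```python
import json


def build_hybrid_query(user_text, vector_field, semantic_config, k=5):
    """Build a request body combining keyword, vector, and semantic ranking."""
    return {
        # Full text (keyword) portion of the hybrid query.
        "search": user_text,
        # Vector portion; "kind": "text" assumes an index-side vectorizer.
        "vectorQueries": [
            {"kind": "text", "text": user_text, "fields": vector_field, "k": k}
        ],
        # Semantic ranking reranks the merged results.
        "queryType": "semantic",
        "semanticConfiguration": semantic_config,
        "top": k,
    }


# Hypothetical field and configuration names, used for illustration only.
body = build_hybrid_query(
    "What is the process to have an HR form approved?",
    vector_field="content_embedding",
    semantic_config="my-semantic-config",
)
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request against the multimodal index.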
19+
20+
Azure AI Search simplifies the construction of a multimodal pipeline through a guided experience in the Azure portal and a companion code sample:
21+
22+
1. [Azure portal multimodal functionality](search-get-started-portal-image-search.md): The step-by-step multimodal functionality in the "Import and vectorize data" wizard helps you configure your data source and your extraction and enrichment settings, and then generates a multimodal index containing text, embedded image references, and vector embeddings.
23+
1. [Reference GitHub multimodal RAG application sample](https://aka.ms/azs-multimodal-sample-app-repo): A companion GitHub repository with sample code. The sample demonstrates how a [Retrieval Augmented Generation (RAG)](retrieval-augmented-generation-overview.md) application consumes a multimodal index and renders both textual citations and associated image snippets in the response. The repository also showcases the full process of data ingestion and indexing through code, providing developers with a programmatic alternative to the Azure portal wizard.
24+
25+
## Functionality enabling multimodality
26+
27+
The functionality behind the "Import and vectorize data" wizard's multimodality option is powered by managed, configurable AI skills and the Azure AI Search knowledge store:
28+
29+
+ [Document Intelligence layout skill](cognitive-search-skill-document-intelligence-layout.md) and [document extraction skill](cognitive-search-skill-document-extraction.md) obtain page text, inline images, and structural metadata. The document extraction skill doesn't support polygon extraction or page number extraction. Also, the range of supported file types may vary. To confirm that a skill fits your use case, check each skill's documentation for detailed information on compatibility and capabilities.
30+
+ [Split skill](cognitive-search-skill-textsplit.md) chunks the extracted text for use by the rest of the pipeline (such as embedding skills).
31+
+ [Gen AI prompt skill](cognitive-search-skill-genai-prompt.md) uses a large language model (LLM) to verbalize images, producing concise natural-language descriptions suitable for text search and embedding.
32+
+ Text/image (or multimodal) embedding skills create embeddings for text and images, enabling similarity and hybrid retrieval. You can call [Azure OpenAI](cognitive-search-skill-azure-openai-embedding.md), [AI Foundry](cognitive-search-aml-skill.md), or [AI Vision](cognitive-search-skill-vision-vectorize.md) embedding models natively.
33+
+ [Knowledge store](knowledge-store-concept-intro.md) stores extracted images that can be returned directly to client applications. When you use the "Import and vectorize data" wizard with the multimodality option, an image's location is stored directly within the index, enabling convenient retrieval at query time.
34+
35+
36+
## Selecting an ingestion skill
37+
38+
A multimodal pipeline begins by cracking each source document into chunks of text, inline images, and associated metadata. Azure AI Search provides two built-in skills for this step. Both enable textual and image extraction, but they differ in the layout detail and metadata they return, and in how their billing works.
39+
40+
| Characteristic | Document Intelligence layout skill | Document extraction skill |
41+
|----------------|------------------------------------|---------------------------|
42+
| Location metadata extraction (page, bounding polygon) | Yes | No |
43+
| Data-extraction billing | Billed according to [Document Intelligence layout-model pricing](https://azure.microsoft.com/pricing/details/ai-document-intelligence/). | Image extraction is billed as outlined in the [Azure AI Search pricing page](https://azure.microsoft.com/pricing/details/search/). |
44+
| Recommended scenarios | RAG pipelines and agent workflows that need precise page numbers, on-page highlights, or diagram overlays in client apps. | Rapid prototyping or production pipelines where the exact position or detailed layout information isn't required. |
45+
46+
You can also call [Content Understanding](/azure/ai-services/content-understanding/concepts/retrieval-augmented-generation) directly for multimodal content extraction by using a [custom skill](cognitive-search-custom-skill-web-api.md), because it isn't supported natively yet in Azure AI Search.
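The decision captured in the comparison table can be expressed as a small helper. This is a hypothetical sketch: the function name and its single boolean input are assumptions made for illustration, and real pipelines would also weigh billing and supported file types.

```python
def choose_ingestion_skill(needs_location_metadata: bool) -> str:
    """Pick an ingestion skill based on whether location metadata is needed.

    Location metadata means page numbers and bounding polygons, which only
    the Document Intelligence layout skill returns.
    """
    if needs_location_metadata:
        return "Document Intelligence layout skill"
    return "Document extraction skill"


# A RAG app that highlights the cited region on the page needs location data.
print(choose_ingestion_skill(needs_location_metadata=True))
# A quick prototype that only needs text and images does not.
print(choose_ingestion_skill(needs_location_metadata=False))
```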
47+
48+
## Choosing an embedding strategy: image verbalization or direct embeddings
49+
Retrieving knowledge from images can follow two complementary paths in Azure AI Search. Understanding the distinctions helps you align cost, latency, and answer quality with the needs of your application.
50+
51+
### Image verbalization followed by text embeddings
52+
With this method, the Gen AI prompt skill invokes an LLM during ingestion to create a concise natural-language description of each extracted image—for example "Five-step HR access workflow that begins with manager approval." The description is stored as text and embedded alongside the surrounding document text. Because the image is now expressed in language, Azure AI Search can:
53+
54+
- Interpret the relationships and entities shown in a diagram.
55+
- Supply ready-made captions that an LLM can cite verbatim in a response.
56+
- Return relevant snippets, grounded in your data, for RAG applications and AI agent scenarios.
57+
58+
The added semantic depth entails an LLM call for every image and a marginal increase in indexing time.
59+
60+
### Direct vision–text embeddings
61+
A second option is to pass the images and text extracted from the document to a multimodal embedding model that produces vector representations in the same vector space. Configuration is straightforward and no LLM is required at indexing time. Direct embeddings are well suited to visual similarity and “find-me-something-that-looks-like-this” scenarios.
62+
63+
Because the representation is purely mathematical, it doesn't convey why two images are related, and it offers the LLM no ready context for citations or detailed explanations.
64+
65+
### Combining both approaches
66+
Many solutions need both encoding paths. Diagrams, flow charts, and other explanation-rich visuals are verbalized so that semantic information is available for RAG and AI agent grounding. Screenshots, product photos, or artwork are embedded directly for efficient similarity search. You can customize your Azure AI Search index and indexer skillset pipeline so it can store the two sets of vectors and retrieve them side by side.
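One way to picture the combined approach is a routing step that decides, per image, which encoding path to use. The sketch below is hypothetical: the image categories come from the paragraph above, but the function, the category labels, and the choice to send unrecognized images down both paths are assumptions for illustration, not part of Azure AI Search.

```python
# Explanation-rich visuals benefit from an LLM-written description.
VERBALIZE_KINDS = {"diagram", "flow chart"}
# Visuals mainly used for similarity search are embedded directly.
DIRECT_EMBED_KINDS = {"screenshot", "product photo", "artwork"}


def choose_encoding_paths(image_kind: str) -> list[str]:
    """Return the encoding paths to apply to an image of the given kind."""
    kind = image_kind.strip().lower()
    if kind in VERBALIZE_KINDS:
        return ["verbalize"]
    if kind in DIRECT_EMBED_KINDS:
        return ["direct_embedding"]
    # Assumption: when the kind is unknown, keep both sets of vectors.
    return ["verbalize", "direct_embedding"]


for kind in ["Diagram", "product photo", "scanned form"]:
    print(kind, "->", choose_encoding_paths(kind))
```

In a real skillset, the image kind would have to come from your own classification step; the skills described in this article don't produce such a label on their own.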
67+
68+
69+
## Tutorials and samples
70+
71+
To help you get started with multimodal search in Azure AI Search, here's a collection of tutorials and samples that demonstrate how to create and optimize multimodal indexes.
72+
73+
| Tutorial / sample | Description |
74+
| ---------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
75+
| [Quickstart: Multimodal search in the Azure portal](search-get-started-portal-image-search.md) | Create and test a multimodal index in the Azure portal using the wizard and Search Explorer. |
76+
| [Tutorial: Image verbalization + document extraction](tutorial-multimodal-indexing-with-image-verbalization-and-doc-extraction.md) | Extract text and images, verbalize diagrams, and embed the resulting descriptions and text into a searchable index. |
77+
| [Tutorial: Multimodal embeddings + document extraction](tutorial-multimodal-indexing-with-embedding-and-doc-extraction.md) | Use a vision-text model to embed both text and images directly, enabling visual-similarity search over scanned PDFs. |
78+
| [Tutorial: Image verbalization + layout skill](tutorial-multimodal-index-image-verbalization-skill.md) | Apply layout-aware chunking and diagram verbalization, capture location metadata, and store cropped images for precise citations and page highlights. |
79+
| [Tutorial: Multimodal embeddings + layout skill](tutorial-multimodal-index-embeddings-skill.md) | Combine layout-aware chunking with unified embeddings for hybrid semantic + keyword search that returns exact hit locations. |
80+
| [Sample app: Multimodal RAG GitHub repository](https://aka.ms/azs-multimodal-sample-app-repo)                                      | End-to-end RAG application code with multimodal capabilities that surfaces both text snippets and image annotations. Ideal for jump-starting enterprise copilots. |
81+
82+
83+
84+
85+
Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
---
2+
title: Document-level access control
3+
titleSuffix: Azure AI Search
4+
description: Conceptual overview of document-level permissions in Azure AI Search.
5+
ms.service: azure-ai-search
6+
ms.topic: conceptual
7+
ms.date: 05/10/2025
8+
author: gmndrg
9+
ms.author: gimondra
10+
---
11+
12+
# Document-level access control in Azure AI Search
13+
14+
Azure AI Search supports document-level access control, enabling organizations to enforce fine-grained permissions from data ingestion through query execution. This capability is essential for building secure AI agent systems that ground their responses in your data, Retrieval-Augmented Generation (RAG) applications, and enterprise search solutions while maintaining compliance and user trust.
15+
16+
Document-level access helps restrict content visibility to authorized users, based on predefined access rules. Azure AI Search supports this functionality through multiple approaches, providing flexibility for integration.
17+
18+
## Overview of document-level access control features
19+
20+
Azure AI Search provides document-level access control in the following ways:
21+
22+
### Native support for integration with Microsoft Entra-based POSIX-style Access Control List (ACL) systems (preview)
23+
24+
#### Retrieving permissions metadata during the data ingestion process
25+
Azure AI Search enables you to push document permissions directly into the search index alongside the content, enabling consistent application of access rules at query time. This capability is achieved in two ways:
26+
27+
- Use the [REST API](/rest/api/searchservice/operation-groups) or supported SDKs to [push documents and their associated permission metadata](search-index-access-control-lists-and-rbac-push-api.md) into the search index. This approach is ideal for systems with [Microsoft Entra](/Entra/fundamentals/what-is-Entra)-based [Access Control Lists (ACLs)](/azure/storage/blobs/data-lake-storage-access-control) and [Role-based access control (RBAC) roles](/azure/role-based-access-control/overview), such as [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/data-lake-storage-introduction). By embedding ACLs and RBAC container metadata within the index, developers can reduce the need for custom security trimming logic during query execution.
28+
29+
- For [built-in ADLS Gen2 indexers](search-indexer-access-control-lists-and-role-based-access.md), you can use the preview REST API with the permission filter options to flow existing ACLs and RBAC permissions to your search index. This indexer pulls ACLs and RBAC roles at the container level during the data ingestion process, enabling a low/no-code workflow for managing document-level permissions.
30+
31+
#### Enforcing document-level permissions at query time
32+
With native [token-based querying](https://aka.ms/azs-query-preserving-permissions), Azure AI Search validates a user's [Microsoft Entra token](/Entra/identity/devices/concept-tokens-microsoft-Entra-id) to enforce ACLs and RBAC roles automatically. This functionality helps trim result sets to include only documents the user is authorized to access. You can achieve automatic trimming by attaching the user's Microsoft Entra token to your query request.
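Attaching the user's token to a query might look like the following Python sketch, which only assembles the request headers and sends nothing. The header name `x-ms-query-source-authorization` and the raw token value format are assumptions based on the preview feature; verify both against the linked token-based querying article before relying on them. The token values are placeholders.

```python
def build_query_headers(service_token: str, user_token: str) -> dict:
    """Assemble headers for a query that enforces document-level permissions.

    service_token authorizes the caller against the search service.
    user_token is the end user's Microsoft Entra token, which the service
    uses to trim results to documents that user is allowed to see.
    """
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {service_token}",
        # Assumed preview header name; check the current API reference.
        "x-ms-query-source-authorization": user_token,
    }


# Placeholder tokens for illustration only.
headers = build_query_headers("<service-token>", "<user-entra-token>")
print(sorted(headers))
```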
33+
34+
35+
### Security trimming via filters
36+
37+
For scenarios where native ACL and RBAC integration isn't supported, Azure AI Search enables [security trimming using query filters](search-security-trimming-for-azure-search.md). By creating a field in the index to represent user or group identities, you can use the filters to include or exclude documents from query results based on those identities. This approach is useful for systems with custom access models or non-Microsoft Entra-based security frameworks.
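A filter for this pattern can be built from the identities of the calling user. The sketch below assumes a hypothetical filterable `Collection(Edm.String)` field named `group_ids` that holds the groups allowed to see each document, and it produces an OData expression using `search.in`. It rejects identifiers containing quotes, commas, or spaces, because those characters would change the meaning of the filter.

```python
def build_group_filter(field: str, group_ids: list[str]) -> str:
    """Build an OData filter matching documents visible to any given group."""
    if not group_ids:
        raise ValueError("At least one group ID is required.")
    for group_id in group_ids:
        # search.in treats commas and spaces as delimiters by default, and a
        # single quote would terminate the string literal.
        if any(ch in group_id for ch in "', "):
            raise ValueError(f"Unsupported character in group ID: {group_id!r}")
    joined = ", ".join(group_ids)
    return f"{field}/any(g: search.in(g, '{joined}'))"


# Hypothetical field name and group identifiers.
security_filter = build_group_filter("group_ids", ["group_id1", "group_id2"])
print(security_filter)
```

The resulting string would be passed as the `filter` parameter of a search request, so only documents whose `group_ids` field contains one of the user's groups are returned.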
38+
39+
## Benefits of document-level access control
40+
41+
Document-level access control is critical for safeguarding sensitive information in AI-driven applications. It helps organizations build systems that align with their access policies, reducing the risk of exposing unauthorized or confidential data. By integrating access rules directly into the search pipeline, AI systems can provide responses grounded in secure and authorized information.
42+
43+
By offloading permission enforcement to Azure AI Search, developers can focus on building high-quality retrieval and ranking systems. This approach helps reduce the need to handle nested groups, write custom filters, or manually trim search results.
44+
45+
Document-level permissions in Azure AI Search provide a structured framework for enforcing access controls that align with organizational policies. By using Microsoft Entra-based ACLs and RBAC roles, organizations can create systems that support robust compliance and promote trust among users. These built-in capabilities reduce the need for custom coding, offering a standardized approach to document-level security.
46+
47+
## Reference documents
48+
49+
To help you dive deeper into document-level access control in Azure AI Search, here’s a table of key resources:
50+
51+
| Functionality | Reference |
52+
|---|---|
53+
| **Index permissions using REST API** | [Index permissions using REST API](search-index-access-control-lists-and-rbac-push-api.md) |
54+
| **Index ADLS Gen2 permissions metadata using built-in indexers** | [Index permissions using ADLS Gen2 indexer](search-indexer-access-control-lists-and-role-based-access.md) |
55+
| **Query using Microsoft Entra token-based permissions** | [Query using Microsoft Entra token-based permissions](https://aka.ms/azs-query-preserving-permissions) |
56+
| **Security trimming via filters** | [Security trimming via filters](search-security-trimming-for-azure-search.md) |
57+
58+
59+
60+
## Next steps
61+
62+
- [Tutorial: Index ADLS Gen2 permissions metadata](tutorial-adls-gen2-indexer-acls.md)

articles/search/toc.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,6 +164,8 @@ items:
164164
href: search-indexer-overview.md
165165
- name: Applied AI
166166
items:
167+
- name: Multimodal search
168+
href: multimodal-search-overview.md
167169
- name: Built-in vectorization
168170
href: vector-search-integrated-vectorization.md
169171
- name: AI enrichment during indexing
@@ -207,6 +209,8 @@ items:
207209
href: ./security-controls-policy.md
208210
- name: Security baseline
209211
href: /security/benchmark/azure/baselines/cognitive-search-security-baseline?toc=/azure/search/TOC.json
212+
- name: Document-level security
213+
href: search-document-level-access-overview.md
210214
- name: How-to guides
211215
items:
212216
- name: Service management
