Commit 4cd3d9f

Merge pull request #6512 from HeidiSteen/heidist-freshness
AR revs
2 parents 48c8cad + dea07c6 commit 4cd3d9f

1 file changed

+84
-33
lines changed


articles/search/search-agentic-retrieval-concept.md

Lines changed: 84 additions & 33 deletions
@@ -5,7 +5,7 @@ description: Learn about agentic retrieval concepts, architecture, and use cases
55
author: HeidiSteen
66
ms.author: heidist
77
manager: nitinme
8-
ms.date: 06/08/2025
8+
ms.date: 08/11/2025
99
ms.service: azure-ai-search
1010
ms.update-cycle: 90-days
1111
ms.topic: concept-article
@@ -18,15 +18,23 @@ ms.custom:
1818

1919
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
2020

21-
In Azure AI Search, *agentic retrieval* is a new query pipeline designed for complex questions posed by users or agents in chat and copilot apps. It uses a large language model (LLM) to break down a question into smaller subqueries, often using chat history for context. These subqueries run in parallel, each searching for the most relevant content in your index. The results are ranked for semantic relevance, combined, and sent back to your LLM to help generate accurate answers using your proprietary content.
21+
In Azure AI Search, *agentic retrieval* is a new multi-query pipeline designed for complex questions posed by users or agents in chat and copilot apps. It works by:
2222

23-
Programmatically, agentic retrieval is supported through a new Knowledge Agents object in the 2025-05-01-preview data plane REST API and in Azure SDK prerelease packages that provide the feature. A knowledge agent's retrieval response is designed for downstream consumption by other agents and chat apps.
23+
+ Using a large language model (LLM) to break down complex queries into smaller, focused subqueries. You can include chat history for additional context.
24+
25+
+ Running multiple subqueries simultaneously to search your index. Each subquery is semantically reranked to find the most relevant matches.
26+
27+
+ Combining the best results into a unified response that your LLM can use to generate answers with your proprietary content.
28+
29+
This high-performance pipeline helps you return comprehensive answers to complex questions quickly.
30+
31+
Programmatically, agentic retrieval is supported through a new [Knowledge Agents object](/rest/api/searchservice/knowledge-agents?view=rest-searchservice-2025-05-01-preview&preserve-view=true) in the 2025-05-01-preview data plane REST API and in Azure SDK preview packages that provide the feature. A knowledge agent's retrieval response is designed for downstream consumption by other agents and chat apps.
2432

2533
## Why use agentic retrieval
2634

2735
You should use agentic retrieval when you want to provide agents and apps with the most relevant content for answering harder questions, leveraging chat context and your proprietary content.
2836

29-
The *agentic* aspect is a reasoning step in query planning processing that's performed by a supported large language model (LLM) that you provide. The LLM analyzes the entire chat thread to identify the underlying information need. Instead of a single, catch-all query, the model breaks down compound questions into focused subqueries based on: user questions, chat history, and parameters on the request. The subqueries target your indexed documents (plain text and vectors) in Azure AI Search.This hybrid approach ensures you surface both keyword matches and semantic similarities at once, dramatically improving recall.
37+
The *agentic* aspect is a reasoning step in query planning that's performed by a supported large language model (LLM) that you provide. The LLM analyzes the entire chat thread to identify the underlying information need. Instead of a single, catch-all query, the LLM breaks down compound questions into focused subqueries based on user questions, chat history, and parameters on the request. The subqueries target your indexed documents (plain text and vectors) in Azure AI Search. This hybrid approach surfaces both keyword matches and semantic similarities at once, which improves recall.
3038

3139
The *retrieval* component is the ability to run subqueries simultaneously, merge results, semantically rank results, and return a three-part response that includes grounding data for the next conversation turn, reference data so that you can inspect the source content, and an activity plan that shows query execution steps.
3240

@@ -48,12 +56,24 @@ Agentic retrieval invokes the entire query processing pipeline multiple times fo
4856
> [!NOTE]
4957
> Including an LLM in query planning adds latency to a query pipeline. You can mitigate the effects by using faster models, such as gpt-4o-mini, and summarizing the message threads. Nonetheless, you should expect longer query times with this pipeline.
5058
51-
## Agentic retrieval architecture
59+
<!-- ## Architecture and components
5260
5361
Agentic retrieval is designed for a conversational search experience that includes an LLM. An important part of agentic retrieval is how the LLM breaks down an initial query into subqueries, which are more effective at locating the best matches in your index.
5462
5563
:::image type="content" source="media/agentic-retrieval/agentic-retrieval-architecture.png" alt-text="Diagram of agentic retrieval workflow using an example query." lightbox="media/agentic-retrieval/agentic-retrieval-architecture.png" :::
5664
65+
The workflow includes:
66+
67+
*Query planning* where the search engine calls an LLM (a chat completion model) that you provide. The output is one or more subqueries. This step is mostly internal. You can review the subqueries that are generated, but query planning isn't intended to be customizable or configurable.
68+
69+
*Query execution* is a parallel process, with L1 ranking for vector and keyword search, and L2 semantic reranking of the L1 results. In agentic retrieval, semantic ranker is a required component.
70+
71+
*Merged results* refers to the output, which is a unified string of all results that you can pass directly to an LLM.
72+
73+
Notice that the architecture requires an LLM for query planning. Only supported LLMs can be used for this step. At the end of the pipeline, you can pass the merged results to any model, tool, or agent.
74+
75+
### Components
76+
5777
Agentic retrieval has these components:
5878
5979
| Component | Resource | Usage |
@@ -64,9 +84,8 @@ Agentic retrieval has these components:
6484
| Retrieval engine | Azure AI Search | Executes on the LLM-generated query plan and other parameters, returning a rich response that includes content and query plan metadata. Queries are keyword, vector, and hybrid. Results are merged and ranked. |
6585
| Semantic ranker | Azure AI Search | Provides L2 reranking, promoting the most relevant matches. Semantic ranker is required for agentic retrieval. |
6686
67-
Your solution should include a tool or app that drives the pipeline. An agentic retrieval pipeline concludes with the response object that provides grounding data. Your solution should take it from there, handling the response by passing it to an LLM to generate an answer, which you render inline in the user conversation. For more information about this step, see [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md).
87+
Your solution should include a tool or app that drives the pipeline. An agentic retrieval pipeline concludes with the response object that provides grounding data. Your solution should take it from there, handling the response by passing it to an LLM to generate an answer, which you render inline in the user conversation. For more information about this step, see [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md). -->
6888

69-
<!-- Insert multiquery pipeline diagram here -->
7089
Agentic retrieval has these processes:
7190

7291
+ Requests for agentic retrieval are initiated by calls to a knowledge agent on Azure AI Search.
@@ -78,6 +97,64 @@ Agentic retrieval has these processes:
7897

7998
Your search index determines query execution and any optimizations that occur during query execution. This includes your semantic configuration, as well as optional scoring profiles, synonym maps, analyzers, and normalizers (if you add filters).
8099

100+
## Architecture and workflow
101+
102+
Agentic retrieval is designed for conversational search experiences that use an LLM to intelligently break down complex queries. The system coordinates multiple Azure services to deliver comprehensive search results.
103+
104+
:::image type="content" source="media/agentic-retrieval/agentic-retrieval-architecture.png" alt-text="Diagram of agentic retrieval workflow using an example query." lightbox="media/agentic-retrieval/agentic-retrieval-architecture.png" :::
105+
106+
### How it works
107+
108+
The agentic retrieval process follows three main phases:
109+
110+
1. **Query planning**: A knowledge agent sends your query and conversation history to an LLM (gpt-4o or gpt-4.1 series), which analyzes the context and breaks down complex questions into focused subqueries. This step is automated and not customizable. The number of subqueries depends on what the LLM decides and whether the `maxDocsForReranker` parameter is higher than 50. A new subquery is defined for each 50-document batch sent to semantic ranker.
111+
112+
2. **Query execution**: All subqueries run simultaneously against your search index, using keyword, vector, and hybrid search. Each subquery undergoes semantic reranking to find the most relevant matches. References are extracted and retained for citation purposes.
113+
114+
3. **Result synthesis**: The system merges and ranks all results, then returns a unified response containing grounding data, source references, and execution metadata.
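
The three phases above can be pictured with a small, self-contained sketch. It is not the Azure AI Search implementation: `plan_subqueries`, `run_subquery`, and `retrieve` are hypothetical names, the "LLM" is replaced by a naive split on " and ", and term overlap stands in for keyword, vector, and semantic ranking. The only point is the shape of the flow: plan, run subqueries in parallel, then merge.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_subqueries(question: str) -> list[str]:
    """Hypothetical stand-in for LLM query planning: split on ' and '."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p for p in parts if p]


def run_subquery(subquery: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Hypothetical stand-in for search plus reranking: score by term overlap."""
    terms = set(subquery.lower().split())
    hits = []
    for doc_id, text in docs.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            hits.append((doc_id, score))
    # Highest score first; ties broken by document ID for stable output.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def retrieve(question: str, docs: dict[str, str]) -> dict:
    # Phase 1: query planning.
    subqueries = plan_subqueries(question)

    # Phase 2: run all subqueries simultaneously.
    with ThreadPoolExecutor() as pool:
        per_query = list(pool.map(lambda q: run_subquery(q, docs), subqueries))

    # Phase 3: merge, keeping each document's best score across subqueries.
    best: dict[str, int] = {}
    for hits in per_query:
        for doc_id, score in hits:
            best[doc_id] = max(score, best.get(doc_id, 0))
    merged = sorted(best.items(), key=lambda h: (-h[1], h[0]))

    return {
        "subqueries": subqueries,                          # execution metadata
        "references": [doc_id for doc_id, _ in merged],    # source references
        "grounding": "\n".join(docs[d] for d, _ in merged),  # text for the LLM
    }


if __name__ == "__main__":
    sample_docs = {
        "a": "hotels near the beach",
        "b": "restaurants with vegan menu",
        "c": "parking rules",
    }
    question = "Which hotels are near the beach and which restaurants have a vegan menu?"
    print(retrieve(question, sample_docs))
```

The returned dictionary loosely mirrors the three-part response described above (grounding data, references, and execution metadata), but the field names here are illustrative only.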
115+
116+
### Required components
117+
118+
| Component | Service | Role |
119+
|-----------|---------|------|
120+
| **LLM** | Azure OpenAI | Creates subqueries from conversation context and later uses grounding data for answer generation |
121+
| **Knowledge agent** | Azure AI Search | Orchestrates the pipeline, connecting to your LLM and managing query parameters |
122+
| **Search index** | Azure AI Search | Stores your searchable content (text and vectors) with semantic configuration |
123+
| **Semantic ranker** | Azure AI Search | Required component that reranks results for relevance (L2 reranking) |
124+
125+
### Integration requirements
126+
127+
Your application drives the pipeline by calling the knowledge agent and handling the response. The pipeline returns grounding data that you pass to an LLM for answer generation in your conversation interface. For implementation details, see [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md).
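
As a rough illustration of "calling the knowledge agent", the sketch below only builds a request URL and JSON body; it doesn't send anything. The route (`/agents/{name}/retrieve`) and the body fields (`messages`, `targetIndexParams`, `indexName`) are assumptions based on the 2025-05-01-preview Knowledge Retrieval reference linked in this article and should be checked against that reference before use. `build_retrieve_request` is a hypothetical helper, and the service, agent, and index names are placeholders.

```python
import json

API_VERSION = "2025-05-01-preview"  # preview API version named in this article


def build_retrieve_request(endpoint, agent_name, index_name, messages):
    """Build (url, json_body) for a retrieve call.

    Assumption: route and field names follow the preview REST reference;
    verify them there. `messages` is a list of (role, text) pairs.
    """
    url = (
        f"{endpoint.rstrip('/')}/agents/{agent_name}/retrieve"
        f"?api-version={API_VERSION}"
    )
    body = {
        # Conversation history gives the LLM context for query planning.
        "messages": [
            {"role": role, "content": [{"type": "text", "text": text}]}
            for role, text in messages
        ],
        # Assumed per-index parameters; only the index name is set here.
        "targetIndexParams": [{"indexName": index_name}],
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_retrieve_request(
        "https://example.search.windows.net",  # placeholder service URL
        "my-agent",                             # placeholder agent name
        "my-index",                             # placeholder index name
        [("user", "Which hotels are near the beach?")],
    )
    print(url)
    print(body)
```

Your application would send this body with its own authentication, then pass the grounding data from the response to the model that generates the final answer.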
128+
129+
> [!NOTE]
130+
> Only gpt-4o and gpt-4.1 series models are supported for query planning. You can use any model for final answer generation.
131+
132+
## How to get started
133+
134+
You must use the preview REST APIs or a prerelease Azure SDK package that provides the functionality. At this time, there's no Azure portal or Azure AI Foundry portal support.
135+
136+
Choose any of these options for your next step.
137+
138+
<!-- + Watch this demo. -->
139+
+ [Quickstart article: Run agentic retrieval in Azure AI Search](search-get-started-agentic-retrieval.md). Learn the basic workflow using sample data and a prepared index and queries.
140+
141+
+ Sample code:
142+
143+
+ [Quickstart-Agentic-Retrieval: Python](https://github.com/Azure-Samples/azure-search-python-samples/tree/main/Quickstart-Agentic-Retrieval)
144+
+ [Quickstart-Agentic-Retrieval: .NET](https://github.com/Azure-Samples/azure-search-dotnet-samples/blob/main/quickstart-agentic-retrieval)
145+
+ [Quickstart-Agentic-Retrieval: REST](https://github.com/Azure-Samples/azure-search-rest-samples/tree/main/Quickstart-agentic-retrieval)
146+
+ [End-to-end with Azure AI Search and Azure AI Agent Service](https://github.com/Azure-Samples/azure-search-python-samples/tree/main/agentic-retrieval-pipeline-example)
147+
148+
+ How-to guides for a focused look at development tasks:
149+
150+
+ [Create a knowledge agent](search-agentic-retrieval-how-to-create.md)
151+
+ [Use a knowledge agent to retrieve data](search-agentic-retrieval-how-to-retrieve.md)
152+
+ [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md).
153+
154+
+ REST API reference, [Knowledge Agents](/rest/api/searchservice/knowledge-agents?view=rest-searchservice-2025-05-01-preview&preserve-view=true) and [Knowledge Retrieval](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-05-01-preview&preserve-view=true).
155+
156+
+ [Azure OpenAI Demo](https://github.com/Azure-Samples/azure-search-openai-demo), updated to use agentic retrieval.
157+
81158
## Availability and pricing
82159

83160
Agentic retrieval is available in [all regions that provide semantic ranker](search-region-support.md), on all tiers except the free tier.
@@ -140,32 +217,6 @@ To estimate the semantic ranking costs associated with agentic retrieval, start
140217

141218
Putting it all together, you'd pay about $3.30 for semantic ranking in Azure AI Search, plus 60 cents for input tokens and 42 cents for output tokens in Azure OpenAI, which comes to $1.02 for query planning. The combined cost for the full execution is $4.32.
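
The totals in the preceding paragraph can be checked with a few lines of arithmetic. The dollar amounts are taken directly from that paragraph; the variable names are only for illustration, and `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

# Figures quoted in the paragraph above.
semantic_ranking = Decimal("3.30")  # Azure AI Search semantic ranking
input_tokens = Decimal("0.60")      # Azure OpenAI input tokens
output_tokens = Decimal("0.42")     # Azure OpenAI output tokens

# Query planning cost is the sum of the two token charges.
query_planning = input_tokens + output_tokens

# Full execution cost is semantic ranking plus query planning.
total = semantic_ranking + query_planning

print(f"Query planning: ${query_planning}")
print(f"Total: ${total}")
```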
142219

143-
## How to get started
144-
145-
You must use the preview REST APIs or a prerelease Azure SDK package that provides the functionality. At this time, there's no Azure portal or Azure AI Foundry portal support.
146-
147-
Choose any of these options for your next step.
148-
149-
<!-- + Watch this demo. -->
150-
+ [Quickstart article: Run agentic retrieval in Azure AI Search](search-get-started-agentic-retrieval.md). Learn the basic workflow using sample data and a prepared index and queries.
151-
152-
+ Sample code:
153-
154-
+ [Quickstart-Agentic-Retrieval: Python](https://github.com/Azure-Samples/azure-search-python-samples/tree/main/Quickstart-Agentic-Retrieval)
155-
+ [Quickstart-Agentic-Retrieval: .NET](https://github.com/Azure-Samples/azure-search-dotnet-samples/blob/main/quickstart-agentic-retrieval)
156-
+ [Quickstart-Agentic-Retrieval: REST](https://github.com/Azure-Samples/azure-search-rest-samples/tree/main/Quickstart-agentic-retrieval)
157-
+ [End-to-end with Azure AI Search and Azure AI Agent Service](https://github.com/Azure-Samples/azure-search-python-samples/tree/main/agentic-retrieval-pipeline-example)
158-
159-
+ How-to guides for a focused look at development tasks:
160-
161-
+ [Create a knowledge agent](search-agentic-retrieval-how-to-create.md)
162-
+ [Use a knowledge agent to retrieve data](search-agentic-retrieval-how-to-retrieve.md)
163-
+ [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md).
164-
165-
+ REST API reference, [Knowledge Agents](/rest/api/searchservice/knowledge-agents?view=rest-searchservice-2025-05-01-preview&preserve-view=true) and [Knowledge Retrieval](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-05-01-preview&preserve-view=true).
166-
167-
+ [Azure OpenAI Demo](https://github.com/Azure-Samples/azure-search-openai-demo), updated to use agentic retrieval.
168-
169220
<!--
170221
•Query Pipeline Recap: The query pipeline includes stages: Query Preprocessing (Query Rewriting, Vectorization, Text analysis), Ranking (Vector Search, Keyword Search, Fusion, Semantic Ranking), and Synthesis (Results for LLM, Extractive Answers, Contextualized Captions).
171222
