Commit d92cf26

Merge pull request #7671 from HeidiSteen/heidist-rag
[azure search] Updates for RAG differentiation
2 parents c1dc529 + 201cba7 commit d92cf26

15 files changed (+144 -89 lines changed)

articles/search/agentic-retrieval-how-to-create-pipeline.md

Lines changed: 18 additions & 5 deletions
@@ -1,29 +1,30 @@
 ---
-title: Build an agentic retrieval solution
+title: 'Tutorial: Build an agentic retrieval solution'
 titleSuffix: Azure AI Search
 description: Learn how to design and build a custom agentic retrieval solution where Azure AI Search handles data retrieval for your custom agents in AI Foundry.
 author: HeidiSteen
 ms.author: heidist
 manager: nitinme
 ms.date: 09/10/2025
 ms.service: azure-ai-search
-ms.topic: how-to
+ms.topic: tutorial
 ms.custom:
 - build-2025
 ---

-# Build an agent-to-agent retrieval solution using Azure AI Search
+# Tutorial: Build an agent-to-agent retrieval solution using Azure AI Search

 [!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]

 This article describes an approach or pattern for building a solution that uses Azure AI Search for knowledge retrieval, and how to integrate knowledge retrieval into a custom solution that includes Azure AI Agent. This pattern uses an agent tool to invoke an agentic retrieval pipeline in Azure AI Search.

 :::image type="content" source="media/agentic-retrieval/agent-to-agent-pipeline.svg" alt-text="Diagram of Azure AI Search integration with Azure AI Agent service." lightbox="media/agentic-retrieval/agent-to-agent-pipeline.png" :::

-This article supports the [agentic-retrieval-pipeline-example](https://github.com/Azure-Samples/azure-search-python-samples/tree/main/agentic-retrieval-pipeline-example) Python sample on GitHub.
-
 This exercise differs from the [Agentic Retrieval Quickstart](search-get-started-agentic-retrieval.md) in how it uses Azure AI Agent to retrieve data from the index, and how it uses an agent tool for orchestration. If you want to understand the retrieval pipeline in its simplest form, begin with the quickstart.

+> [!TIP]
+> To run the code for this tutorial, download the [agentic-retrieval-pipeline-example](https://github.com/Azure-Samples/azure-search-python-samples/tree/main/agentic-retrieval-pipeline-example) Python sample on GitHub.
+
 ## Prerequisites

 The following resources are required for this design pattern:
@@ -315,6 +316,18 @@ Look at output tokens in the [activity array](agentic-retrieval-how-to-retrieve.

 Set `maxOutputSize` in the [knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md) to govern the size of the response, or `maxRuntimeInSeconds` for time-bound processing.

+## Clean up resources
+
+When you're working in your own subscription, at the end of a project, it's a good idea to remove the resources that you no longer need. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.
+
+You can also delete individual objects:
+
++ [Delete a knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md#delete-an-agent)
+
++ [Delete a knowledge source](agentic-knowledge-source-how-to-search-index.md#delete-a-knowledge-source)
+
++ [Delete an index](search-how-to-manage-index.md#delete-an-index)
+
 ## Related content

 + [Agentic retrieval in Azure AI Search](agentic-retrieval-overview.md)
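The `maxOutputSize` and `maxRuntimeInSeconds` settings mentioned in the hunk above are properties on the knowledge agent definition. Here's a minimal sketch of setting them through the preview REST API; the endpoint path, property placement, and agent name are assumptions (a complete definition also binds a model and knowledge sources, omitted here), so verify the shape against the knowledge agent REST reference before relying on it:

```python
import os
import requests

# Assumptions: endpoint path, agent name, and property placement are illustrative only.
endpoint = os.environ["SEARCH_ENDPOINT"]   # e.g. https://<service>.search.windows.net
api_key = os.environ["SEARCH_ADMIN_KEY"]
agent_name = "my-knowledge-agent"          # hypothetical name

agent_definition = {
    "name": agent_name,
    "maxOutputSize": 5000,        # cap the size of the retrieval response
    "maxRuntimeInSeconds": 60,    # time-bound query planning and execution
}

response = requests.put(
    f"{endpoint}/agents/{agent_name}",
    params={"api-version": "2025-08-01-preview"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=agent_definition,
)
response.raise_for_status()
```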

articles/search/agentic-retrieval-how-to-retrieve.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ This article also explains the three components of the retrieval response:
 The retrieve request can include instructions for query processing that override the defaults set on the knowledge agent.

 > [!NOTE]
-> By default, there's no model-generated "answer" in the response and you should pass the extracted response to an LLM so that it can ground its answer based on the search results. For an end-to-end example that includes this step, see [Build an agent-to-agent retrieval solution](agentic-retrieval-how-to-create-pipeline.md) or [Azure OpenAI Demo](https://github.com/Azure-Samples/azure-search-openai-demo).
+> By default, there's no model-generated "answer" in the response and you should pass the extracted response to an LLM so that it can ground its answer based on the search results. For an end-to-end example that includes this step, see [Tutorial: Build an agent-to-agent retrieval solution](agentic-retrieval-how-to-create-pipeline.md) or [Azure OpenAI Demo](https://github.com/Azure-Samples/azure-search-openai-demo).
 >
 >Alternatively, you can use [answer synthesis](agentic-retrieval-how-to-answer-synthesis.md) to bring answer formulation into the agentic pipeline. In this workflow, the retriever response consists of LLM-formulated answers instead of the raw search results.
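As the note in this hunk says, the default retrieve response carries no model-generated answer; your code passes the extracted content to an LLM for grounding. A minimal sketch of that handoff, assuming the retrieval response text is already in hand (the deployment name, prompt, and question are placeholders):

```python
import os
from openai import AzureOpenAI

# Placeholder: grounding_data would come from the knowledge agent's retrieve response.
grounding_data = "<extracted search results from the retrieve action>"
question = "Why is the sky blue at noon and red at sunset?"

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21",
)

completion = client.chat.completions.create(
    model="gpt-4o",  # assumed deployment name
    messages=[
        {"role": "system", "content": "Answer using only the provided sources."},
        {"role": "user", "content": f"Sources:\n{grounding_data}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```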

articles/search/agentic-retrieval-overview.md

Lines changed: 19 additions & 12 deletions
@@ -5,7 +5,7 @@ description: Learn about agentic retrieval concepts, architecture, and use cases
 author: HeidiSteen
 ms.author: heidist
 manager: nitinme
-ms.date: 09/02/2025
+ms.date: 10/14/2025
 ms.service: azure-ai-search
 ms.topic: concept-article
 ms.custom:
@@ -17,15 +17,19 @@ ms.custom:

 [!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]

-In Azure AI Search, *agentic retrieval* is a new multi-query pipeline designed for complex questions posed by users or agents in chat and copilot apps. It's intended for [Retrieval Augmented Generation (RAG)](retrieval-augmented-generation-overview.md) patterns. Here's how it works:
+What is agentic retrieval? In Azure AI Search, *agentic retrieval* is a new multi-query pipeline designed for complex questions posed by users or agents in chat and copilot apps. It's intended for [Retrieval Augmented Generation (RAG)](retrieval-augmented-generation-overview.md) patterns and agent-to-agent workflows.
+
+Here's what it does:

 + Uses a large language model (LLM) to break down a complex query into smaller, focused subqueries for better coverage over your indexed content. Subqueries can include chat history for extra context.

 + Runs subqueries in parallel. Each subquery is semantically reranked to promote the most relevant matches.

 + Combines the best results into a unified response that an LLM can use to generate answers with your proprietary content.

-This high-performance pipeline helps you generate high quality grounding data for your chat application, with the ability to answer complex questions quickly.
++ The response is modular yet comprehensive in how it also includes a query plan and source documents. You can choose to use just the search results as grounding data, or invoke the LLM to formulate an answer.
+
+This high-performance pipeline helps you generate high quality grounding data (or an answer) for your chat application, with the ability to answer complex questions quickly.

 Programmatically, agentic retrieval is supported through a new [Knowledge Agents object](/rest/api/searchservice/knowledge-agents?view=rest-searchservice-2025-08-01-preview&preserve-view=true) in the 2025-08-01-preview and 2025-05-01-preview data plane REST APIs and in Azure SDK preview packages that provide the feature. A knowledge agent's retrieval response is designed for downstream consumption by other agents and chat apps.
@@ -63,29 +67,31 @@ Agentic retrieval is designed for conversational search experiences that use an

 ### How it works

-The agentic retrieval process follows three main phases:
+The agentic retrieval process works as follows:
+
+1. **Workflow initiation**: Your application calls a knowledge agent with retrieve action that provides a query and conversation history.

-1. **Query planning**: A knowledge agent sends your query and conversation history to an LLM (gpt-4o, gpt-4.1, and gpt-5 series), which analyzes the context and breaks down complex questions into focused subqueries. This step is automated and not customizable. The number of subqueries depends on what the LLM decides and whether the `maxDocsForReranker` parameter is higher than 50. A new subquery is defined for each 50-document batch sent to semantic ranker.
+1. **Query planning**: A knowledge agent sends your query and conversation history to an LLM, which analyzes the context and breaks down complex questions into focused subqueries. This step is automated and not customizable.

-2. **Query execution**: All subqueries run simultaneously against your knowledge sources, using keyword, vector, and hybrid search. Each subquery undergoes semantic reranking to find the most relevant matches. References are extracted and retained for citation purposes.
+1. **Query execution**: The knowledge agent sends the subqueries to your knowledge sources. All subqueries run simultaneously and can be keyword, vector, and hybrid search. Each subquery undergoes semantic reranking to find the most relevant matches. References are extracted and retained for citation purposes.

-3. **Result synthesis**: The system merges and ranks all results, then returns a unified response containing grounding data, source references, and execution metadata.
+1. **Result synthesis**: The system combines all results into a unified response with three parts: merged content, source references, and execution details.

-Your search index determines query execution and any optimizations that occur during query execution. Specifically, if your index includes searchable text and vector fields, a hybrid query executes. The index semantic configuration, plus optional scoring profiles, synonym maps, analyzers, and normalizers (if you add filters) are all used during query execution. You must have named defaults for a semantic configuration and a scoring profile.
+Your search index determines query execution and any optimizations that occur during query execution. Specifically, if your index includes searchable text and vector fields, a hybrid query executes. If the only searchable field is a vector field, then only pure vector search is used. The index semantic configuration, plus optional scoring profiles, synonym maps, analyzers, and normalizers (if you add filters) are all used during query execution. You must have named defaults for a semantic configuration and a scoring profile.

 ### Required components

 | Component | Service | Role |
 |-----------|---------|------|
 | **LLM** | Azure OpenAI | Creates subqueries from conversation context and later uses grounding data for answer generation |
 | **Knowledge agent** | Azure AI Search | Orchestrates the pipeline, connecting to your LLM and managing query parameters |
+| **Knowledge source** | Azure AI Search | Wraps the search index with properties pertaining to knowledge agent usage |
 | **Search index** | Azure AI Search | Stores your searchable content (text and vectors) with semantic configuration |
 | **Semantic ranker** | Azure AI Search | Required component that reranks results for relevance (L2 reranking) |
-| **Knowledge source** | Azure AI Search | Wraps the search index with properties pertaining to knowledge agent usage |

 ### Integration requirements

-Your application drives the pipeline by calling the knowledge agent and handling the response. The pipeline returns grounding data that you pass to an LLM for answer generation in your conversation interface. For implementation details, see [Build an agent-to-agent retrieval solution](agentic-retrieval-how-to-create-pipeline.md).
+Your application drives the pipeline by calling the knowledge agent and handling the response. The pipeline returns grounding data that you pass to an LLM for answer generation in your conversation interface. For implementation details, see [Tutorial: Build an agent-to-agent retrieval solution](agentic-retrieval-how-to-create-pipeline.md).

 > [!NOTE]
 > Only gpt-4o, gpt-4.1, and gpt-5 series models are supported for query planning. You can use any model for final answer generation.
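The rewritten steps describe a pipeline that starts when your application calls the knowledge agent's retrieve action with a query and conversation history. A sketch of that call follows; the endpoint path, message shape, and response field names are assumptions to check against the retrieve REST reference:

```python
import os
import requests

endpoint = os.environ["SEARCH_ENDPOINT"]
api_key = os.environ["SEARCH_ADMIN_KEY"]
agent_name = "my-knowledge-agent"  # hypothetical name

# Workflow initiation: the query plus any conversation history drives query planning.
body = {
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "Why is the sky blue at noon and red at sunset?"}],
        }
    ]
}

response = requests.post(
    f"{endpoint}/agents/{agent_name}/retrieve",
    params={"api-version": "2025-08-01-preview"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
result = response.json()

# Assumed field names for the three-part response described above.
grounding = result.get("response")     # merged content
references = result.get("references")  # source references for citations
activity = result.get("activity")      # execution details, including token counts
```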
@@ -109,8 +115,9 @@ Choose any of these options for your next step.

 + [Create a search index knowledge source](agentic-knowledge-source-how-to-search-index.md) or a [blob knowledge source](agentic-knowledge-source-how-to-blob.md)
 + [Create a knowledge agent](agentic-retrieval-how-to-create-knowledge-base.md)
++ [Use answer synthesis for citation-backed responses](agentic-retrieval-how-to-answer-synthesis.md)
 + [Use a knowledge agent to retrieve data](agentic-retrieval-how-to-retrieve.md)
-+ [Build an agent-to-agent retrieval solution](agentic-retrieval-how-to-create-pipeline.md)
++ [Tutorial: Build an agent-to-agent retrieval solution](agentic-retrieval-how-to-create-pipeline.md)

 + REST API reference:


@@ -146,7 +153,7 @@ Semantic ranking is performed for every subquery in the plan. Semantic ranking c
146153

147154
Agentic retrieval has two billing models: billing from Azure OpenAI (query planning and, if enabled, answer synthesis) and billing from Azure AI Search for semantic ranking (query execution).
148155

149-
This example omits answer synthesis and uses hypothetical prices to illustrate the estimation process. Your costs could be lower. For the actual price of transactions, see [Azure OpenAI pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing). For query execution, there's no charge for semantic ranking for agentic retrieval in the initial public preview.
156+
This pricing example omits answer synthesis, but helps illustrate the estimation process. Your costs could be lower. For the actual price of transactions, see [Azure OpenAI pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing). For query execution, there's no charge for semantic ranking for agentic retrieval in the initial public preview.
150157

151158
#### Estimated billing costs for query planning
152159
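Since the hunk above describes an estimation process based on query-planning token counts, here is a toy calculation; every price and token count is hypothetical, not taken from the pricing page:

```python
# Hypothetical figures for illustration only; see the Azure OpenAI pricing page for real prices.
input_price_per_1k = 0.0025   # USD per 1,000 input tokens (assumed)
output_price_per_1k = 0.01    # USD per 1,000 output tokens (assumed)

input_tokens = 2_000          # query plus chat history sent for query planning (assumed)
output_tokens = 350           # subqueries produced by the planner (assumed)

# Query-planning cost is billed by Azure OpenAI; semantic ranking is free in the initial preview.
cost = (input_tokens / 1_000) * input_price_per_1k + (output_tokens / 1_000) * output_price_per_1k
print(f"Estimated query-planning cost: ${cost:.4f}")
```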
