`articles/search/search-agentic-retrieval-concept.md` (5 additions, 45 deletions)
Query expansion and parallel execution, plus the retrieval response, are the key ...

Agentic retrieval adds latency to query processing, but it makes up for it by adding these capabilities:

+ Reads in chat history as an input to the retrieval pipeline.
+ Deconstructs a complex query that contains multiple "asks" into component parts. For example: "find me a hotel near the beach, with airport transportation, and that's within walking distance of vegetarian restaurants."
+ Rewrites an original query into multiple subqueries using synonym maps (optional) and LLM-generated paraphrasing.
+ Corrects spelling mistakes.
+ Executes all subqueries simultaneously (illustrated in the sketch after this list).
+ Outputs a unified result as a single string. Alternatively, you can extract parts of the response for your solution. Metadata about query execution and reference data is included in the response.
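Conceptually, the decomposition and fan-out work like issuing several focused subqueries at once and merging what comes back. The following Python sketch is an illustration only: the subquery strings and the `run_subquery` stub are hypothetical, and the real decomposition and parallel execution happen inside the Azure AI Search service, not in your client code.

```python
# Conceptual illustration only: agentic retrieval performs this decomposition
# and fan-out inside the Azure AI Search service, not in your application.
from concurrent.futures import ThreadPoolExecutor

original_query = (
    "find me a hotel near the beach, with airport transportation, "
    "and that's within walking distance of vegetarian restaurants"
)

# Hypothetical subqueries that an LLM-based query planner might produce.
subqueries = [
    "hotels near the beach",
    "hotels with airport transportation or shuttle service",
    "hotels within walking distance of vegetarian restaurants",
]

def run_subquery(query: str) -> list[str]:
    # Placeholder for a keyword, vector, or hybrid search against your index.
    return [f"result for: {query}"]

# All subqueries execute simultaneously; results are then merged and reranked.
with ThreadPoolExecutor() as executor:
    result_batches = list(executor.map(run_subquery, subqueries))

merged = [doc for batch in result_batches for doc in batch]
print(merged)
```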
Agentic retrieval invokes the entire query processing pipeline multiple times for each subquery, but it does so in parallel, preserving the efficiency and performance necessary for a reasonable user experience.
> [!NOTE]
> Including an LLM in query planning adds latency to a query pipeline. You can mitigate the effects by using faster models, such as gpt-4o-mini, and summarizing the message threads. Nonetheless, you should expect longer query times with this pipeline.

<!-- ## Architecture and components

Agentic retrieval is designed for a conversational search experience that includes an LLM. An important part of agentic retrieval is how the LLM breaks down an initial query into subqueries, which are more effective at locating the best matches in your index.

:::image type="content" source="media/agentic-retrieval/agentic-retrieval-architecture.png" alt-text="Diagram of agentic retrieval workflow using an example query." lightbox="media/agentic-retrieval/agentic-retrieval-architecture.png" :::

The workflow includes:

+ *Query planning*, where the search engine calls an LLM (a chat completion model) that you provide. The output is one or more subqueries. This step is mostly internal. You can review the subqueries that are generated, but query planning isn't intended to be customizable or configurable.

+ *Query execution*, which is a parallel process, with L1 ranking for vector and keyword search, and L2 semantic reranking of the L1 results. In agentic retrieval, semantic ranker is a required component.

+ *Merged results*, which refers to the output: a unified string of all results that you can pass directly to an LLM.

Notice that the architecture requires an LLM for query planning. Only supported LLMs can be used for this step. At the end of the pipeline, you can pass the merged results to any model, tool, or agent.

### Components

Agentic retrieval has these components:

| Component | Resource | Usage |
|-----------|----------|-------|
| LLM (gpt-4o and gpt-4.1 series) | Azure OpenAI | An LLM has two functions. First, it formulates subqueries for the query plan and sends them back to the knowledge agent. Second, after the query executes, the LLM receives grounding data from the query response and uses it for answer formulation. |
| Search index | Azure AI Search | Contains plain text and vector content, a semantic configuration, and other elements as needed. |
| Knowledge agent | Azure AI Search | Connects to your LLM, providing parameters and inputs to build a query plan. |
| Retrieval engine | Azure AI Search | Executes on the LLM-generated query plan and other parameters, returning a rich response that includes content and query plan metadata. Queries are keyword, vector, and hybrid. Results are merged and ranked. |
| Semantic ranker | Azure AI Search | Provides L2 reranking, promoting the most relevant matches. Semantic ranker is required for agentic retrieval. |

Your solution should include a tool or app that drives the pipeline. An agentic retrieval pipeline concludes with the response object that provides grounding data. Your solution should take it from there, handling the response by passing it to an LLM to generate an answer, which you render inline in the user conversation. For more information about this step, see [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md). -->

Agentic retrieval has these processes:

+ Requests for agentic retrieval are initiated by calls to a knowledge agent on Azure AI Search (a request sketch follows this list).
+ Knowledge agents connect to an LLM and provide conversation history as input. How much history is included is configurable through the number of messages you provide.
+ LLMs look at the conversation and determine whether to break it up into subqueries. The number of subqueries depends on what the LLM decides and on whether the `maxDocsForReranker` parameter is higher than 50. A new subquery is defined for each 50-document batch sent to semantic ranker.
+ Subqueries execute simultaneously on Azure AI Search and generate structured results and extracted references.
+ Results are ranked and merged.
+ Knowledge agent responses are formulated and returned as a three-part response consisting of a unified result (a long string), a reference array, and an activities array that enumerates all operations.
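To make the request and response shapes concrete, here's a minimal client-side sketch that calls a knowledge agent's retrieve action over REST with a short conversation history. It's based on the preview API surface, so treat the endpoint path, API version, and property names (`messages`, `targetIndexParams`, `maxDocsForReranker`, and the `response`, `references`, and `activity` parts of the result) as assumptions to verify against the current preview REST reference.

```python
import requests

# Assumed endpoint shape for the preview retrieve action; verify the path and
# api-version against the Azure AI Search preview REST reference.
endpoint = "https://<your-search-service>.search.windows.net"
url = f"{endpoint}/agents/earth-knowledge-agent/retrieve"
params = {"api-version": "2025-05-01-preview"}  # assumed preview version
headers = {"api-key": "<your-query-api-key>", "Content-Type": "application/json"}

body = {
    # Conversation history: how many messages you pass is up to you.
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Why is the sky blue?"}]},
        {"role": "assistant", "content": [{"type": "text", "text": "Because of Rayleigh scattering."}]},
        {"role": "user", "content": [{"type": "text", "text": "Does that also explain red sunsets?"}]},
    ],
    # Assumed per-index parameters; maxDocsForReranker influences subquery count.
    "targetIndexParams": [
        {"indexName": "earth-at-night", "maxDocsForReranker": 100, "rerankerThreshold": 2.5}
    ],
}

result = requests.post(url, params=params, headers=headers, json=body).json()

# Assumed three-part response: unified result, references, and activity log.
grounding_text = result["response"][0]["content"][0]["text"]
references = result.get("references", [])
activity = result.get("activity", [])
print(grounding_text[:500], len(references), len(activity))
```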
## Architecture and workflow
Agentic retrieval is designed for conversational search experiences that use an LLM to intelligently break down complex queries. The system coordinates multiple Azure services to deliver comprehensive search results.
The agentic retrieval process follows three main phases:

1. **Query planning**: A knowledge agent passes the conversation history to an LLM that you provide, which analyzes the query and breaks it into focused subqueries.

2. **Query execution**: The subqueries run simultaneously against your search index using keyword, vector, and hybrid retrieval, followed by semantic reranking.

3. **Result synthesis**: The system merges and ranks all results, then returns a unified response containing grounding data, source references, and execution metadata (a consumption sketch follows this list).
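The unified response isn't a final answer by itself; your application typically passes it to a chat model as grounding data. The following is a minimal sketch of that hand-off, assuming `grounding_text` holds the unified result string extracted from a retrieve response (as in the earlier sketch) and that you have an Azure OpenAI chat deployment; the deployment name, endpoint, and API version shown are placeholders.

```python
from openai import AzureOpenAI  # pip install openai

# Placeholder endpoint, key, and API version; replace with your own values.
client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<your-azure-openai-key>",
    api_version="2024-10-21",
)

grounding_text = "..."  # unified result string from the retrieve response
question = "Does Rayleigh scattering also explain red sunsets?"

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # your chat deployment name
    messages=[
        {"role": "system", "content": "Answer using only the sources provided."},
        {"role": "user", "content": f"Sources:\n{grounding_text}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)
```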
Your search index determines query execution and any optimizations that occur during it. Specifically, if your index includes searchable text and vector fields, a hybrid query executes. The index's semantic configuration, plus optional scoring profiles, synonym maps, analyzers, and normalizers (if you add filters), are all used during query execution. You must have named defaults for a semantic configuration and a scoring profile.
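As an illustration of where those defaults live, here's a fragment of an index definition expressed as a Python dictionary that mirrors the index JSON. Property names such as `defaultConfiguration` and `defaultScoringProfile` reflect the commonly documented index schema, but verify them against the REST reference for your API version, and substitute your own index and field names.

```python
# Illustrative index fragment (a Python dict mirroring the JSON payload) showing
# named defaults for the semantic configuration and the scoring profile.
# Property and field names are assumptions to check against the REST reference.
index_fragment = {
    "name": "earth-at-night",
    "semantic": {
        "defaultConfiguration": "default-semantic-config",
        "configurations": [
            {
                "name": "default-semantic-config",
                "prioritizedFields": {
                    "titleField": {"fieldName": "title"},
                    "prioritizedContentFields": [{"fieldName": "chunk"}],
                },
            }
        ],
    },
    "defaultScoringProfile": "boost-title",
    "scoringProfiles": [
        {"name": "boost-title", "text": {"weights": {"title": 2.0}}}
    ],
}
```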
### Required components
| Component | Service | Role |
|-----------|---------|------|
You must use the preview REST APIs or a prerelease Azure SDK package that provid...

Choose any of these options for your next step.
<!-- + Watch this demo. -->
+ [Quickstart article: Run agentic retrieval in Azure AI Search](search-get-started-agentic-retrieval.md). Learn the basic workflow using sample data and a prepared index and queries.