
Commit 668eaf2

Add doc to TOC and small edits

1 parent 60e60c8

3 files changed (+9, -11 lines)

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 7 additions & 11 deletions
@@ -17,22 +17,20 @@ author: lgayhardt
 
 [!INCLUDE [feature-preview](../../includes/feature-preview.md)]
 
-
 AI Agents are powerful productivity assistants for creating workflows that serve business needs. However, their complex interaction patterns pose observability challenges. In this article, you learn how to run built-in evaluators locally on simple agent data or agent messages to thoroughly assess the performance of your AI agents.
 
-
-To build production-ready agentic applications and enable observability and transparency, developers need tools to assess not just the final output from an agent's workflows, but the quality and efficiency of the workflows themselves. For example, consider a typical agentic workflow:
+To build production-ready agentic applications and enable observability and transparency, developers need tools to assess not just the final output from an agent's workflows, but the quality and efficiency of the workflows themselves. For example, consider a typical agentic workflow:
+
+:::image type="content" source="../../media/evaluations/agent-workflow-eval.gif" alt-text="Animation of the agent's workflow from user query to intent resolution to tool calls to final response." lightbox="../../media/evaluations/agent-workflow-eval.gif":::
 
-
-:::image type="content" source="../../media/evaluations/agent-workflow-eval.gif" alt-text="Animation of the agent's workflow from user query to intent resolution to tool calls to final response." " lightbox="../../media/evaluations/agent-workflow-eval.gif":::
+The agentic workflow is triggered by a user query such as "weather tomorrow". The agent then executes multiple steps, such as reasoning through user intent, calling tools, and using retrieval-augmented generation to produce a final response. In this process, evaluating each step of the workflow, along with the quality and safety of the final output, is crucial. Specifically, we formulate these evaluation aspects into the following evaluators for agents, as sketched in the example further below:
 
-
-The agentic workflow is triggered by a user query "weather tomorrow". It starts to execute multiple steps, such as reasoning through user intents, tool calling, and utilizing retrieval-augmented generation to produce a final response. In this process, evaluating each steps of the workflow—along with the quality and safety of the final output—is crucial. Specifically, we formulate these evaluation aspects into the following evaluators for agents:
 - [Intent resolution](https://aka.ms/intentresolution-sample): Measures how well the agent identifies the user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities.
 - [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): Evaluates the agent’s ability to select the appropriate tools and pass correct parameters from previous steps.
 - [Task adherence](https://aka.ms/taskadherence-sample): Measures how well the agent’s final response adheres to its assigned tasks, according to its system message and prior steps.
 
 For more quality and risk and safety evaluators, see the [built-in evaluators](./evaluate-sdk.md#data-requirements-for-built-in-evaluators) you can use to assess content in the process where appropriate.
 
-
-
 ## Getting started
 
 First install the evaluators package from Azure AI evaluation SDK:
@@ -41,10 +39,9 @@ First install the evaluators package from Azure AI evaluation SDK:
 pip install azure-ai-evaluation
 ```
 
-
 ### Evaluators with agent message support
 
-Agents typically emit messages to interact with a user or other agents. Our built-in evaluators can accept simple data types such as strings in `query`, `response`, `ground_truth` according to the [single-turn data input requirements](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). However, to extract these simple data from agent messages can be a challenge, due to the complex interaction patterns of agents and framework differences. For example, as mentioned, a single user query can trigger a long list of agent messages, typically with multiple tool calls invoked.
+Agents typically emit messages to interact with a user or other agents. Our built-in evaluators accept simple data types such as strings in `query`, `response`, and `ground_truth`, per the [single-turn data input requirements](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). However, extracting these simple data types from agent messages can be a challenge because of agents' complex interaction patterns and differences between frameworks. For example, as mentioned, a single user query can trigger a long list of agent messages, typically with multiple tool calls invoked.
 
 As illustrated in the example, we enabled agent message support for these built-in evaluators so they can evaluate these aspects of an agentic workflow. These evaluators take `tool_calls` or `tool_definitions` as parameters unique to agents.
 
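A minimal sketch of these agent-specific parameters in use, assuming the preview package's evaluator classes (`IntentResolutionEvaluator`, `ToolCallAccuracyEvaluator`, `TaskAdherenceEvaluator`) and a standard Azure OpenAI `model_config`; the tool name, arguments, and environment variable names here are illustrative, not from the article:

```python
import os
from azure.ai.evaluation import (
    IntentResolutionEvaluator,
    TaskAdherenceEvaluator,
    ToolCallAccuracyEvaluator,
)

# Model configuration for the AI-assisted evaluators (placeholder values).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

intent_resolution = IntentResolutionEvaluator(model_config=model_config)
task_adherence = TaskAdherenceEvaluator(model_config=model_config)
tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

# Agent-message parameters unique to agents: a tool call the agent made,
# plus the definition of the tool it had available.
query = "What's the weather tomorrow in Seattle?"
tool_calls = [{
    "type": "tool_call",
    "tool_call_id": "call_1",
    "name": "fetch_weather",
    "arguments": {"location": "Seattle", "date": "tomorrow"},
}]
tool_definitions = [{
    "name": "fetch_weather",
    "description": "Fetches the weather forecast for a location and date.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "date": {"type": "string"},
        },
    },
}]

result = tool_call_accuracy(
    query=query, tool_calls=tool_calls, tool_definitions=tool_definitions
)
print(result)
```

The same three evaluator instances are reused in the sketch after the next hunk.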
@@ -67,7 +64,6 @@ As with other [built-in AI-assisted quality evaluators](./evaluate-sdk.md#perfor
 - `{metric_name}_result`: a "pass" or "fail" string based on a binarization threshold.
 - `{metric_name}_threshold`: a numerical binarization threshold, set by default or by the user.
 
-
 #### Simple agent data
 
 In simple agent data format, `query` and `response` are simple Python strings. For example:
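The article's own example is elided between these hunks; as a hedged sketch of a simple-data call, reusing the `intent_resolution` instance from the earlier sketch, with output keys following the `{metric_name}` fields listed above (score values illustrative):

```python
# Simple agent data: plain strings for query and response.
result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
print(result)
# Illustrative output shape, following the {metric_name} fields above:
# {
#     "intent_resolution": 4.0,             # score on the metric's scale
#     "intent_resolution_result": "pass",   # score compared to the threshold
#     "intent_resolution_threshold": 3,     # default or user-set threshold
# }
```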
@@ -241,12 +237,12 @@ print(result)
 
 ```
 
-
 #### Converter support
 
-Transforming agent messages into the right evaluation data to use our evaluators can be a nontrivial task. If you use [Azure AI Agent Service](../../ai-services/agents/overview.md), however, you can seamlessly evaluate your agents via our converter support for Azure AI agent threads and runs. Here's an example to create an Azure AI agent and some data for evaluation. Separately from evaluation, Azure AI Agent Service requires `pip install azure-ai-projects azure-identity` and an Azure AI project connection string and the supported models.
+Transforming agent messages into the right evaluation data for our evaluators can be a nontrivial task. If you use [Azure AI Agent Service](../../../ai-services/agents/overview.md), however, you can seamlessly evaluate your agents through our converter support for Azure AI agent threads and runs. Here's an example that creates an Azure AI agent and some data for evaluation. Separately from evaluation, Azure AI Agent Service requires `pip install azure-ai-projects azure-identity`, an Azure AI project connection string, and a supported model.
 
 #### Create agent threads and runs
+
 ```python
 import os, json
 import pandas as pd
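The hunk cuts off at the top of the thread-and-run code. For the converter support the new paragraph describes, the preview SDK exposes an `AIAgentConverter`; the following is a sketch under that assumption, with placeholder connection-string and thread/run identifiers:

```python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.evaluation import AIAgentConverter

# Connect to the Azure AI project that hosts the agent.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

# Convert an existing agent thread and run into evaluator-ready data.
converter = AIAgentConverter(project_client)
evaluation_data = converter.convert(thread_id="<thread-id>", run_id="<run-id>")
```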
Binary file not shown (-145 KB)

articles/ai-foundry/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -379,6 +379,8 @@ items:
       href: how-to/evaluate-prompts-playground.md
     - name: Generate synthetic and simulated data for evaluation
       href: how-to/develop/simulator-interaction-data.md
+    - name: Evaluate agents locally with Azure AI Evaluation SDK
+      href: how-to/develop/agent-evaluate-sdk.md
     - name: Local evaluation with Azure AI Evaluation SDK
       href: how-to/develop/evaluate-sdk.md
       displayName: code,accuracy,metrics
