Commit ba19f31

minor updates
1 parent 0da5f32 commit ba19f31

File tree: 1 file changed, +4 -7 lines changed


articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 4 additions & 7 deletions
@@ -46,7 +46,7 @@ pip install azure-ai-evaluation
 
 Agents typically emit messages to interact with a user or other agents. Our built-in evaluators can accept simple data types such as strings in `query`, `response`, and `ground_truth` according to the [single-turn data input requirements](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). However, extracting this simple data from agent messages can be challenging because of the complex interaction patterns of agents and differences between frameworks. For example, as mentioned, a single user query can trigger a long list of agent messages, typically with multiple tool calls invoked.
 
-As illustrated in the example, we enabled agent message support specifically for these built-in evaluators to evaluate these aspects of agentic workflow. These evaluators take `tool_calls` or `tool_definitions` as parameters (unique to agents.
+As illustrated in the example, we enabled agent message support specifically for these built-in evaluators to evaluate these aspects of an agentic workflow. These evaluators take `tool_calls` or `tool_definitions` as parameters unique to agents.
 
 | Evaluator | `query` | `response` | `tool_calls` | `tool_definitions` |
 |----------------|---------------|---------------|---------------|---------------|
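
To make the `tool_calls` and `tool_definitions` parameters concrete, here's a minimal sketch of calling `ToolCallAccuracyEvaluator` with simple agent data; the weather tool, its schema, and the environment variable names are illustrative assumptions rather than content from this file:

```python
import os

from azure.ai.evaluation import ToolCallAccuracyEvaluator

# Model configuration for the AI-assisted evaluator (values are placeholders).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

# A hypothetical weather tool: the call the agent made plus the tool's definition.
result = tool_call_accuracy(
    query="How is the weather in Seattle?",
    tool_calls=[
        {
            "type": "tool_call",
            "tool_call_id": "call_1",
            "name": "fetch_weather",
            "arguments": {"location": "Seattle"},
        }
    ],
    tool_definitions=[
        {
            "name": "fetch_weather",
            "description": "Fetches the weather information for the specified location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to fetch weather for.",
                    }
                },
            },
        }
    ],
)
print(result)
```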
@@ -62,7 +62,7 @@ For `ToolCallAccuracyEvaluator`, either `response` or `tool_calls` must be provided.
 
 We will demonstrate some examples of the two data formats: simple agent data and agent messages. However, due to the unique requirements of these evaluators, we recommend referring to the [sample notebooks](#sample-notebooks), which illustrate the possible input paths for each evaluator.
 
-As with other [built-in AI-assisted quality evaluators](./evaluate-sdk.md#performance-and-quality-evaluators), `IntentResolutionEvaluator` and `TaskAdherenceEvaluator` output a likert score (integer 1-5) where the higher score is better the result. `ToolCallAccuracyEvaluator` output the passing rate of all tool calls made (a float between 0-1) based on user query. To further improve intelligibility, all evaluators accept a binary threshold and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:
+As with other [built-in AI-assisted quality evaluators](./evaluate-sdk.md#performance-and-quality-evaluators), `IntentResolutionEvaluator` and `TaskAdherenceEvaluator` output a Likert score (integer 1-5; a higher score is better). `ToolCallAccuracyEvaluator` outputs the passing rate of all tool calls made (a float between 0-1) based on the user query. To further improve intelligibility, all evaluators accept a binarization threshold and output two new keys. A default threshold is set, and users can override it. The two new keys are:
 
 - `{metric_name}_result`: a "pass" or "fail" string based on the binarization threshold.
 - `{metric_name}_threshold`: the numerical binarization threshold, set by default or overridden by the user.
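
To show the binarization threshold and the two output keys in practice, here's a minimal sketch using `IntentResolutionEvaluator`; the `threshold` keyword override and the environment variable names are assumptions, and the printed values are placeholders:

```python
import os

from azure.ai.evaluation import IntentResolutionEvaluator

# Model configuration for the AI-assisted evaluator (values are placeholders).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

# The `threshold` keyword is assumed here to override the default binarization threshold.
intent_resolution = IntentResolutionEvaluator(model_config=model_config, threshold=3)

result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="The Eiffel Tower is open daily from 9:00 AM to 11:00 PM.",
)

# Alongside the 1-5 Likert score, the result carries the binarized keys described above.
print(result["intent_resolution"])            # Likert score, for example 5
print(result["intent_resolution_result"])     # "pass" or "fail"
print(result["intent_resolution_threshold"])  # the threshold in effect, here 3
```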
@@ -244,11 +244,7 @@ print(result)
 
 #### Converter support
 
-Transforming agent messages into the right evaluation data to use our evaluators can be a nontrivial task. If you use [Azure AI Agent Service](../../ai-services/agents/overview.md), however, you can seamlessly evaluate your agents via our converter support for Azure AI agent threads and runs. Here's an example to create an Azure AI agent and some data for evaluation:
-
-```bash
-pip install azure-ai-projects azure-identity
-```
+Transforming agent messages into the right evaluation data to use our evaluators can be a nontrivial task. If you use [Azure AI Agent Service](../../ai-services/agents/overview.md), however, you can seamlessly evaluate your agents via our converter support for Azure AI agent threads and runs. Here's an example to create an Azure AI agent and some data for evaluation. Separately from evaluation, Azure AI Agent Service requires `pip install azure-ai-projects azure-identity`, an Azure AI project connection string, and one of the supported models.
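
As a minimal sketch of the converter step itself, assuming an existing thread and run and the `AIAgentConverter` helper in `azure.ai.evaluation` (the connection string environment variable and the ID placeholders are assumptions):

```python
import os

from azure.ai.evaluation import AIAgentConverter
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Connect to the Azure AI project hosting the agent (connection string is a placeholder).
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

# Convert an existing agent thread and run into evaluation-ready data
# that the built-in evaluators accept.
converter = AIAgentConverter(project_client)
evaluation_data = converter.convert(thread_id="<your-thread-id>", run_id="<your-run-id>")
print(evaluation_data)
```

The converted payload can then be passed to evaluators such as `IntentResolutionEvaluator` without hand-building the `query`, `response`, or `tool_calls` fields.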
 
 #### Create agent threads and runs
 ```python
@@ -429,6 +425,7 @@ Now you're ready to try a sample for each of these evaluators:
 - [Intent resolution](https://aka.ms/intentresolution-sample)
 - [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample)
 - [Task adherence](https://aka.ms/taskadherence-sample)
+- [Response Completeness](https://aka.ms/rescompleteness-sample)
 - [End-to-end Azure AI agent evaluation](https://aka.ms/e2e-agent-eval-sample)
 
