
Commit 6f11c53

updated support level for evaluators
1 parent 6dd291b commit 6f11c53

3 files changed: +107 -11 lines changed


articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md

Lines changed: 84 additions & 2 deletions
@@ -112,15 +112,97 @@ If you're building agents outside of Azure AI Agent Service, this evaluator acce
 - the correctness of parameters used in tool calls;
 - the counts of missing or excessive calls.

-> [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
+#### Tool call evaluation support
+`ToolCallAccuracyEvaluator` supports evaluating the following Azure AI Agent tools:
+1. File Search
+2. Azure AI Search
+3. Bing Grounding
+4. Bing Custom Search
+5. SharePoint Grounding
+6. Code Interpreter
+7. Fabric Data Agent
+8. OpenAPI
+9. Function Tool (user-defined tools)
+If an unsupported tool is used in the agent run, the evaluator outputs a "pass" and a reason stating that evaluation of the invoked tool(s) isn't supported, which makes these cases easy to filter out. To enable evaluation, wrap unsupported tools as user-defined tools.
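
For illustration, here's a minimal sketch of that workaround: a hypothetical `lookup_sharepoint` function stands in for the built-in SharePoint Grounding tool, and a matching `tool_definitions` entry gives the evaluator a name, description, and parameter schema to score calls against. The function name and parameters are illustrative, not part of the SDK.

```python
# User-defined wrapper around functionality otherwise provided by a built-in tool
# (here, a hypothetical SharePoint lookup). Registering this function as a
# Function Tool on the agent makes its calls evaluable by ToolCallAccuracyEvaluator.
def lookup_sharepoint(query: str) -> str:
    """Search the team SharePoint site and return matching snippets."""
    # ...call your own retrieval logic here...
    return "snippet text"

# Matching tool definition, in the same format as the azure_maps_timezone example
# below; pass it to the evaluator through tool_definitions.
lookup_sharepoint_definition = {
    "name": "lookup_sharepoint",
    "description": "Search the team SharePoint site and return matching snippets.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query."
            }
        }
    }
}
```

Calls the agent makes to `lookup_sharepoint` then show up as Function Tool calls in the agent messages and can be evaluated like the example that follows.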

 ### Tool call accuracy example

 ```python
 from azure.ai.evaluation import ToolCallAccuracyEvaluator

 tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config, threshold=3)
+
+# provide the agent response with tool calls
+tool_call_accuracy(
+    query="What timezone corresponds to 41.8781,-87.6298?",
+    response=[
+        {
+            "createdAt": "2025-04-25T23:55:52Z",
+            "run_id": "run_DmnhUGqYd1vCBolcjjODVitB",
+            "role": "assistant",
+            "content": [
+                {
+                    "type": "tool_call",
+                    "tool_call_id": "call_qi2ug31JqzDuLy7zF5uiMbGU",
+                    "name": "azure_maps_timezone",
+                    "arguments": {
+                        "lat": 41.878100000000003,
+                        "lon": -87.629800000000003
+                    }
+                }
+            ]
+        },
+        {
+            "createdAt": "2025-04-25T23:55:54Z",
+            "run_id": "run_DmnhUGqYd1vCBolcjjODVitB",
+            "tool_call_id": "call_qi2ug31JqzDuLy7zF5uiMbGU",
+            "role": "tool",
+            "content": [
+                {
+                    "type": "tool_result",
+                    "tool_result": {
+                        "ianaId": "America/Chicago",
+                        "utcOffset": None,
+                        "abbreviation": None,
+                        "isDaylightSavingTime": None
+                    }
+                }
+            ]
+        },
+        {
+            "createdAt": "2025-04-25T23:55:55Z",
+            "run_id": "run_DmnhUGqYd1vCBolcjjODVitB",
+            "role": "assistant",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "The timezone for the coordinates 41.8781, -87.6298 is America/Chicago."
+                }
+            ]
+        }
+    ],
+    tool_definitions=[
+        {
+            "name": "azure_maps_timezone",
+            "description": "local time zone information for a given latitude and longitude.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "lat": {
+                        "type": "float",
+                        "description": "The latitude of the location."
+                    },
+                    "lon": {
+                        "type": "float",
+                        "description": "The longitude of the location."
+                    }
+                }
+            }
+        }
+    ]
+)
+
+# alternatively, provide the tool calls directly without the full agent response
 tool_call_accuracy(
     query="How is the weather in Seattle?",
     tool_calls=[{

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 19 additions & 5 deletions
@@ -39,15 +39,29 @@ pip install azure-ai-evaluation

 ## Evaluate Azure AI agents

-If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents via our converter support for Azure AI agent threads and runs. We support this list of evaluators for Azure AI agent messages from our converter:
+If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents via our converter support for Azure AI agents and for Semantic Kernel's Chat Completion and Azure AI agents. The following evaluators accept agent messages returned by our converter:

-### Evaluators supported for evaluation data converter
+- Agent: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, `Groundedness`

-- Quality: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, `Coherence`, `Fluency`
+If you're building other agents with a different schema, you can convert their messages into the general OpenAI-style [agent message schema](#agent-message-schema) and use the evaluators above.
+
+More generally, if you can parse the agent messages into the [required data formats](./evaluate-sdk.md#data-requirements-for-built-in-evaluators), you can also use the following evaluators:
+- Quality: `Coherence`, `Fluency`, `ResponseCompleteness`, `GroundednessPro`, `Retrieval`
 - Safety: `CodeVulnerabilities`, `Violence`, `Self-harm`, `Sexual`, `HateUnfairness`, `IndirectAttack`, `ProtectedMaterials`.
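
As a rough sketch of the converter flow, assuming the `AIAgentConverter` helper from `azure-ai-evaluation` and a project client created from a connection string; the environment variable names, thread and run IDs, and model configuration are placeholders:

```python
import os

from azure.ai.evaluation import AIAgentConverter, IntentResolutionEvaluator
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Model configuration for the AI-assisted evaluators (placeholder values).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

# Connect to the Azure AI project that hosts the agent.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["AZURE_AI_PROJECT_CONNECTION_STRING"],
)

# Convert an existing agent thread and run into evaluator-ready input.
converter = AIAgentConverter(project_client)
converted_data = converter.convert(thread_id="<thread-id>", run_id="<run-id>")

# Any of the agent evaluators listed above can consume the converted messages.
intent_resolution = IntentResolutionEvaluator(model_config=model_config)
print(intent_resolution(**converted_data))
```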

-> [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Foundry Agent's Function Tool evaluation (user-defined Python functions), but doesn't support other Tool evaluation. If an agent run invoked a tool other than Function Tool, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported.
+
+#### Tool call evaluation support
+`ToolCallAccuracyEvaluator` supports evaluating the following Azure AI Agent tools:
+1. File Search
+2. Azure AI Search
+3. Bing Grounding
+4. Bing Custom Search
+5. SharePoint Grounding
+6. Code Interpreter
+7. Fabric Data Agent
+8. OpenAPI
+9. Function Tool (user-defined tools)
+If an unsupported tool is used in the agent run, the evaluator outputs a "pass" and a reason stating that evaluation of the invoked tool(s) isn't supported, which makes these cases easy to filter out. To enable evaluation, wrap unsupported tools as user-defined tools.

 Here's an example that shows you how to seamlessly build and evaluate an Azure AI agent. Separately from evaluation, Azure AI Foundry Agent Service requires `pip install azure-ai-projects azure-identity`, an Azure AI project connection string, and the supported models.

articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 4 additions & 4 deletions
@@ -57,14 +57,14 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 | `IntentResolutionEvaluator` | | | | ||
 | `ToolCallAccuracyEvaluator` | | | | ||
 | `TaskAdherenceEvaluator` | | | | ||
-| `GroundednessEvaluator` || | | | |
+| `GroundednessEvaluator` || | | | |
 | `GroundednessProEvaluator` || | | | |
 | `RetrievalEvaluator` || | | | |
 | `DocumentRetrievalEvaluator` || | || |
 | `RelevanceEvaluator` || | | ||
-| `CoherenceEvaluator` || | | | |
-| `FluencyEvaluator` || | | | |
-| `ResponseCompletenessEvaluator` | | ||| |
+| `CoherenceEvaluator` || | | | |
+| `FluencyEvaluator` || | | | |
+| `ResponseCompletenessEvaluator` | | ||| |
 | `QAEvaluator` | | ||| |
 | **Natural Language Processing (NLP) Evaluators** |
 | `SimilarityEvaluator` | | ||| |
0 commit comments
