
Commit 6cf01cf

edits to eval changes
1 parent 92b5003 commit 6cf01cf

File tree

3 files changed (+32 −31 lines)

articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md

Lines changed: 18 additions & 17 deletions
@@ -52,7 +52,7 @@ model_config = AzureOpenAIModelConfiguration(
 )
 ```

-### Evaluator model support
+### Evaluator models support

 We support AzureOpenAI or OpenAI [reasoning models](../../../ai-services/openai/how-to/reasoning.md) and non-reasoning models for the LLM-judge depending on the evaluators:

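For context, the `model_config` named in this hunk's context line is typically built along the following lines; this is a minimal sketch, and the environment variable names are illustrative assumptions, not part of the diff:

```python
import os
from azure.ai.evaluation import AzureOpenAIModelConfiguration

# Point the LLM-judge at an Azure OpenAI deployment. Choose a reasoning
# model deployment (for example, o3-mini) or a non-reasoning model
# (for example, gpt-4o) depending on the evaluator, per the doc text above.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],      # assumed env var
    api_key=os.environ["AZURE_OPENAI_API_KEY"],              # assumed env var
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # assumed env var
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],      # assumed env var
)
```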
@@ -65,7 +65,7 @@ For complex evaluation that requires refined reasoning, we recommend a strong re

 ## Intent resolution

-`IntentResolutionEvaluator` measures how well the system identifies and understands a user's request, including how well it scopes the users intent, asks clarifying questions, and reminds end users of its scope of capabilities. Higher score means better identification of user intent.
+`IntentResolutionEvaluator` measures how well the system identifies and understands a user's request, including how well it scopes the user's intent, asks clarifying questions, and reminds end users of its scope of capabilities. Higher score means better identification of user intent.

 ### Intent resolution example

@@ -99,11 +99,9 @@ The numerical score is on a Likert scale (integer 1 to 5) and a higher score is
 }
 }

-
-
 ```

-If you're building agents outside of Azure AI Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
+If you're building agents outside of Azure AI Foundry Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).

 ## Tool call accuracy

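As a companion to this hunk, here is a minimal sketch of invoking `IntentResolutionEvaluator` directly in the single-turn style; the query/response strings are invented, and `model_config` is assumed to be defined as in the earlier sketch:

```python
from azure.ai.evaluation import IntentResolutionEvaluator

intent_resolution = IntentResolutionEvaluator(model_config=model_config)

# Single-turn evaluation: pass the user query and the agent's response.
result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
print(result)  # includes the 1-5 Likert score and a reason
```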
@@ -113,17 +111,20 @@ If you're building agents outside of Azure AI Agent Service, this evaluator acce
 - the counts of missing or excessive calls.

 #### Tool call evaluation support
-`ToolCallAccuracyEvaluator` supports evaluation in Azure AI Agent for the following tools:
-1. File Search
-2. Azure AI Search
-3. Bing Grounding
-4. Bing Custom Search
-5. SharePoint Grounding
-6. Code Interpreter
-7. Fabric Data Agent
-8. OpenAPI
-9. Function Tool (user-defined tools)
-However, if a non-supported tool is used in the agent run, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported, for ease of filtering out these cases. It is recommended that you wrap non-supported tools as user-defined tools to enable evaluation.
+
+`ToolCallAccuracyEvaluator` supports evaluation in Azure AI Foundry Agent Service for the following tools:
+
+- File Search
+- Azure AI Search
+- Bing Grounding
+- Bing Custom Search
+- SharePoint Grounding
+- Code Interpreter
+- Fabric Data Agent
+- OpenAPI
+- Function Tool (user-defined tools)
+
+However, if a non-supported tool is used in the agent run, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported, for ease of filtering out these cases. It's recommended that you wrap non-supported tools as user-defined tools to enable evaluation.

 ### Tool call accuracy example

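For orientation, a sketch of how `ToolCallAccuracyEvaluator` is typically called with a user-defined (Function Tool) definition; the tool name, id, and parameters are invented for illustration, and `model_config` is assumed from earlier:

```python
from azure.ai.evaluation import ToolCallAccuracyEvaluator

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

result = tool_call_accuracy(
    query="How is the weather in Seattle?",
    tool_calls=[{
        "type": "tool_call",
        "tool_call_id": "call_1",  # illustrative id
        "name": "fetch_weather",
        "arguments": {"location": "Seattle"},
    }],
    tool_definitions=[{
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
        },
    }],
)
print(result)
```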
@@ -270,7 +271,7 @@ If you're building agents outside of Azure AI Agent Service, this evaluator acce

 ## Task adherence

-In various task-oriented AI systems such as agentic systems, it's important to assess whether the agent has stayed on track to complete a given task instead of making inefficient or out-of-scope steps. `TaskAdherenceEvaluator` measures how well an agents response adheres to their assigned tasks, according to their task instruction (extracted from system message and user query), and available tools. Higher score means better adherence of the system instruction to resolve the given task.
+In various task-oriented AI systems such as agentic systems, it's important to assess whether the agent has stayed on track to complete a given task instead of making inefficient or out-of-scope steps. `TaskAdherenceEvaluator` measures how well an agent's response adheres to their assigned tasks, according to their task instruction (extracted from system message and user query), and available tools. Higher score means better adherence of the system instruction to resolve the given task.

 ### Task adherence example

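A minimal sketch of `TaskAdherenceEvaluator` in the same single-turn style; the strings are invented and `model_config` is assumed from the earlier sketch:

```python
from azure.ai.evaluation import TaskAdherenceEvaluator

task_adherence = TaskAdherenceEvaluator(model_config=model_config)

result = task_adherence(
    query="What are the best practices for a healthy rose garden in summer?",
    response="Water your roses regularly and trim them occasionally.",
)
print(result)  # higher is better on the 1-5 scale
```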
articles/ai-foundry/concepts/evaluation-evaluators/rag-evaluators.md

Lines changed: 0 additions & 1 deletion
@@ -245,7 +245,6 @@ AI systems can fabricate content or generate irrelevant responses outside the gi
 ```python
 from azure.ai.evaluation import GroundednessProEvaluator
 from azure.identity import DefaultAzureCredential
-
 import os
 from dotenv import load_dotenv
 load_dotenv()
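The hunk above only touches the import block; for context, a sketch of how `GroundednessProEvaluator` is then typically wired up. The example strings are invented, and the shape of `azure_ai_project` (a project endpoint string versus a subscription/resource-group/project dict) varies by SDK version, so treat this as an assumption:

```python
import os
from azure.ai.evaluation import GroundednessProEvaluator
from azure.identity import DefaultAzureCredential

groundedness_pro = GroundednessProEvaluator(
    azure_ai_project=os.environ["AZURE_AI_PROJECT"],  # assumed env var
    credential=DefaultAzureCredential(),
)

result = groundedness_pro(
    query="When did the patient start taking metformin?",         # invented
    context="The patient began taking metformin in March 2023.",  # invented
    response="The patient started metformin in March 2023.",
)
print(result)
```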

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 14 additions & 13 deletions
@@ -39,7 +39,7 @@ pip install azure-ai-evaluation

 ## Evaluate Azure AI agents

-If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents via our converter support for Azure AI agents and Semantic Kernel's Chat Completion and Azure AI agents. This list of evaluators are supported for evaluation data returned by the converter: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, `Groundedness`.
+If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents using our converter support for Azure AI agents and Semantic Kernel agents. The following evaluators are supported for evaluation data returned by the converter: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, and `Groundedness`.

 > [!NOTE]
 > If you are building other agents that output a different schema, you can convert them into the general openai-style [agent message schema](#agent-message-schema) and use the above evaluators.
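For context on the converter this hunk references, a sketch of the typical flow with `AIAgentConverter`; `project_client`, `thread_id`, and `run_id` are assumed to come from your azure-ai-projects agent run:

```python
from azure.ai.evaluation import AIAgentConverter

# project_client is an AIProjectClient connected to your Foundry project.
converter = AIAgentConverter(project_client)

# Convert one agent thread/run into the evaluator-ready message schema.
evaluation_data = converter.convert(thread_id=thread_id, run_id=run_id)
```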
@@ -48,15 +48,17 @@ If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you

 #### Tool call evaluation support
 `ToolCallAccuracyEvaluator` supports evaluation in Azure AI Agent for the following tools:
-1. File Search
-2. Azure AI Search
-3. Bing Grounding
-4. Bing Custom Search
-5. SharePoint Grounding
-6. Code Interpreter
-7. Fabric Data Agent
-8. OpenAPI
-9. Function Tool (user-defined tools)
+
+- File Search
+- Azure AI Search
+- Bing Grounding
+- Bing Custom Search
+- SharePoint Grounding
+- Code Interpreter
+- Fabric Data Agent
+- OpenAPI
+- Function Tool (user-defined tools)
+
 However, if a non-supported tool is used in the agent run, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported, for ease of filtering out these cases. It is recommended that you wrap non-supported tools as user-defined tools to enable evaluation.

 Here's an example that shows you how to seamlessly build and evaluate an Azure AI agent. Separately from evaluation, Azure AI Foundry Agent Service requires `pip install azure-ai-projects azure-identity`, an Azure AI project connection string, and the supported models.
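To illustrate the "wrap non-supported tools as user-defined tools" recommendation repeated in this hunk, a sketch using the agents SDK's function-tool support; the function body and names are invented, and the exact class names follow the azure-ai-projects SDK as understood here, so treat them as assumptions:

```python
from azure.ai.projects.models import FunctionTool, ToolSet

def fetch_weather(location: str) -> str:
    """Invented user-defined tool: look up weather for a location."""
    return f"Sunny in {location}"

# Registering the function as a user-defined tool keeps its calls
# visible to (and evaluable by) ToolCallAccuracyEvaluator.
toolset = ToolSet()
toolset.add(FunctionTool({fetch_weather}))
```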
@@ -215,10 +217,10 @@ reasoning_model_config = {
     "api_version": os.getenv("AZURE_API_VERSION"),
 }

-# Evaluators you may want to use reasoning models with
+# Evaluators you might want to use with reasoning models
 quality_evaluators = {evaluator.__name__: evaluator(model_config=reasoning_model_config, is_reasoning_model=True) for evaluator in [IntentResolutionEvaluator, TaskAdherenceEvaluator, ToolCallAccuracyEvaluator]}

-# Other evaluators you may NOT want to use reasoning models
+# Other evaluators you might NOT want to use with reasoning models
 quality_evaluators.update({ evaluator.__name__: evaluator(model_config=model_config) for evaluator in [CoherenceEvaluator, FluencyEvaluator, RelevanceEvaluator]})

 ## Using Azure AI Foundry (non-Hub) project endpoint, example: AZURE_AI_PROJECT=https://your-account.services.ai.azure.com/api/projects/your-project
@@ -234,7 +236,6 @@ for name, evaluator in quality_and_safety_evaluators.items():
     print(name)
     print(json.dumps(result, indent=4))

-
 ```

 #### Output format
