
Commit 6cf01cf

edits to eval changes
1 parent 92b5003 commit 6cf01cf

File tree

3 files changed (+32 −31 lines)

articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md

Lines changed: 18 additions & 17 deletions
@@ -52,7 +52,7 @@ model_config = AzureOpenAIModelConfiguration(
 )
 ```

-### Evaluator model support
+### Evaluator models support

 We support AzureOpenAI or OpenAI [reasoning models](../../../ai-services/openai/how-to/reasoning.md) and non-reasoning models for the LLM-judge depending on the evaluators:

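For context, the `model_config` named in this hunk's context line is typically built along the following lines; this is a minimal sketch, and the environment variable names are illustrative assumptions, not part of the diff:

```python
import os
from azure.ai.evaluation import AzureOpenAIModelConfiguration

# Point the LLM-judge at an Azure OpenAI deployment. Choose a reasoning
# model deployment (for example, o3-mini) or a non-reasoning model
# (for example, gpt-4o) depending on the evaluator, per the doc text above.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],      # assumed env var
    api_key=os.environ["AZURE_OPENAI_API_KEY"],              # assumed env var
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # assumed env var
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],      # assumed env var
)
```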
@@ -65,7 +65,7 @@ For complex evaluation that requires refined reasoning, we recommend a strong re

 ## Intent resolution

-`IntentResolutionEvaluator` measures how well the system identifies and understands a user's request, including how well it scopes the users intent, asks clarifying questions, and reminds end users of its scope of capabilities. Higher score means better identification of user intent.
+`IntentResolutionEvaluator` measures how well the system identifies and understands a user's request, including how well it scopes the user's intent, asks clarifying questions, and reminds end users of its scope of capabilities. Higher score means better identification of user intent.

 ### Intent resolution example

@@ -99,11 +99,9 @@ The numerical score is on a Likert scale (integer 1 to 5) and a higher score is
 }
 }

-
-
 ```

-If you're building agents outside of Azure AI Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
+If you're building agents outside of Azure AI Foundry Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).

 ## Tool call accuracy

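As a companion to this hunk, here is a minimal sketch of invoking `IntentResolutionEvaluator` directly in the single-turn style; the query/response strings are invented, and `model_config` is assumed to be defined as in the earlier sketch:

```python
from azure.ai.evaluation import IntentResolutionEvaluator

intent_resolution = IntentResolutionEvaluator(model_config=model_config)

# Single-turn evaluation: pass the user query and the agent's response.
result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
print(result)  # includes the 1-5 Likert score and a reason
```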
@@ -113,17 +111,20 @@ If you're building agents outside of Azure AI Agent Service, this evaluator acce
 - the counts of missing or excessive calls.

 #### Tool call evaluation support
-`ToolCallAccuracyEvaluator` supports evaluation in Azure AI Agent for the following tools:
-1. File Search
-2. Azure AI Search
-3. Bing Grounding
-4. Bing Custom Search
-5. SharePoint Grounding
-6. Code Interpreter
-7. Fabric Data Agent
-8. OpenAPI
-9. Function Tool (user-defined tools)
-However, if a non-supported tool is used in the agent run, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported, for ease of filtering out these cases. It is recommended that you wrap non-supported tools as user-defined tools to enable evaluation.
+
+`ToolCallAccuracyEvaluator` supports evaluation in Azure AI Foundry Agent Service for the following tools:
+
+- File Search
+- Azure AI Search
+- Bing Grounding
+- Bing Custom Search
+- SharePoint Grounding
+- Code Interpreter
+- Fabric Data Agent
+- OpenAPI
+- Function Tool (user-defined tools)
+
+However, if a non-supported tool is used in the agent run, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported, for ease of filtering out these cases. It's recommended that you wrap non-supported tools as user-defined tools to enable evaluation.

 ### Tool call accuracy example

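For orientation, a sketch of how `ToolCallAccuracyEvaluator` is typically called with a user-defined (Function Tool) definition; the tool name, id, and parameters are invented for illustration, and `model_config` is assumed from earlier:

```python
from azure.ai.evaluation import ToolCallAccuracyEvaluator

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

result = tool_call_accuracy(
    query="How is the weather in Seattle?",
    tool_calls=[{
        "type": "tool_call",
        "tool_call_id": "call_1",  # illustrative id
        "name": "fetch_weather",
        "arguments": {"location": "Seattle"},
    }],
    tool_definitions=[{
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
        },
    }],
)
print(result)
```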
@@ -270,7 +271,7 @@ If you're building agents outside of Azure AI Agent Service, this evaluator acce

 ## Task adherence

-In various task-oriented AI systems such as agentic systems, it's important to assess whether the agent has stayed on track to complete a given task instead of making inefficient or out-of-scope steps. `TaskAdherenceEvaluator` measures how well an agents response adheres to their assigned tasks, according to their task instruction (extracted from system message and user query), and available tools. Higher score means better adherence of the system instruction to resolve the given task.
+In various task-oriented AI systems such as agentic systems, it's important to assess whether the agent has stayed on track to complete a given task instead of making inefficient or out-of-scope steps. `TaskAdherenceEvaluator` measures how well an agent's response adheres to their assigned tasks, according to their task instruction (extracted from system message and user query), and available tools. Higher score means better adherence of the system instruction to resolve the given task.

 ### Task adherence example

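A minimal sketch of `TaskAdherenceEvaluator` in the same single-turn style; the strings are invented and `model_config` is assumed from the earlier sketch:

```python
from azure.ai.evaluation import TaskAdherenceEvaluator

task_adherence = TaskAdherenceEvaluator(model_config=model_config)

result = task_adherence(
    query="What are the best practices for a healthy rose garden in summer?",
    response="Water your roses regularly and trim them occasionally.",
)
print(result)  # higher is better on the 1-5 scale
```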
articles/ai-foundry/concepts/evaluation-evaluators/rag-evaluators.md

Lines changed: 0 additions & 1 deletion
@@ -245,7 +245,6 @@ AI systems can fabricate content or generate irrelevant responses outside the gi
 ```python
 from azure.ai.evaluation import GroundednessProEvaluator
 from azure.identity import DefaultAzureCredential
-
 import os
 from dotenv import load_dotenv
 load_dotenv()
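The hunk above only touches the import block; for context, a sketch of how `GroundednessProEvaluator` is then typically wired up. The example strings are invented, and the shape of `azure_ai_project` (a project endpoint string versus a subscription/resource-group/project dict) varies by SDK version, so treat this as an assumption:

```python
import os
from azure.ai.evaluation import GroundednessProEvaluator
from azure.identity import DefaultAzureCredential

groundedness_pro = GroundednessProEvaluator(
    azure_ai_project=os.environ["AZURE_AI_PROJECT"],  # assumed env var
    credential=DefaultAzureCredential(),
)

result = groundedness_pro(
    query="When did the patient start taking metformin?",         # invented
    context="The patient began taking metformin in March 2023.",  # invented
    response="The patient started metformin in March 2023.",
)
print(result)
```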

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 14 additions & 13 deletions
@@ -39,7 +39,7 @@ pip install azure-ai-evaluation

 ## Evaluate Azure AI agents

-If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents via our converter support for Azure AI agents and Semantic Kernel's Chat Completion and Azure AI agents. This list of evaluators are supported for evaluation data returned by the converter: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, `Groundedness`.
+If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents using our converter support for Azure AI agents and Semantic Kernel agents. The following evaluators are supported for evaluation data returned by the converter: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, and `Groundedness`.

 > [!NOTE]
 > If you are building other agents that output a different schema, you can convert them into the general openai-style [agent message schema](#agent-message-schema) and use the above evaluators.
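For context on the converter this hunk references, a sketch of the typical flow with `AIAgentConverter`; `project_client`, `thread_id`, and `run_id` are assumed to come from your azure-ai-projects agent run:

```python
from azure.ai.evaluation import AIAgentConverter

# project_client is an AIProjectClient connected to your Foundry project.
converter = AIAgentConverter(project_client)

# Convert one agent thread/run into the evaluator-ready message schema.
evaluation_data = converter.convert(thread_id=thread_id, run_id=run_id)
```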
@@ -48,15 +48,17 @@ If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you

 #### Tool call evaluation support
 `ToolCallAccuracyEvaluator` supports evaluation in Azure AI Agent for the following tools:
-1. File Search
-2. Azure AI Search
-3. Bing Grounding
-4. Bing Custom Search
-5. SharePoint Grounding
-6. Code Interpreter
-7. Fabric Data Agent
-8. OpenAPI
-9. Function Tool (user-defined tools)
+
+- File Search
+- Azure AI Search
+- Bing Grounding
+- Bing Custom Search
+- SharePoint Grounding
+- Code Interpreter
+- Fabric Data Agent
+- OpenAPI
+- Function Tool (user-defined tools)
+
 However, if a non-supported tool is used in the agent run, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported, for ease of filtering out these cases. It is recommended that you wrap non-supported tools as user-defined tools to enable evaluation.

 Here's an example that shows you how to seamlessly build and evaluate an Azure AI agent. Separately from evaluation, Azure AI Foundry Agent Service requires `pip install azure-ai-projects azure-identity`, an Azure AI project connection string, and the supported models.
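To illustrate the "wrap non-supported tools as user-defined tools" recommendation repeated in this hunk, a sketch using the agents SDK's function-tool support; the function body and names are invented, and the exact class names follow the azure-ai-projects SDK as understood here, so treat them as assumptions:

```python
from azure.ai.projects.models import FunctionTool, ToolSet

def fetch_weather(location: str) -> str:
    """Invented user-defined tool: look up weather for a location."""
    return f"Sunny in {location}"

# Registering the function as a user-defined tool keeps its calls
# visible to (and evaluable by) ToolCallAccuracyEvaluator.
toolset = ToolSet()
toolset.add(FunctionTool({fetch_weather}))
```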
@@ -215,10 +217,10 @@ reasoning_model_config = {
     "api_version": os.getenv("AZURE_API_VERSION"),
 }

-# Evaluators you may want to use reasoning models with
+# Evaluators you might want to use with reasoning models
 quality_evaluators = {evaluator.__name__: evaluator(model_config=reasoning_model_config, is_reasoning_model=True) for evaluator in [IntentResolutionEvaluator, TaskAdherenceEvaluator, ToolCallAccuracyEvaluator]}

-# Other evaluators you may NOT want to use reasoning models
+# Other evaluators you might NOT want to use with reasoning models
 quality_evaluators.update({ evaluator.__name__: evaluator(model_config=model_config) for evaluator in [CoherenceEvaluator, FluencyEvaluator, RelevanceEvaluator]})

 ## Using Azure AI Foundry (non-Hub) project endpoint, example: AZURE_AI_PROJECT=https://your-account.services.ai.azure.com/api/projects/your-project
@@ -234,7 +236,6 @@ for name, evaluator in quality_and_safety_evaluators.items():
     print(name)
     print(json.dumps(result, indent=4))

-
 ```

 #### Output format
