
Commit 6f11c53

updated support level for evaluators
1 parent 6dd291b commit 6f11c53

3 files changed: +107 -11 lines changed


articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md

Lines changed: 84 additions & 2 deletions
@@ -112,15 +112,97 @@ If you're building agents outside of Azure AI Agent Service, this evaluator acce
 - the correctness of parameters used in tool calls;
 - the counts of missing or excessive calls.

-> [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
+#### Tool call evaluation support
+`ToolCallAccuracyEvaluator` supports evaluating the following Azure AI Agent tools:
+1. File Search
+2. Azure AI Search
+3. Bing Grounding
+4. Bing Custom Search
+5. SharePoint Grounding
+6. Code Interpreter
+7. Fabric Data Agent
+8. OpenAPI
+9. Function Tool (user-defined tools)
+If an unsupported tool is used in the agent run, the evaluator outputs a "pass" and a reason stating that evaluation of the invoked tool(s) isn't supported, which makes these cases easy to filter out. To enable evaluation, wrap unsupported tools as user-defined tools.
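
For illustration, here's a minimal sketch of that workaround: a hypothetical `lookup_sharepoint` function stands in for the built-in SharePoint Grounding tool, and a matching `tool_definitions` entry gives the evaluator a name, description, and parameter schema to score calls against. The function name and parameters are illustrative, not part of the SDK.

```python
# User-defined wrapper around functionality otherwise provided by a built-in tool
# (here, a hypothetical SharePoint lookup). Registering this function as a
# Function Tool on the agent makes its calls evaluable by ToolCallAccuracyEvaluator.
def lookup_sharepoint(query: str) -> str:
    """Search the team SharePoint site and return matching snippets."""
    # ...call your own retrieval logic here...
    return "snippet text"

# Matching tool definition, in the same format as the azure_maps_timezone example
# below; pass it to the evaluator through tool_definitions.
lookup_sharepoint_definition = {
    "name": "lookup_sharepoint",
    "description": "Search the team SharePoint site and return matching snippets.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query."
            }
        }
    }
}
```

Calls the agent makes to `lookup_sharepoint` then show up as Function Tool calls in the agent messages and can be evaluated like the example that follows.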

 ### Tool call accuracy example

 ```python
 from azure.ai.evaluation import ToolCallAccuracyEvaluator

 tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config, threshold=3)
+
+# provide the agent response with tool calls
+tool_call_accuracy(
+    query="What timezone corresponds to 41.8781,-87.6298?",
+    response=[
+        {
+            "createdAt": "2025-04-25T23:55:52Z",
+            "run_id": "run_DmnhUGqYd1vCBolcjjODVitB",
+            "role": "assistant",
+            "content": [
+                {
+                    "type": "tool_call",
+                    "tool_call_id": "call_qi2ug31JqzDuLy7zF5uiMbGU",
+                    "name": "azure_maps_timezone",
+                    "arguments": {
+                        "lat": 41.878100000000003,
+                        "lon": -87.629800000000003
+                    }
+                }
+            ]
+        },
+        {
+            "createdAt": "2025-04-25T23:55:54Z",
+            "run_id": "run_DmnhUGqYd1vCBolcjjODVitB",
+            "tool_call_id": "call_qi2ug31JqzDuLy7zF5uiMbGU",
+            "role": "tool",
+            "content": [
+                {
+                    "type": "tool_result",
+                    "tool_result": {
+                        "ianaId": "America/Chicago",
+                        "utcOffset": None,
+                        "abbreviation": None,
+                        "isDaylightSavingTime": None
+                    }
+                }
+            ]
+        },
+        {
+            "createdAt": "2025-04-25T23:55:55Z",
+            "run_id": "run_DmnhUGqYd1vCBolcjjODVitB",
+            "role": "assistant",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "The timezone for the coordinates 41.8781, -87.6298 is America/Chicago."
+                }
+            ]
+        }
+    ],
+    tool_definitions=[
+        {
+            "name": "azure_maps_timezone",
+            "description": "local time zone information for a given latitude and longitude.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "lat": {
+                        "type": "float",
+                        "description": "The latitude of the location."
+                    },
+                    "lon": {
+                        "type": "float",
+                        "description": "The longitude of the location."
+                    }
+                }
+            }
+        }
+    ]
+)
+
+# alternatively, provide the tool calls directly without the full agent response
 tool_call_accuracy(
     query="How is the weather in Seattle?",
     tool_calls=[{

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 19 additions & 5 deletions
@@ -39,15 +39,29 @@ pip install azure-ai-evaluation

 ## Evaluate Azure AI agents

-If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents via our converter support for Azure AI agent threads and runs. We support this list of evaluators for Azure AI agent messages from our converter:
+If you use [Foundry Agent Service](../../../ai-services/agents/overview.md), you can seamlessly evaluate your agents via our converter support for Azure AI agents and for Semantic Kernel's Chat Completion and Azure AI agents. The following evaluators accept agent messages returned by our converter:

-### Evaluators supported for evaluation data converter
+- Agent: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, `Groundedness`

-- Quality: `IntentResolution`, `ToolCallAccuracy`, `TaskAdherence`, `Relevance`, `Coherence`, `Fluency`
+If you're building other agents with a different schema, you can convert their messages into the general OpenAI-style [agent message schema](#agent-message-schema) and use the evaluators above.
+
+More generally, if you can parse the agent messages into the [required data formats](./evaluate-sdk.md#data-requirements-for-built-in-evaluators), you can also use the following evaluators:
+- Quality: `Coherence`, `Fluency`, `ResponseCompleteness`, `GroundednessPro`, `Retrieval`
 - Safety: `CodeVulnerabilities`, `Violence`, `Self-harm`, `Sexual`, `HateUnfairness`, `IndirectAttack`, `ProtectedMaterials`.
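
As a rough sketch of the converter flow, assuming the `AIAgentConverter` helper from `azure-ai-evaluation` and a project client created from a connection string; the environment variable names, thread and run IDs, and model configuration are placeholders:

```python
import os

from azure.ai.evaluation import AIAgentConverter, IntentResolutionEvaluator
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Model configuration for the AI-assisted evaluators (placeholder values).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

# Connect to the Azure AI project that hosts the agent.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["AZURE_AI_PROJECT_CONNECTION_STRING"],
)

# Convert an existing agent thread and run into evaluator-ready input.
converter = AIAgentConverter(project_client)
converted_data = converter.convert(thread_id="<thread-id>", run_id="<run-id>")

# Any of the agent evaluators listed above can consume the converted messages.
intent_resolution = IntentResolutionEvaluator(model_config=model_config)
print(intent_resolution(**converted_data))
```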

-> [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Foundry Agent's Function Tool evaluation (user-defined Python functions), but doesn't support other Tool evaluation. If an agent run invoked a tool other than Function Tool, it outputs a "pass" and a reason that evaluating the invoked tool(s) isn't supported.
+
+#### Tool call evaluation support
+`ToolCallAccuracyEvaluator` supports evaluating the following Azure AI Agent tools:
+1. File Search
+2. Azure AI Search
+3. Bing Grounding
+4. Bing Custom Search
+5. SharePoint Grounding
+6. Code Interpreter
+7. Fabric Data Agent
+8. OpenAPI
+9. Function Tool (user-defined tools)
+If an unsupported tool is used in the agent run, the evaluator outputs a "pass" and a reason stating that evaluation of the invoked tool(s) isn't supported, which makes these cases easy to filter out. To enable evaluation, wrap unsupported tools as user-defined tools.

 Here's an example that shows you how to seamlessly build and evaluate an Azure AI agent. Separately from evaluation, Azure AI Foundry Agent Service requires `pip install azure-ai-projects azure-identity`, an Azure AI project connection string, and the supported models.

articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 4 additions & 4 deletions
@@ -57,14 +57,14 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 | `IntentResolutionEvaluator` | | | | ||
 | `ToolCallAccuracyEvaluator` | | | | ||
 | `TaskAdherenceEvaluator` | | | | ||
-| `GroundednessEvaluator` || | | | |
+| `GroundednessEvaluator` || | | | |
 | `GroundednessProEvaluator` || | | | |
 | `RetrievalEvaluator` || | | | |
 | `DocumentRetrievalEvaluator` || | || |
 | `RelevanceEvaluator` || | | ||
-| `CoherenceEvaluator` || | | | |
-| `FluencyEvaluator` || | | | |
-| `ResponseCompletenessEvaluator` | | ||| |
+| `CoherenceEvaluator` || | | | |
+| `FluencyEvaluator` || | | | |
+| `ResponseCompletenessEvaluator` | | ||| |
 | `QAEvaluator` | | ||| |
 | **Natural Language Processing (NLP) Evaluators** |
 | `SimilarityEvaluator` | | ||| |
0 commit comments
