
Commit 4bce547

agent and eval sdk updates

2 parents: 180dbd4 + 3bf83ab

6 files changed: +29 −23 lines

articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md

3 additions & 3 deletions

@@ -47,7 +47,7 @@ load_dotenv()
 
 model_config = AzureOpenAIModelConfiguration(
     azure_endpoint=os.environ["AZURE_ENDPOINT"],
-    api_key=os.environ.get["AZURE_API_KEY"],
+    api_key=os.environ.get("AZURE_API_KEY"),
     azure_deployment=os.environ.get("AZURE_DEPLOYMENT_NAME"),
     api_version=os.environ.get("AZURE_API_VERSION"),
 )
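The fix in this hunk is a one-character API correction worth spelling out: `os.environ.get` is a method, so it must be called with parentheses; square brackets try to subscript the method object itself and raise a `TypeError`. A minimal sketch (the `DEMO_KEY` variable is illustrative):

```python
import os

os.environ["DEMO_KEY"] = "secret"  # hypothetical variable, for illustration only

# Correct: call .get() with parentheses; it returns None (or a default) when the key is missing.
value = os.environ.get("DEMO_KEY")
fallback = os.environ.get("DEMO_NOT_SET", "default-value")

# Incorrect: subscripting the bound method raises TypeError before anything runs.
try:
    os.environ.get["DEMO_KEY"]
    subscript_failed = False
except TypeError:
    subscript_failed = True
```

This is why `os.environ["AZURE_ENDPOINT"]` (subscripting the mapping, which raises `KeyError` if unset) and `os.environ.get("AZURE_API_KEY")` (calling the method, which returns `None` if unset) are both valid, while `os.environ.get[...]` never is.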
@@ -104,7 +104,7 @@ The numerical score is on a Likert scale (integer 1 to 5) and a higher score is
 
 ```
 
-If you're building agents outside of Azure AI Agent Serice, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
+If you're building agents outside of Azure AI Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
 
 ## Tool call accuracy
 

@@ -114,7 +114,7 @@ If you're building agents outside of Azure AI Agent Serice, this evaluator accep
 - the counts of missing or excessive calls.
 
 > [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
+> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
 
 ### Tool call accuracy example
 
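Given the constraint in the note (at least one Function Tool call, no Built-in Tool calls), it can help to screen runs before submitting them for evaluation. The sketch below assumes a simplified tool-call shape where each call is a dict with a `"type"` field and function tools use the type `"function"` — an illustrative schema, not the SDK's:

```python
def is_evaluable_run(tool_calls: list[dict]) -> bool:
    """True when a run has at least one tool call and every call is a function tool.

    Assumed (illustrative) schema: each call is a dict whose "type" field is
    "function" for Function Tool calls and some other value for built-in tools.
    """
    types = [call.get("type") for call in tool_calls]
    return bool(types) and all(t == "function" for t in types)
```

A run with only function calls passes; an empty run, or one that mixes in a built-in tool call, would be skipped rather than sent to the evaluator.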
articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

7 additions & 3 deletions

@@ -59,6 +59,7 @@ Agents can use tool. Here's an example of creating custom tools you intend the a
 ```python
 from azure.ai.projects.models import FunctionTool, ToolSet
 from typing import Set, Callable, Any
+import json
 
 # Define a custom Python function.
 def fetch_weather(location: str) -> str:
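The added `import json` suggests the custom tool serializes its return value before handing it back to the agent. A self-contained sketch of such a function — the mock weather data and the return shape are illustrative, not taken from the article:

```python
import json

def fetch_weather(location: str) -> str:
    """Mock custom tool: return the weather for a location as a JSON string."""
    # Hypothetical canned data standing in for a real weather API call.
    mock_weather = {"Seattle": "Rainy, 14 C", "Paris": "Sunny, 21 C"}
    report = mock_weather.get(location, "Weather data not available.")
    return json.dumps({"weather": report})
```

Returning a JSON string (rather than a dict) keeps the tool's output an ordinary string, which is the declared return type the agent framework sees.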
@@ -177,7 +178,7 @@ And that's it! `converted_data` contains all inputs required for [these evaluato
 
 For complex tasks that require refined reasoning for the evaluation, we recommend a strong reasoning model like `o3-mini` or the o-series mini models released afterwards with a balance of reasoning performance and cost efficiency.
 
-We set up a list of quality and safety evaluator in `quality_evaluators` and `safety_evaluators` and reference them in [evaluating multiples agent runs or a thread](#evaluate-multiple-agent-runs-or-threads).
+We set up a list of quality and safety evaluators in `quality_evaluators` and `safety_evaluators` and reference them in [evaluating multiples agent runs or a thread](#evaluate-multiple-agent-runs-or-threads).
 
 ```python
 # This is specific to agentic workflows.

@@ -213,7 +214,7 @@ quality_evaluators.update({ evaluator.__name__: evaluator(model_config=model_con
 ## Using Azure AI Foundry (non-Hub) project endpoint, example: AZURE_AI_PROJECT=https://your-account.services.ai.azure.com/api/projects/your-project
 azure_ai_project = os.environ.get("AZURE_AI_PROJECT")
 
-safety_evaluators = {evaluator.__name__: evaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()) for evaluator in[ContentSafetyEvaluator, IndirectAttackEvaluator, CodeVulnerabilityEvaluator]}
+safety_evaluators = {evaluator.__name__: evaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()) for evaluator in [ContentSafetyEvaluator, IndirectAttackEvaluator, CodeVulnerabilityEvaluator]}
 
 # Reference the quality and safety evaluator list above.
 quality_and_safety_evaluators = {**quality_evaluators, **safety_evaluators}
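The corrected `safety_evaluators` line builds a `{class name: configured instance}` map with a dict comprehension over evaluator classes. Here is the same pattern shown with stand-in classes so it runs without the SDK — the class and parameter names mirror the snippet, but the implementations are placeholders:

```python
# Placeholder classes standing in for the azure.ai.evaluation evaluators.
class ContentSafetyEvaluator:
    def __init__(self, azure_ai_project, credential):
        self.azure_ai_project = azure_ai_project
        self.credential = credential

class IndirectAttackEvaluator(ContentSafetyEvaluator):
    pass

class CodeVulnerabilityEvaluator(ContentSafetyEvaluator):
    pass

azure_ai_project = "https://your-account.services.ai.azure.com/api/projects/your-project"
credential = object()  # stand-in for DefaultAzureCredential()

# Same shape as the fixed line: each class name maps to a configured instance.
safety_evaluators = {
    evaluator.__name__: evaluator(azure_ai_project=azure_ai_project, credential=credential)
    for evaluator in [ContentSafetyEvaluator, IndirectAttackEvaluator, CodeVulnerabilityEvaluator]
}
```

Keying by `__name__` is what later lets the merged `quality_and_safety_evaluators` dict address every evaluator by a readable string.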
@@ -377,7 +378,9 @@ See the following output (reference [Output format](#output-format) for details)
     "intent_resolution_reason": "The response provides the opening hours of the Eiffel Tower, which directly addresses the user's query. The information is clear, accurate, and complete, fully resolving the user's intent.",
 }
 ```
+
 ### Agent tool calls and definitions
+
 See the following examples of `tool_calls` and `tool_definitions` for `ToolCallAccuracyEvaluator`:
 
 ```python

@@ -438,6 +441,7 @@ See the following output (reference [Output format](#output-format) for details)
 In agent message format, `query` and `response` are a list of OpenAI-style messages. Specifically, `query` carries the past agent-user interactions leading up to the last user query and requires the system message (of the agent) on top of the list; and `response` carries the last message of the agent in response to the last user query.
 
 The expected input format for the evaluators is a Python list of messages as follows:
+
 ```
 [
     {
@@ -585,7 +589,7 @@ response = [
 ```
 
 > [!NOTE]
-> The evaluator throws a warning that query (i.e. the conversation history till the current run) or agent response (the response to the query) cannot be parsed when their format is not the expected one.
+> The evaluator throws a warning that query (the conversation history till the current run) or agent response (the response to the query) can't be parsed when their format isn't the expected one.
 
 See an example of evaluating the agent messages with `ToolCallAccuracyEvaluator`:
 
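To make the agent message format described in this file concrete, here is a minimal single-turn instance (the content strings are illustrative): `query` starts with the agent's system message and ends with the last user message, while `response` carries the agent's reply to it:

```python
# `query`: the conversation history up to and including the last user message,
# with the agent's system message on top of the list.
query = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "What are the opening hours of the Eiffel Tower?"},
]

# `response`: the agent's last message, answering the final user query above.
response = [
    {"role": "assistant", "content": "The Eiffel Tower is open daily from 9:00 AM to 11:00 PM."},
]
```

Passing anything that deviates from this shape is what triggers the parse warning mentioned in the note above.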
articles/ai-foundry/how-to/develop/evaluate-sdk.md

10 additions & 11 deletions

@@ -53,7 +53,7 @@ Built-in quality and safety metrics take in query and response pairs, along with
 Built-in evaluators can accept query and response pairs, a list of conversations in JSON Lines (JSONL) format, or both.
 
 | Evaluator | Conversation & single-turn support for text | Conversation & single-turn support for text and image | Single-turn support for text only | Requires `ground_truth` | Supports [agent inputs](./agent-evaluate-sdk.md#agent-messages) |
-|-----------|---------------------------------------------|-------------------------------------------------------|-----------------------------------|---------------------|----------------------|
+|--|--|--|--|--|--|
 | **Quality Evaluators** |
 | `IntentResolutionEvaluator` | | | | ||
 | `ToolCallAccuracyEvaluator` | | | | ||

@@ -67,7 +67,7 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 | `FluencyEvaluator` || | | ||
 | `ResponseCompletenessEvaluator` || ||| |
 | `QAEvaluator` | | ||| |
-| **NLP Evaluators** |
+| **Natural Language Processing (NLP) Evaluators** |
 | `SimilarityEvaluator` | | ||| |
 | `F1ScoreEvaluator` | | ||| |
 | `RougeScoreEvaluator` | | ||| |

@@ -94,7 +94,6 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 > [!NOTE]
 > AI-assisted quality evaluators except for `SimilarityEvaluator` come with a reason field. They employ techniques including chain-of-thought reasoning to generate an explanation for the score. Therefore they consume more token usage in generation as a result of improved evaluation quality. Specifically, `max_token` for evaluator generation has been set to 800 for all AI-assisted evaluators, except that it will be 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate for longer inputs.
 
-
 Azure OpenAI graders require a template that describes how their input columns are turned into the *real* input that the grader uses. Example: If you have two inputs called *query* and *response*, and a template that was formatted as `{{item.query}}`, then only the query would be used. Similarly, you could have something like `{{item.conversation}}` to accept a conversation input, but the ability of the system to handle that depends on how you configure the rest of the grader to expect that input.
 
 For more information on data requirements for agentic evaluators, go to [Run agent evaluations locally with the Azure AI Evaluation SDK](agent-evaluate-sdk.md).
@@ -106,7 +105,7 @@ All built-in evaluators take single-turn inputs as query-and-response pairs in s
 ```python
 from azure.ai.evaluation import RelevanceEvaluator
 
-query = "What is the cpital of life?"
+query = "What is the capital of life?"
 response = "Paris."
 
 # Initialize an evaluator:

@@ -211,7 +210,7 @@ model_config = AzureOpenAIModelConfiguration(
     api_version=os.environ.get("AZURE_API_VERSION"),
 )
 
-# Initialize the Groundedness and Groundedness Pro evaluators:
+# Initialize the Groundedness evaluator:
 groundedness_eval = GroundednessEvaluator(model_config)
 
 conversation = {
@@ -503,9 +502,9 @@ Here's an example of how to set the `evaluators` parameters:
 result = evaluate(
     data="data.jsonl",
     evaluators={
-        "sexual":sexual_evaluator
-        "self_harm":self_harm_evaluator
-        "hate_unfairness":hate_unfairness_evaluator
+        "sexual":sexual_evaluator,
+        "self_harm":self_harm_evaluator,
+        "hate_unfairness":hate_unfairness_evaluator,
         "violence":violence_evaluator
     }
 )
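The commas added in this hunk aren't stylistic: adjacent `key: value` pairs in a dict literal with no separating comma are a `SyntaxError`, so the pre-fix snippet would not even parse. A quick demonstration with stand-in values:

```python
# Missing comma between entries, as in the old snippet: a SyntaxError at compile time.
bad_literal = '{"sexual": 1 "self_harm": 2}'
good_literal = '{"sexual": 1, "self_harm": 2}'

try:
    eval(bad_literal)
    parsed_ok = True
except SyntaxError:
    parsed_ok = False

evaluators = eval(good_literal)
```

Note the fix also adds a trailing comma after `"hate_unfairness"`; Python allows trailing commas in literals, which keeps future one-line additions to the dict as one-line diffs.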
@@ -520,7 +519,7 @@ A target can be any callable class in your directory. In this case, we have a Py
 Here's the content in `"data.jsonl"`:
 
 ```json
-{"query":"When was United Stated found ?", "response":"1776"}
+{"query":"When was United States found ?", "response":"1776"}
 {"query":"What is the capital of France?", "response":"Paris"}
 {"query":"Who is the best tennis player of all time ?", "response":"Roger Federer"}
 ```
@@ -537,8 +536,8 @@ result = evaluate(
     evaluator_config={
         "default": {
             "column_mapping": {
-                "query": "${data.queries}"
-                "context": "${outputs.context}"
+                "query": "${data.queries}",
+                "context": "${outputs.context}",
                 "response": "${outputs.response}"
             }
         }
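In this mapping, `${data.*}` references name columns of the input dataset and `${outputs.*}` references fields returned by the target. A toy resolver makes the substitution explicit — illustrative only, since the real expansion happens inside `evaluate()`:

```python
import re

def resolve_mapping(column_mapping: dict, data_row: dict, target_outputs: dict) -> dict:
    """Toy resolver: expand ${data.X} / ${outputs.X} references against one row."""
    sources = {"data": data_row, "outputs": target_outputs}

    def lookup(ref: str):
        # "${data.queries}" -> ("data", "queries") -> data_row["queries"]
        source, _, key = re.fullmatch(r"\$\{(.+)\}", ref).group(1).partition(".")
        return sources[source][key]

    return {name: lookup(ref) for name, ref in column_mapping.items()}

resolved = resolve_mapping(
    {"query": "${data.queries}", "context": "${outputs.context}", "response": "${outputs.response}"},
    data_row={"queries": "What is the capital of France?"},
    target_outputs={"context": "France's capital is Paris.", "response": "Paris"},
)
```

The mapping's job is purely naming: it lets an evaluator that expects `query`/`context`/`response` consume a dataset whose column is called `queries` plus whatever the target returns.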

articles/ai-services/speech-service/batch-synthesis-properties.md

2 additions & 2 deletions

@@ -31,7 +31,7 @@ Batch synthesis properties are described in the following table.
 |`customVoices`|The map of a custom voice name and its deployment ID.<br/><br/>For example: `"customVoices": {"your-custom-voice-name": "502ac834-6537-4bc3-9fd6-140114daa66d"}`<br/><br/>You can use the voice name in your `synthesisConfig.voice` (when the `inputKind` is set to `"PlainText"`) or within the SSML text of `inputs` (when the `inputKind` is set to `"SSML"`).<br/><br/>This property is required to use a custom voice. If you try to use a custom voice that isn't defined here, the service returns an error.|
 |`description`|The description of the batch synthesis.<br/><br/>This property is optional.|
 |`id`|The batch synthesis job ID you passed in path.<br/><br/>This property is required in path.|
-|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"text": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"text": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"text": "synthesize this to a file"},{"text": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"text": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
+|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"content": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"content": "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"content": "synthesize this to a file"},{"content": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"content": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
 |`lastActionDateTime`|The most recent date and time when the `status` property value changed.<br/><br/>This property is read-only.|
 |`outputs.result`|The location of the batch synthesis result files with audio output and logs.<br/><br/>This property is read-only.|
 |`properties`|A defined set of optional batch synthesis configuration settings.|
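Putting the updated `inputs` row into practice, here is a hypothetical request body for a plain-text job with two text objects (and therefore two audio output files, unless `properties.concatenateResult` is `true`). The voice name is the one from the table's SSML example; the overall payload shape beyond the documented properties is an assumption:

```python
import json

payload = {
    "inputKind": "PlainText",
    "synthesisConfig": {"voice": "en-US-AvaMultilingualNeural"},
    # Post-commit key name: each text object uses "content", not "text".
    "inputs": [
        {"content": "synthesize this to a file"},
        {"content": "synthesize this to another file"},
    ],
}
body = json.dumps(payload)

# The table caps the total JSON payload (all inputs plus other properties) at 2 MB.
assert len(body.encode("utf-8")) <= 2 * 1024 * 1024
```

Per the table, paragraphs within one text object are separated with `"\r\n"` rather than extra `inputs` entries.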
@@ -40,7 +40,7 @@ Batch synthesis properties are described in the following table.
 |`properties.concatenateResult`|Determines whether to concatenate the result. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.decompressOutputFiles`|Determines whether to unzip the synthesis result files in the destination container. This property can only be set when the `destinationContainerUrl` property is set. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.destinationContainerUrl`|The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
-|`properties.destinationPath`|The prefix path where batch synthesis results can be stored with. If you don't specify a prefix path, the default prefix path is `YourSpeechResourceId/YourSynthesisId`.<br/><br/>This optional property can only be set when the `destinationContainerUrl` property is set.|
+|`properties.destinationPath`|The prefix path for storing batch synthesis results. If no prefix path is provided, a system-generated path will be used.<br/><br/>This property is optional and can only be set when the `destinationContainerUrl` property is specified.|
 |`properties.durationInMilliseconds`|The audio output duration in milliseconds.<br/><br/>This property is read-only.|
 |`properties.failedAudioCount`|The count of batch synthesis inputs to audio output failed.<br/><br/>This property is read-only.|
 |`properties.outputFormat`|The audio output format.<br/><br/>For information about the accepted values, see [audio output formats](rest-text-to-speech.md#audio-outputs). The default output format is `riff-24khz-16bit-mono-pcm`.|
