
Commit 4bce547

agent and eval sdk updates

2 parents: 180dbd4 + 3bf83ab

6 files changed: +29 −23 lines

articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md

3 additions & 3 deletions

@@ -47,7 +47,7 @@ load_dotenv()
 
 model_config = AzureOpenAIModelConfiguration(
     azure_endpoint=os.environ["AZURE_ENDPOINT"],
-    api_key=os.environ.get["AZURE_API_KEY"],
+    api_key=os.environ.get("AZURE_API_KEY"),
     azure_deployment=os.environ.get("AZURE_DEPLOYMENT_NAME"),
     api_version=os.environ.get("AZURE_API_VERSION"),
 )
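The fix in this hunk is a one-character API correction worth spelling out: `os.environ.get` is a method, so it must be called with parentheses; square brackets try to subscript the method object itself and raise a `TypeError`. A minimal sketch (the `DEMO_KEY` variable is illustrative):

```python
import os

os.environ["DEMO_KEY"] = "secret"  # hypothetical variable, for illustration only

# Correct: call .get() with parentheses; it returns None (or a default) when the key is missing.
value = os.environ.get("DEMO_KEY")
fallback = os.environ.get("DEMO_NOT_SET", "default-value")

# Incorrect: subscripting the bound method raises TypeError before anything runs.
try:
    os.environ.get["DEMO_KEY"]
    subscript_failed = False
except TypeError:
    subscript_failed = True
```

This is why `os.environ["AZURE_ENDPOINT"]` (subscripting the mapping, which raises `KeyError` if unset) and `os.environ.get("AZURE_API_KEY")` (calling the method, which returns `None` if unset) are both valid, while `os.environ.get[...]` never is.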
@@ -104,7 +104,7 @@ The numerical score is on a Likert scale (integer 1 to 5) and a higher score is
 
 ```
 
-If you're building agents outside of Azure AI Agent Serice, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
+If you're building agents outside of Azure AI Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
 
 ## Tool call accuracy
 

@@ -114,7 +114,7 @@ If you're building agents outside of Azure AI Agent Serice, this evaluator accep
 - the counts of missing or excessive calls.
 
 > [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
+> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
 
 ### Tool call accuracy example
 
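Given the constraint in the note (at least one Function Tool call, no Built-in Tool calls), it can help to screen runs before submitting them for evaluation. The sketch below assumes a simplified tool-call shape where each call is a dict with a `"type"` field and function tools use the type `"function"` — an illustrative schema, not the SDK's:

```python
def is_evaluable_run(tool_calls: list[dict]) -> bool:
    """True when a run has at least one tool call and every call is a function tool.

    Assumed (illustrative) schema: each call is a dict whose "type" field is
    "function" for Function Tool calls and some other value for built-in tools.
    """
    types = [call.get("type") for call in tool_calls]
    return bool(types) and all(t == "function" for t in types)
```

A run with only function calls passes; an empty run, or one that mixes in a built-in tool call, would be skipped rather than sent to the evaluator.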
articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

7 additions & 3 deletions

@@ -59,6 +59,7 @@ Agents can use tool. Here's an example of creating custom tools you intend the a
 ```python
 from azure.ai.projects.models import FunctionTool, ToolSet
 from typing import Set, Callable, Any
+import json
 
 # Define a custom Python function.
 def fetch_weather(location: str) -> str:
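The added `import json` suggests the custom tool serializes its return value before handing it back to the agent. A self-contained sketch of such a function — the mock weather data and the return shape are illustrative, not taken from the article:

```python
import json

def fetch_weather(location: str) -> str:
    """Mock custom tool: return the weather for a location as a JSON string."""
    # Hypothetical canned data standing in for a real weather API call.
    mock_weather = {"Seattle": "Rainy, 14 C", "Paris": "Sunny, 21 C"}
    report = mock_weather.get(location, "Weather data not available.")
    return json.dumps({"weather": report})
```

Returning a JSON string (rather than a dict) keeps the tool's output an ordinary string, which is the declared return type the agent framework sees.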
@@ -177,7 +178,7 @@ And that's it! `converted_data` contains all inputs required for [these evaluato
 
 For complex tasks that require refined reasoning for the evaluation, we recommend a strong reasoning model like `o3-mini` or the o-series mini models released afterwards with a balance of reasoning performance and cost efficiency.
 
-We set up a list of quality and safety evaluator in `quality_evaluators` and `safety_evaluators` and reference them in [evaluating multiples agent runs or a thread](#evaluate-multiple-agent-runs-or-threads).
+We set up a list of quality and safety evaluators in `quality_evaluators` and `safety_evaluators` and reference them in [evaluating multiples agent runs or a thread](#evaluate-multiple-agent-runs-or-threads).
 
 ```python
 # This is specific to agentic workflows.

@@ -213,7 +214,7 @@ quality_evaluators.update({ evaluator.__name__: evaluator(model_config=model_con
 ## Using Azure AI Foundry (non-Hub) project endpoint, example: AZURE_AI_PROJECT=https://your-account.services.ai.azure.com/api/projects/your-project
 azure_ai_project = os.environ.get("AZURE_AI_PROJECT")
 
-safety_evaluators = {evaluator.__name__: evaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()) for evaluator in[ContentSafetyEvaluator, IndirectAttackEvaluator, CodeVulnerabilityEvaluator]}
+safety_evaluators = {evaluator.__name__: evaluator(azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()) for evaluator in [ContentSafetyEvaluator, IndirectAttackEvaluator, CodeVulnerabilityEvaluator]}
 
 # Reference the quality and safety evaluator list above.
 quality_and_safety_evaluators = {**quality_evaluators, **safety_evaluators}
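The corrected `safety_evaluators` line builds a `{class name: configured instance}` map with a dict comprehension over evaluator classes. Here is the same pattern shown with stand-in classes so it runs without the SDK — the class and parameter names mirror the snippet, but the implementations are placeholders:

```python
# Placeholder classes standing in for the azure.ai.evaluation evaluators.
class ContentSafetyEvaluator:
    def __init__(self, azure_ai_project, credential):
        self.azure_ai_project = azure_ai_project
        self.credential = credential

class IndirectAttackEvaluator(ContentSafetyEvaluator):
    pass

class CodeVulnerabilityEvaluator(ContentSafetyEvaluator):
    pass

azure_ai_project = "https://your-account.services.ai.azure.com/api/projects/your-project"
credential = object()  # stand-in for DefaultAzureCredential()

# Same shape as the fixed line: each class name maps to a configured instance.
safety_evaluators = {
    evaluator.__name__: evaluator(azure_ai_project=azure_ai_project, credential=credential)
    for evaluator in [ContentSafetyEvaluator, IndirectAttackEvaluator, CodeVulnerabilityEvaluator]
}
```

Keying by `__name__` is what later lets the merged `quality_and_safety_evaluators` dict address every evaluator by a readable string.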
@@ -377,7 +378,9 @@ See the following output (reference [Output format](#output-format) for details)
     "intent_resolution_reason": "The response provides the opening hours of the Eiffel Tower, which directly addresses the user's query. The information is clear, accurate, and complete, fully resolving the user's intent.",
 }
 ```
+
 ### Agent tool calls and definitions
+
 See the following examples of `tool_calls` and `tool_definitions` for `ToolCallAccuracyEvaluator`:
 
 ```python

@@ -438,6 +441,7 @@ See the following output (reference [Output format](#output-format) for details)
 In agent message format, `query` and `response` are a list of OpenAI-style messages. Specifically, `query` carries the past agent-user interactions leading up to the last user query and requires the system message (of the agent) on top of the list; and `response` carries the last message of the agent in response to the last user query.
 
 The expected input format for the evaluators is a Python list of messages as follows:
+
 ```
 [
     {
@@ -585,7 +589,7 @@ response = [
 ```
 
 > [!NOTE]
-> The evaluator throws a warning that query (i.e. the conversation history till the current run) or agent response (the response to the query) cannot be parsed when their format is not the expected one.
+> The evaluator throws a warning that query (the conversation history till the current run) or agent response (the response to the query) can't be parsed when their format isn't the expected one.
 
 See an example of evaluating the agent messages with `ToolCallAccuracyEvaluator`:
 
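To make the agent message format described in this file concrete, here is a minimal single-turn instance (the content strings are illustrative): `query` starts with the agent's system message and ends with the last user message, while `response` carries the agent's reply to it:

```python
# `query`: the conversation history up to and including the last user message,
# with the agent's system message on top of the list.
query = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "What are the opening hours of the Eiffel Tower?"},
]

# `response`: the agent's last message, answering the final user query above.
response = [
    {"role": "assistant", "content": "The Eiffel Tower is open daily from 9:00 AM to 11:00 PM."},
]
```

Passing anything that deviates from this shape is what triggers the parse warning mentioned in the note above.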
articles/ai-foundry/how-to/develop/evaluate-sdk.md

10 additions & 11 deletions

@@ -53,7 +53,7 @@ Built-in quality and safety metrics take in query and response pairs, along with
 Built-in evaluators can accept query and response pairs, a list of conversations in JSON Lines (JSONL) format, or both.
 
 | Evaluator | Conversation & single-turn support for text | Conversation & single-turn support for text and image | Single-turn support for text only | Requires `ground_truth` | Supports [agent inputs](./agent-evaluate-sdk.md#agent-messages) |
-|-----------|---------------------------------------------|-------------------------------------------------------|-----------------------------------|---------------------|----------------------|
+|--|--|--|--|--|--|
 | **Quality Evaluators** |
 | `IntentResolutionEvaluator` | | | | ||
 | `ToolCallAccuracyEvaluator` | | | | ||

@@ -67,7 +67,7 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 | `FluencyEvaluator` || | | ||
 | `ResponseCompletenessEvaluator` || ||| |
 | `QAEvaluator` | | ||| |
-| **NLP Evaluators** |
+| **Natural Language Processing (NLP) Evaluators** |
 | `SimilarityEvaluator` | | ||| |
 | `F1ScoreEvaluator` | | ||| |
 | `RougeScoreEvaluator` | | ||| |

@@ -94,7 +94,6 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 > [!NOTE]
 > AI-assisted quality evaluators except for `SimilarityEvaluator` come with a reason field. They employ techniques including chain-of-thought reasoning to generate an explanation for the score. Therefore they consume more token usage in generation as a result of improved evaluation quality. Specifically, `max_token` for evaluator generation has been set to 800 for all AI-assisted evaluators, except that it will be 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate for longer inputs.
 
-
 Azure OpenAI graders require a template that describes how their input columns are turned into the *real* input that the grader uses. Example: If you have two inputs called *query* and *response*, and a template that was formatted as `{{item.query}}`, then only the query would be used. Similarly, you could have something like `{{item.conversation}}` to accept a conversation input, but the ability of the system to handle that depends on how you configure the rest of the grader to expect that input.
 
 For more information on data requirements for agentic evaluators, go to [Run agent evaluations locally with the Azure AI Evaluation SDK](agent-evaluate-sdk.md).
@@ -106,7 +105,7 @@ All built-in evaluators take single-turn inputs as query-and-response pairs in s
 ```python
 from azure.ai.evaluation import RelevanceEvaluator
 
-query = "What is the cpital of life?"
+query = "What is the capital of life?"
 response = "Paris."
 
 # Initialize an evaluator:

@@ -211,7 +210,7 @@ model_config = AzureOpenAIModelConfiguration(
     api_version=os.environ.get("AZURE_API_VERSION"),
 )
 
-# Initialize the Groundedness and Groundedness Pro evaluators:
+# Initialize the Groundedness evaluator:
 groundedness_eval = GroundednessEvaluator(model_config)
 
 conversation = {
@@ -503,9 +502,9 @@ Here's an example of how to set the `evaluators` parameters:
 result = evaluate(
     data="data.jsonl",
     evaluators={
-        "sexual":sexual_evaluator
-        "self_harm":self_harm_evaluator
-        "hate_unfairness":hate_unfairness_evaluator
+        "sexual":sexual_evaluator,
+        "self_harm":self_harm_evaluator,
+        "hate_unfairness":hate_unfairness_evaluator,
         "violence":violence_evaluator
     }
 )
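The commas added in this hunk aren't stylistic: adjacent `key: value` pairs in a dict literal with no separating comma are a `SyntaxError`, so the pre-fix snippet would not even parse. A quick demonstration with stand-in values:

```python
# Missing comma between entries, as in the old snippet: a SyntaxError at compile time.
bad_literal = '{"sexual": 1 "self_harm": 2}'
good_literal = '{"sexual": 1, "self_harm": 2}'

try:
    eval(bad_literal)
    parsed_ok = True
except SyntaxError:
    parsed_ok = False

evaluators = eval(good_literal)
```

Note the fix also adds a trailing comma after `"hate_unfairness"`; Python allows trailing commas in literals, which keeps future one-line additions to the dict as one-line diffs.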
@@ -520,7 +519,7 @@ A target can be any callable class in your directory. In this case, we have a Py
 Here's the content in `"data.jsonl"`:
 
 ```json
-{"query":"When was United Stated found ?", "response":"1776"}
+{"query":"When was United States found ?", "response":"1776"}
 {"query":"What is the capital of France?", "response":"Paris"}
 {"query":"Who is the best tennis player of all time ?", "response":"Roger Federer"}
 ```
@@ -537,8 +536,8 @@ result = evaluate(
     evaluator_config={
         "default": {
             "column_mapping": {
-                "query": "${data.queries}"
-                "context": "${outputs.context}"
+                "query": "${data.queries}",
+                "context": "${outputs.context}",
                 "response": "${outputs.response}"
             }
         }
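In this mapping, `${data.*}` references name columns of the input dataset and `${outputs.*}` references fields returned by the target. A toy resolver makes the substitution explicit — illustrative only, since the real expansion happens inside `evaluate()`:

```python
import re

def resolve_mapping(column_mapping: dict, data_row: dict, target_outputs: dict) -> dict:
    """Toy resolver: expand ${data.X} / ${outputs.X} references against one row."""
    sources = {"data": data_row, "outputs": target_outputs}

    def lookup(ref: str):
        # "${data.queries}" -> ("data", "queries") -> data_row["queries"]
        source, _, key = re.fullmatch(r"\$\{(.+)\}", ref).group(1).partition(".")
        return sources[source][key]

    return {name: lookup(ref) for name, ref in column_mapping.items()}

resolved = resolve_mapping(
    {"query": "${data.queries}", "context": "${outputs.context}", "response": "${outputs.response}"},
    data_row={"queries": "What is the capital of France?"},
    target_outputs={"context": "France's capital is Paris.", "response": "Paris"},
)
```

The mapping's job is purely naming: it lets an evaluator that expects `query`/`context`/`response` consume a dataset whose column is called `queries` plus whatever the target returns.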

articles/ai-services/speech-service/batch-synthesis-properties.md

2 additions & 2 deletions

@@ -31,7 +31,7 @@ Batch synthesis properties are described in the following table.
 |`customVoices`|The map of a custom voice name and its deployment ID.<br/><br/>For example: `"customVoices": {"your-custom-voice-name": "502ac834-6537-4bc3-9fd6-140114daa66d"}`<br/><br/>You can use the voice name in your `synthesisConfig.voice` (when the `inputKind` is set to `"PlainText"`) or within the SSML text of `inputs` (when the `inputKind` is set to `"SSML"`).<br/><br/>This property is required to use a custom voice. If you try to use a custom voice that isn't defined here, the service returns an error.|
 |`description`|The description of the batch synthesis.<br/><br/>This property is optional.|
 |`id`|The batch synthesis job ID you passed in path.<br/><br/>This property is required in path.|
-|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"text": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"text": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"text": "synthesize this to a file"},{"text": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"text": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
+|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"content": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"content": "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"content": "synthesize this to a file"},{"content": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"content": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
 |`lastActionDateTime`|The most recent date and time when the `status` property value changed.<br/><br/>This property is read-only.|
 |`outputs.result`|The location of the batch synthesis result files with audio output and logs.<br/><br/>This property is read-only.|
 |`properties`|A defined set of optional batch synthesis configuration settings.|
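Putting the updated `inputs` row into practice, here is a hypothetical request body for a plain-text job with two text objects (and therefore two audio output files, unless `properties.concatenateResult` is `true`). The voice name is the one from the table's SSML example; the overall payload shape beyond the documented properties is an assumption:

```python
import json

payload = {
    "inputKind": "PlainText",
    "synthesisConfig": {"voice": "en-US-AvaMultilingualNeural"},
    # Post-commit key name: each text object uses "content", not "text".
    "inputs": [
        {"content": "synthesize this to a file"},
        {"content": "synthesize this to another file"},
    ],
}
body = json.dumps(payload)

# The table caps the total JSON payload (all inputs plus other properties) at 2 MB.
assert len(body.encode("utf-8")) <= 2 * 1024 * 1024
```

Per the table, paragraphs within one text object are separated with `"\r\n"` rather than extra `inputs` entries.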
@@ -40,7 +40,7 @@ Batch synthesis properties are described in the following table.
 |`properties.concatenateResult`|Determines whether to concatenate the result. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.decompressOutputFiles`|Determines whether to unzip the synthesis result files in the destination container. This property can only be set when the `destinationContainerUrl` property is set. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.destinationContainerUrl`|The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
-|`properties.destinationPath`|The prefix path where batch synthesis results can be stored with. If you don't specify a prefix path, the default prefix path is `YourSpeechResourceId/YourSynthesisId`.<br/><br/>This optional property can only be set when the `destinationContainerUrl` property is set.|
+|`properties.destinationPath`|The prefix path for storing batch synthesis results. If no prefix path is provided, a system-generated path will be used.<br/><br/>This property is optional and can only be set when the `destinationContainerUrl` property is specified.|
 |`properties.durationInMilliseconds`|The audio output duration in milliseconds.<br/><br/>This property is read-only.|
 |`properties.failedAudioCount`|The count of batch synthesis inputs to audio output failed.<br/><br/>This property is read-only.|
 |`properties.outputFormat`|The audio output format.<br/><br/>For information about the accepted values, see [audio output formats](rest-text-to-speech.md#audio-outputs). The default output format is `riff-24khz-16bit-mono-pcm`.|
