@@ -104,7 +104,7 @@ The numerical score is on a Likert scale (integer 1 to 5) and a higher score is
```
-If you're building agents outside of Azure AI Agent Serice, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
+If you're building agents outside of Azure AI Agent Service, this evaluator accepts a schema typical for agent messages. To learn more, see our sample notebook for [Intent Resolution](https://aka.ms/intentresolution-sample).
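For orientation, a minimal sketch of calling this evaluator directly with string inputs (a hedged example: `model_config` is assumed to hold the judge model's deployment details, and the query and response strings are illustrative):

```python
from azure.ai.evaluation import IntentResolutionEvaluator

# model_config (the judge model's deployment details) is assumed to be defined elsewhere.
intent_resolution = IntentResolutionEvaluator(model_config=model_config)
result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
print(result)  # includes the score and a reason field explaining it
```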
## Tool call accuracy
@@ -114,7 +114,7 @@ If you're building agents outside of Azure AI Agent Serice, this evaluator accep
- the counts of missing or excessive calls.
> [!NOTE]
-> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
+> `ToolCallAccuracyEvaluator` only supports Azure AI Agent's Function Tool evaluation, but doesn't support Built-in Tool evaluation. The agent run must have at least one Function Tool call and no Built-in Tool calls made to be evaluated.
articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md (+7 −3)
@@ -59,6 +59,7 @@ Agents can use tool. Here's an example of creating custom tools you intend the a
```python
from azure.ai.projects.models import FunctionTool, ToolSet
from typing import Set, Callable, Any
+import json
# Define a custom Python function.
def fetch_weather(location: str) -> str:
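A self-contained sketch of how such a custom tool typically looks (the mock weather values are illustrative assumptions, not necessarily the article's exact code):

```python
import json

def fetch_weather(location: str) -> str:
    """Fetch mock weather information for the specified location."""
    # Illustrative stand-in data; a real tool would call a weather service.
    mock_weather_data = {"Seattle": "Sunny, 25°C", "London": "Cloudy, 18°C", "Tokyo": "Rainy, 22°C"}
    weather = mock_weather_data.get(location, "Weather data not available for this location.")
    # Return a JSON string so the agent can parse the tool output.
    return json.dumps({"weather": weather})
```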
@@ -177,7 +178,7 @@ And that's it! `converted_data` contains all inputs required for [these evaluato
For complex tasks that require refined reasoning for the evaluation, we recommend a strong reasoning model like `o3-mini` or the o-series mini models released afterwards with a balance of reasoning performance and cost efficiency.
-We set up a list of quality and safety evaluator in `quality_evaluators` and `safety_evaluators` and reference them in [evaluating multiples agent runs or a thread](#evaluate-multiple-agent-runs-or-threads).
+We set up a list of quality and safety evaluators in `quality_evaluators` and `safety_evaluators` and reference them in [evaluating multiple agent runs or threads](#evaluate-multiple-agent-runs-or-threads).
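A hedged sketch of what those two dictionaries might look like (`model_config`, `azure_ai_project`, and `credential` are assumed to be defined earlier in the article):

```python
from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    IntentResolutionEvaluator,
    TaskAdherenceEvaluator,
    ToolCallAccuracyEvaluator,
)

# Key each evaluator by class name so per-evaluator results are easy to locate.
quality_evaluators = {
    evaluator.__class__.__name__: evaluator
    for evaluator in [
        IntentResolutionEvaluator(model_config=model_config),
        TaskAdherenceEvaluator(model_config=model_config),
        ToolCallAccuracyEvaluator(model_config=model_config),
    ]
}

safety_evaluators = {
    evaluator.__class__.__name__: evaluator
    for evaluator in [
        ContentSafetyEvaluator(azure_ai_project=azure_ai_project, credential=credential)
    ]
}
```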
@@ -377,7 +378,9 @@ See the following output (reference [Output format](#output-format) for details)
"intent_resolution_reason": "The response provides the opening hours of the Eiffel Tower, which directly addresses the user's query. The information is clear, accurate, and complete, fully resolving the user's intent.",
}
```
+
### Agent tool calls and definitions
+
See the following examples of `tool_calls` and `tool_definitions` for `ToolCallAccuracyEvaluator`:
```python
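# A hedged sketch of the shapes these inputs take; the tool-call ID and
# values below are illustrative assumptions, not the article's exact example.
tool_calls = [
    {
        "type": "tool_call",
        "tool_call_id": "call_0123456789abcdef",  # hypothetical ID
        "name": "fetch_weather",
        "arguments": {"location": "Seattle"},
    }
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to fetch weather for.",
                }
            },
        },
    }
]
```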
@@ -438,6 +441,7 @@ See the following output (reference [Output format](#output-format) for details)
In agent message format, `query` and `response` are a list of OpenAI-style messages. Specifically, `query` carries the past agent-user interactions leading up to the last user query and requires the system message (of the agent) on top of the list; and `response` carries the last message of the agent in response to the last user query.
The expected input format for the evaluators is a Python list of messages as follows:
+
```
[
{
@@ -585,7 +589,7 @@ response = [
```
> [!NOTE]
-> The evaluator throws a warning that query (i.e. the conversation history till the current run) or agent response (the response to the query) cannot be parsed when their format is not the expected one.
+> The evaluator throws a warning that query (the conversation history till the current run) or agent response (the response to the query) can't be parsed when their format isn't the expected one.
See an example of evaluating the agent messages with `ToolCallAccuracyEvaluator`:
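A hedged sketch of that evaluation, combining the agent message format with the tool data above (`model_config` and the message contents are assumptions for illustration):

```python
from azure.ai.evaluation import ToolCallAccuracyEvaluator

# query carries the system message and prior turns; response carries the
# agent's reply, including the tool calls it made.
query = [
    {"role": "system", "content": "You are a helpful weather assistant."},
    {"role": "user", "content": "How is the weather in Seattle?"},
]
response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_0123456789abcdef",  # hypothetical ID
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "It is sunny and 25°C in Seattle."}],
    },
]

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)  # model_config assumed defined
result = tool_call_accuracy(
    query=query,
    response=response,
    tool_definitions=tool_definitions,  # as sketched earlier
)
print(result)
```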
articles/ai-foundry/how-to/develop/evaluate-sdk.md (+10 −11)
@@ -53,7 +53,7 @@ Built-in quality and safety metrics take in query and response pairs, along with
Built-in evaluators can accept query and response pairs, a list of conversations in JSON Lines (JSONL) format, or both.
| Evaluator | Conversation & single-turn support for text | Conversation & single-turn support for text and image | Single-turn support for text only | Requires `ground_truth`| Supports [agent inputs](./agent-evaluate-sdk.md#agent-messages)|
@@ -67,7 +67,7 @@ Built-in evaluators can accept query and response pairs, a list of conversations
|`FluencyEvaluator`| ✓ |||| ✓ |
|`ResponseCompletenessEvaluator`| ✓ || ✓ | ✓ ||
|`QAEvaluator`||| ✓ | ✓ ||
-|**NLP Evaluators**|
+|**Natural Language Processing (NLP) Evaluators**|
|`SimilarityEvaluator`||| ✓ | ✓ ||
|`F1ScoreEvaluator`||| ✓ | ✓ ||
|`RougeScoreEvaluator`||| ✓ | ✓ ||
@@ -94,7 +94,6 @@ Built-in evaluators can accept query and response pairs, a list of conversations
> [!NOTE]
> AI-assisted quality evaluators except for `SimilarityEvaluator` come with a reason field. They employ techniques including chain-of-thought reasoning to generate an explanation for the score. Therefore they consume more token usage in generation as a result of improved evaluation quality. Specifically, `max_token` for evaluator generation has been set to 800 for all AI-assisted evaluators, except that it will be 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate for longer inputs.
-
Azure OpenAI graders require a template that describes how their input columns are turned into the *real* input that the grader uses. Example: If you have two inputs called *query* and *response*, and a template that was formatted as `{{item.query}}`, then only the query would be used. Similarly, you could have something like `{{item.conversation}}` to accept a conversation input, but the ability of the system to handle that depends on how you configure the rest of the grader to expect that input.
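For instance, a hedged sketch of wiring such a template into a string-check grader (the grader class and parameter names follow the SDK's Azure OpenAI grader samples; verify the exact signature against your `azure-ai-evaluation` version):

```python
from azure.ai.evaluation import AzureOpenAIStringCheckGrader

# {{item.query}} maps the "query" input column into the grader's real input.
string_grader = AzureOpenAIStringCheckGrader(
    model_config=model_config,      # assumed to be defined elsewhere
    input="{{item.query}}",
    name="query_mentions_weather",  # hypothetical grader name
    operation="ilike",              # case-insensitive string match
    reference="weather",
)
```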
For more information on data requirements for agentic evaluators, go to [Run agent evaluations locally with the Azure AI Evaluation SDK](agent-evaluate-sdk.md).
@@ -106,7 +105,7 @@ All built-in evaluators take single-turn inputs as query-and-response pairs in s
```python
from azure.ai.evaluation import RelevanceEvaluator
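# A minimal single-turn sketch (model_config is assumed to be defined earlier;
# the query and response strings are illustrative).
relevance_eval = RelevanceEvaluator(model_config)
result = relevance_eval(
    query="What is the capital of Japan?",
    response="The capital of Japan is Tokyo.",
)
print(result)
```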
articles/ai-services/speech-service/batch-synthesis-properties.md (+2 −2)
@@ -31,7 +31,7 @@ Batch synthesis properties are described in the following table.
|`customVoices`|The map of a custom voice name and its deployment ID.<br/><br/>For example: `"customVoices": {"your-custom-voice-name": "502ac834-6537-4bc3-9fd6-140114daa66d"}`<br/><br/>You can use the voice name in your `synthesisConfig.voice` (when the `inputKind` is set to `"PlainText"`) or within the SSML text of `inputs` (when the `inputKind` is set to `"SSML"`).<br/><br/>This property is required to use a custom voice. If you try to use a custom voice that isn't defined here, the service returns an error.|
|`description`|The description of the batch synthesis.<br/><br/>This property is optional.|
|`id`|The batch synthesis job ID you passed in path.<br/><br/>This property is required in path.|
-|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"text": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"text": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"text": "synthesize this to a file"},{"text": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"text": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
+|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"content": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"content": "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"content": "synthesize this to a file"},{"content": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"content": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
|`lastActionDateTime`|The most recent date and time when the `status` property value changed.<br/><br/>This property is read-only.|
|`outputs.result`|The location of the batch synthesis result files with audio output and logs.<br/><br/>This property is read-only.|
|`properties`|A defined set of optional batch synthesis configuration settings.|
@@ -40,7 +40,7 @@ Batch synthesis properties are described in the following table.
|`properties.concatenateResult`|Determines whether to concatenate the result. This optional `bool` value ("true" or "false") is "false" by default.|
|`properties.decompressOutputFiles`|Determines whether to unzip the synthesis result files in the destination container. This property can only be set when the `destinationContainerUrl` property is set. This optional `bool` value ("true" or "false") is "false" by default.|
|`properties.destinationContainerUrl`|The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
-|`properties.destinationPath`|The prefix path where batch synthesis results can be stored with. If you don't specify a prefix path, the default prefix path is `YourSpeechResourceId/YourSynthesisId`.<br/><br/>This optional property can only be set when the `destinationContainerUrl` property is set.|
+|`properties.destinationPath`|The prefix path for storing batch synthesis results. If no prefix path is provided, a system-generated path will be used.<br/><br/>This property is optional and can only be set when the `destinationContainerUrl` property is specified.|
|`properties.durationInMilliseconds`|The audio output duration in milliseconds.<br/><br/>This property is read-only.|
|`properties.failedAudioCount`|The count of batch synthesis inputs to audio output failed.<br/><br/>This property is read-only.|
|`properties.outputFormat`|The audio output format.<br/><br/>For information about the accepted values, see [audio output formats](rest-text-to-speech.md#audio-outputs). The default output format is `riff-24khz-16bit-mono-pcm`.|
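For orientation, a hedged sketch of creating a batch synthesis job with the updated `inputs` shape (the `content` key). The region, API version, and placeholder values are assumptions; check them against the batch synthesis how-to:

```python
import requests

# Placeholders: substitute your own region, resource key, and job ID.
speech_endpoint = "https://eastus.api.cognitive.microsoft.com"
synthesis_id = "my-synthesis-job"

body = {
    "inputKind": "PlainText",
    "synthesisConfig": {"voice": "en-US-AvaMultilingualNeural"},
    # Each input object uses the "content" key, per the change above.
    "inputs": [
        {"content": "synthesize this to a file"},
        {"content": "synthesize this to another file"},
    ],
    "properties": {"outputFormat": "riff-24khz-16bit-mono-pcm"},
}

response = requests.put(
    f"{speech_endpoint}/texttospeech/batchsyntheses/{synthesis_id}?api-version=2024-04-01",
    headers={"Ocp-Apim-Subscription-Key": "YourSpeechKey"},
    json=body,
)
print(response.status_code, response.json())
```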