
Commit 7478262

minor updates
1 parent 85aa4eb commit 7478262

File tree (3 files changed: +16 -16 lines changed)

- articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md
- articles/ai-foundry/how-to/develop/cloud-evaluation.md
- articles/ai-foundry/how-to/develop/evaluate-sdk.md

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md (9 additions, 9 deletions)

@@ -189,7 +189,7 @@ The result of the AI-assisted quality evaluators for a query and response pair i
 To further improve intelligibility, all evaluators accept a binary threshold (unless they output already binary outputs) and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:

 - `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
-- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user.
 - `additional_details` contains debugging information about the quality of a single agent run.

 ```json
@@ -238,7 +238,7 @@ from azure.ai.evaluation import AIAgentConverter
 # Initialize the converter
 converter = AIAgentConverter(project_client)

-# specify a file path to save agent output (which is evaluation input data)
+# Specify a file path to save agent output (which is evaluation input data)
 filename = os.path.join(os.getcwd(), "evaluation_input_data.jsonl")

 evaluation_data = converter.prepare_evaluation_data(thread_ids=thread_id, filename=filename)
@@ -255,7 +255,7 @@ import os
 from dotenv import load_dotenv
 load_dotenv()

-
+# Another convenient way to access model config from the project_client
 project_client = AIProjectClient.from_connection_string(
 credential=DefaultAzureCredential(),
 conn_str=os.environ["PROJECT_CONNECTION_STRING"],
@@ -269,12 +269,12 @@ model_config = project_client.connections.get_default(
 include_credentials=True
 )

-# select evaluators
+# Select evaluators of your choice
 intent_resolution = IntentResolutionEvaluator(model_config=model_config)
 task_adherence = TaskAdherenceEvaluator(model_config=model_config)
 tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

-# batch run API
+# Batch evaluation API (local)
 from azure.ai.evaluation import evaluate

 response = evaluate(
@@ -292,9 +292,9 @@ response = evaluate(
 "resource_group_name": os.environ["RESOURCE_GROUP_NAME"],
 }
 )
-# look at the average scores
+# Inspect the average scores at a high-level
 print(response["metrics"])
-# use the URL to inspect the results on the UI
+# Use the URL to inspect the results on the UI
 print(f'AI Foundary URL: {response.get("studio_url")}')
 ```

@@ -303,7 +303,7 @@ Following the URI, you will be redirected to Foundry to view your evaluation res
 With Azure AI Evaluation SDK client library, you can seamlessly evaluate your Azure AI agents via our converter support, which enables observability and transparency into agentic workflows.


-## Evaluators with agent message support
+## Evaluating other agents

 For agents outside of Azure AI Agent Service, you can still evaluate them by preparing the right data for the evaluators of your choice.

@@ -328,7 +328,7 @@ We'll demonstrate some examples of the two data formats: simple agent data, and
 As with other [built-in AI-assisted quality evaluators](./evaluate-sdk.md#performance-and-quality-evaluators), `IntentResolutionEvaluator` and `TaskAdherenceEvaluator` output a likert score (integer 1-5; higher score is better). `ToolCallAccuracyEvaluator` outputs the passing rate of all tool calls made (a float between 0-1) based on user query. To further improve intelligibility, all evaluators accept a binary threshold and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:

 - `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
-- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user.

 ### Simple agent data
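Context for this file's first and last hunks: the bullets describe a `{metric_name}_result` / `{metric_name}_threshold` pair in each evaluator's output. A minimal sketch of that shape follows; the metric name (`intent_resolution`) and all values are illustrative, not taken from this commit.

```python
# Illustrative sketch only: the binarized output keys described in the changed bullets.
# All values here are invented for the example.
example_row = {
    "intent_resolution": 4.0,            # score on a 1-5 scale; higher is better
    "intent_resolution_result": "pass",  # "pass" or "fail" after applying the threshold
    "intent_resolution_threshold": 3,    # default threshold; the user can override it
}

# A score at or above the threshold binarizes to "pass"; otherwise "fail".
meets_threshold = example_row["intent_resolution"] >= example_row["intent_resolution_threshold"]
print(meets_threshold, example_row["intent_resolution_result"])
```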

articles/ai-foundry/how-to/develop/cloud-evaluation.md (5 additions, 5 deletions)

@@ -103,9 +103,9 @@ or contain conversation data like this:
 }
 ```

-For more details on input data formats, refer to [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
+To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).

-For agent evaluation, refer to [evaluator support for agent messages](./agent-evaluate-sdk.md#evaluators-with-agent-message-support).
+To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md/#evaluating-other-agents).


 We provide two ways to register your data in Azure AI project required for evaluations in the cloud:
@@ -236,9 +236,9 @@ print("Versioned evaluator id:", registered_evaluator.id)

 After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project.

-## Cloud evaluation (preview) with Azure AI Projects SDK
+## Submit a cloud evaluation

-Putting the above altogether, you can now submit a cloud evaluation with Azure AI Projects SDK via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
+Putting the previous code altogether, you can now submit a cloud evaluation with Azure AI Projects SDK client library via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):

 ```python
 import os, time
@@ -257,7 +257,7 @@ project_client = AIProjectClient.from_connection_string(
 conn_str="<connection_string>"
 )

-# Construct dataset ID per the instruction above
+# Construct dataset ID per the instruction previously
 data_id = "<dataset-id>"

 default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
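To show where the example in this hunk is heading, here is a hedged sketch of the submission step using the preview Azure AI Projects evaluations API. It is not part of this commit: the evaluator set mirrors the paragraph above (F1 score, Relevance, Violence, and a custom Friendliness evaluator), `model_config` is assumed to be built from the default Azure OpenAI connection as in the surrounding doc, the Friendliness evaluator ID is a placeholder, and exact class and method names can differ between preview SDK versions.

```python
# Hedged sketch, not part of this commit. Assumes project_client, data_id, and
# model_config exist as set up in the documented steps before this point.
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Cloud evaluation with an NLP, AI-assisted quality, safety, and custom evaluator",
    data=Dataset(id=data_id),  # dataset ID constructed earlier
    evaluators={
        "f1_score": EvaluatorConfiguration(id=F1ScoreEvaluator.id),
        "relevance": EvaluatorConfiguration(
            id=RelevanceEvaluator.id,
            init_params={"model_config": model_config},
        ),
        "violence": EvaluatorConfiguration(
            id=ViolenceEvaluator.id,
            init_params={"azure_ai_project": project_client.scope},
        ),
        "friendliness": EvaluatorConfiguration(
            id="<friendliness-evaluator-id>",  # placeholder: ID from the evaluator library
            init_params={"model_config": model_config},
        ),
    },
)

# Submit the evaluation to run in the cloud.
evaluation_response = project_client.evaluations.create(evaluation=evaluation)
print("Submitted cloud evaluation:", evaluation.display_name)
```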

articles/ai-foundry/how-to/develop/evaluate-sdk.md (2 additions, 2 deletions)

@@ -161,7 +161,7 @@ conversation = {

 ```

-To run batch evaluations using [local evaluation](#local-evaluation-on-test-datasets-using-evaluate) or [upload your dataset to run cloud evaluation](./cloud-evaluation.md#uploading-evaluation-data), you will need to represent the dataset in `.jsonl` format. The above conversation is equivalent to a line of dataset as following in a `.jsonl` file:
+To run batch evaluations using [local evaluation](#local-evaluation-on-test-datasets-using-evaluate) or [upload your dataset to run cloud evaluation](./cloud-evaluation.md#uploading-evaluation-data), you will need to represent the dataset in `.jsonl` format. The previous conversation is equivalent to a line of dataset as following in a `.jsonl` file:

 ```json
 {"conversation":
@@ -441,7 +441,7 @@ The result of the AI-assisted quality evaluators for a query and response pair i
 To further improve intelligibility, all evaluators accept a binary threshold (unless they output already binary outputs) and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:

 - `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
-- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user.

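Since the first hunk in this file is about representing a conversation as one line of a `.jsonl` dataset, a small sketch of that conversion follows. Only the `{"conversation": {"messages": [...]}}` shape comes from the doc; the message texts and file name are invented.

```python
import json

# Hedged sketch: write one conversation as a single .jsonl line, in the
# {"conversation": {"messages": [...]}} shape the changed paragraph refers to.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},  # invented example turn
        {"role": "assistant", "content": "The Alpine Explorer Tent is the most waterproof."},
    ]
}

with open("evaluation_dataset.jsonl", "w", encoding="utf-8") as f:
    # One JSON object per line; append more lines for more conversations.
    f.write(json.dumps({"conversation": conversation}) + "\n")
```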
