
Commit 5aedab4

Merge branch 'main' into release-phi-4-reasoning-models
2 parents 6a50334 + ff1f403

27 files changed: +701, -343 lines changed

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 338 additions & 194 deletions
Large diffs are not rendered by default.

articles/ai-foundry/how-to/develop/cloud-evaluation.md

Lines changed: 60 additions & 15 deletions
@@ -69,17 +69,58 @@ project_client = AIProjectClient.from_connection_string(

## Uploading evaluation data

+Prepare the data according to the [input data requirements for built-in evaluators](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). For example, in text evaluation, prepare a `"./evaluate_test_data.jsonl"` file that contains single-turn data inputs like this:
+```json
+{"query":"What is the capital of France?","response":"Paris."}
+{"query":"What atoms compose water?","response":"Hydrogen and oxygen."}
+{"query":"What color is my shirt?","response":"Blue."}
+```
+or a file that contains conversation data like this:
+```json
+{"conversation":
+    {
+        "messages": [
+            {
+                "content": "Which tent is the most waterproof?",
+                "role": "user"
+            },
+            {
+                "content": "The Alpine Explorer Tent is the most waterproof",
+                "role": "assistant",
+                "context": "From our product list, the Alpine Explorer Tent is the most waterproof. The Adventure Dining Table has higher weight."
+            },
+            {
+                "content": "How much does it cost?",
+                "role": "user"
+            },
+            {
+                "content": "The Alpine Explorer Tent is $120.",
+                "role": "assistant",
+                "context": null
+            }
+        ]
+    }
+}
+```
+
+To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
+
+To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md#evaluating-other-agents).
+
We provide two ways to register your data in Azure AI project required for evaluations in the cloud:

-1. **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result:
+- Uploading new datasets to your Project:
+
+    - **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result.

```python
data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
```

-**From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
+    - **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.

-2. Given existing datasets uploaded to your Project:
+- Specifying existing datasets uploaded to your Project:

    - **From SDK**: if you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`

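As an editorial aside (not part of the diff above), here is a minimal sketch of how such a dataset ID could be assembled from its parts; every value below is a placeholder to replace with your own subscription, resource group, project, dataset name, and version:

```python
# Minimal sketch: assemble a dataset ID for an existing dataset in your Azure AI project.
# All values below are placeholders; substitute your own.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
dataset_version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{dataset_version}"
)
print(data_id)
```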

@@ -195,9 +236,9 @@ print("Versioned evaluator id:", registered_evaluator.id)

After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project.

-## Cloud evaluation (preview) with Azure AI Projects SDK
+## Submit a cloud evaluation

-You can now submit a cloud evaluation with Azure AI Projects SDK via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
+Putting the previous code together, you can now submit a cloud evaluation with the Azure AI Projects SDK client library via a Python API. See the following example, which specifies an NLP evaluator (F1 score), AI-assisted quality and safety evaluators (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):

```python
import os, time
@@ -216,20 +257,16 @@ project_client = AIProjectClient.from_connection_string(
    conn_str="<connection_string>"
)

-# Construct dataset ID per the instruction
+# Construct the dataset ID as instructed previously
data_id = "<dataset-id>"

default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)

# Use the same model_config for your evaluator (or use different ones if needed)
model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)

-# Create an evaluation
-evaluation = Evaluation(
-    display_name="Cloud evaluation",
-    description="Evaluation of dataset",
-    data=Dataset(id=data_id),
-    evaluators={
+# select the list of evaluators you care about
+evaluators = {
    # Note the evaluator configuration key must follow a naming convention
    # the string must start with a letter with only alphanumeric characters
    # and underscores. Take "f1_score" as example: "f1score" or "f1_evaluator"
@@ -255,15 +292,22 @@ evaluation = Evaluation(
            "model_config": model_config
        }
    )
-    },
+}
+
+# Create an evaluation
+evaluation = Evaluation(
+    display_name="Cloud evaluation",
+    description="Evaluation of dataset",
+    data=Dataset(id=data_id),
+    evaluators=evaluators
)

# Create evaluation
evaluation_response = project_client.evaluations.create(
    evaluation=evaluation,
)

-# Get evaluation
+# Get evaluation result
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("----------------------------------------------------------------")
@@ -272,7 +316,8 @@ print("Evaluation status: ", get_evaluation_response.status)
print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
print("----------------------------------------------------------------")
```
-Now you can use the URI to view your evaluation results in your Azure AI project, in order to better assess the quality and safety performance of your applications.
+
+Following the URI, you are redirected to Azure AI Foundry to view your evaluation results in your Azure AI project and debug your application. Using the reason fields and pass/fail results, you can better assess the quality and safety performance of your applications. You can also run and compare multiple runs to test for regressions or improvements.

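As an editorial aside (not part of the diff): cloud evaluations run asynchronously, so a small polling loop such as the following sketch can wait for a terminal state before following the URI. The terminal status strings are assumptions; check the values returned by your SDK version.

```python
import time

# Re-fetch the evaluation until it reaches a terminal state before reading results.
# The status strings below are assumptions; verify them against your SDK version.
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)
while get_evaluation_response.status not in ("Completed", "Failed", "Canceled"):
    time.sleep(30)  # wait before polling again
    get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("Evaluation status: ", get_evaluation_response.status)
print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
```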
## Related content
