Commit a8d5aa6

Merge pull request #5046 from lgayhardt/evalfixes052502
Build: cloud eval updates
2 parents 54943d5 + 69fd182 commit a8d5aa6

File tree

1 file changed: +104 -179 lines changed


articles/ai-foundry/how-to/develop/cloud-evaluation.md

Lines changed: 104 additions & 179 deletions
@@ -1,5 +1,5 @@
 ---
-title: Cloud evaluation with Azure AI Projects SDK
+title: Cloud evaluation with Azure AI Foundry SDK
 titleSuffix: Azure AI Foundry
 description: This article provides instructions on how to evaluate a Generative AI application on the cloud.
 manager: scottpolly
@@ -8,153 +8,160 @@ ms.custom:
 - references_regions
 - ignite-2024
 ms.topic: how-to
-ms.date: 02/21/2025
+ms.date: 05/19/2025
 ms.reviewer: changliu2
 ms.author: lagayhar
 author: lgayhardt
 ---
-# Evaluate your Generative AI application on the cloud with Azure AI Projects SDK (preview)
+# Run evaluations in the cloud using Azure AI Foundry SDK (preview)
 
 [!INCLUDE [feature-preview](../../includes/feature-preview.md)]
 
-While Azure AI Evaluation SDK client supports running evaluations locally on your own machine, you might want to delegate the job remotely to the cloud. For example, after you ran local evaluations on small test data to help assess your generative AI application prototypes, now you move into pre-deployment testing and need run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure, and enables you to integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.
+While the Azure AI Evaluation SDK supports running evaluations locally on your own machine, you might want to delegate the job to the cloud. For example, after you run local evaluations on small test data to assess your generative AI application prototypes, you move into pre-deployment testing and need to run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure, and it enables you to integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.
 
-In this article, you learn how to run cloud evaluation (preview) in pre-deployment testing on a test dataset. Using the Azure AI Projects SDK, you'll have evaluation results automatically logged into your Azure AI project for better observability. This feature supports all Microsoft curated [built-in evaluators](../../concepts/observability.md#what-are-evaluators) and your own [custom evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md) which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scope RBAC.
+In this article, you learn how to run evaluations in the cloud (preview) in pre-deployment testing on a test dataset. Using the Azure AI Projects SDK, evaluation results are automatically logged into your Azure AI project for better observability. This feature supports all Microsoft-curated [built-in evaluators](../../concepts/observability.md#what-are-evaluators) and your own [custom evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md), which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scope RBAC.
 
-## Prerequisites
+## Prerequisite setup steps for Azure AI Foundry Projects
 
-- Azure AI project in the same [regions](../../concepts/evaluation-evaluators/risk-safety-evaluators.md#azure-ai-foundry-project-configuration-and-region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
+- Azure AI Foundry project in the same supported [regions](../../concepts/evaluation-evaluators/risk-safety-evaluators.md#azure-ai-foundry-project-configuration-and-region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI Foundry project](../create-projects.md?tabs=ai-studio) to create one.
 
 - Azure OpenAI Deployment with GPT model supporting `chat completion`, for example `gpt-4`.
-- `Connection String` for Azure AI project to easily create `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page.
 - Make sure you're first logged into your Azure subscription by running `az login`.
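For reference, the `DefaultAzureCredential` used later in this article picks up your `az login` session automatically. A minimal sketch to confirm a token can be acquired before going further (the management scope below is only an example):

```python
from azure.identity import DefaultAzureCredential

# Confirm that a credential is available (for example, from `az login`)
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default")
print("Token acquired, expires on:", token.expires_on)
```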
 
-### Installation Instructions
+If this is your first time running evaluations and logging them to your Azure AI Foundry project, you might need to do a few additional setup steps.
 
-1. Create a **virtual Python environment of you choice**. To create one using conda, run the following command:
+1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This Bicep template provisions and connects a storage account to your Foundry project with key authentication.
+2. Make sure the connected storage account has access to all projects.
+3. If you connected your storage account with Microsoft Entra ID, make sure to assign Storage Blob Data Owner permissions to both your account and the Foundry project's managed identity (MSI) in the Azure portal.
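If you want to confirm the storage connection from step 3 before running an evaluation, a quick check along these lines can help. It assumes the `azure-storage-blob` package is installed and uses a placeholder account URL:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder URL for the storage account connected to your project
account_url = "https://<storage-account-name>.blob.core.windows.net"

service = BlobServiceClient(account_url=account_url, credential=DefaultAzureCredential())
print("Containers visible to this identity:", [c.name for c in service.list_containers()])
```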
 
-```bash
-conda create -n cloud-evaluation
-conda activate cloud-evaluation
-```
+### Getting started
 
-2. Install the required packages by running the following command:
+First, install the Azure AI Foundry SDK's project client, which runs evaluations in the cloud:
 
-```bash
-pip install azure-identity azure-ai-projects azure-ai-ml
-```
-
-Optionally you can use `pip install azure-ai-evaluation` if you want a code-first experience to fetch evaluator ID for built-in evaluators in code.
+```bash
+pip install azure-ai-projects azure-identity
+```
 
-Now you can define a client and a deployment which will be used to run your evaluations in the cloud:
+> [!NOTE]
+> For more detailed information, see the [REST API Reference Documentation](/rest/api/aifoundry/aiprojects/evaluations).
+Then, set your environment variables for your Azure AI Foundry resources:
 
 ```python
+import os
 
-import os, time
-from azure.ai.projects import AIProjectClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
+# Required environment variables
+endpoint = os.environ["PROJECT_ENDPOINT"]  # https://<account>.services.ai.azure.com/api/projects/<project>
+model_endpoint = os.environ["MODEL_ENDPOINT"]  # https://<account>.services.ai.azure.com
+model_api_key = os.environ["MODEL_API_KEY"]
+model_deployment_name = os.environ["MODEL_DEPLOYMENT_NAME"]  # e.g. gpt-4o-mini
 
-# Load your Azure OpenAI config
-deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
-api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
+# Optional – reuse an existing dataset
+dataset_name = os.environ.get("DATASET_NAME", "dataset-test")
+dataset_version = os.environ.get("DATASET_VERSION", "1.0")
+```
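The added snippet reads four required variables; a small illustrative pre-flight check (the names are taken from the snippet above, the check itself isn't part of the SDK) can fail fast when one is missing:

```python
import os

# Illustrative check: stop early if a required variable isn't set
required = ["PROJECT_ENDPOINT", "MODEL_ENDPOINT", "MODEL_API_KEY", "MODEL_DEPLOYMENT_NAME"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```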
+
+Now you can define a client which is used to run your evaluations in the cloud:
 
-# Create an Azure AI Client from a connection string. Available on Azure AI project Overview page.
-project_client = AIProjectClient.from_connection_string(
+```python
+import os
+from azure.identity import DefaultAzureCredential
+from azure.ai.projects import AIProjectClient
+
+# Create the project client (Foundry project and credentials)
+project_client = AIProjectClient(
+    endpoint=endpoint,
     credential=DefaultAzureCredential(),
-    conn_str="<connection_string>"
 )
 ```
 
 ## Uploading evaluation data
 
-Prepare the data according to the [input data requirements for built-in evaluators](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). For example in text evaluation, prepare a `"./evaluate_test_data.jsonl"` file that contains single-turn data inputs like this:
-```json
-{"query":"What is the capital of France?","response":"Paris."}
-{"query":"What atoms compose water?","response":"Hydrogen and oxygen."}
-{"query":"What color is my shirt?","response":"Blue."}
-```
-or contains conversation data like this:
-```json
-{"conversation":
-    {
-        "messages": [
-            {
-                "content": "Which tent is the most waterproof?",
-                "role": "user"
-            },
-            {
-                "content": "The Alpine Explorer Tent is the most waterproof",
-                "role": "assistant",
-                "context": "From the our product list the alpine explorer tent is the most waterproof. The Adventure Dining Table has higher weight."
-            },
-            {
-                "content": "How much does it cost?",
-                "role": "user"
-            },
-            {
-                "content": "The Alpine Explorer Tent is $120.",
-                "role": "assistant",
-                "context": null
-            }
-        ]
-    }
-}
+```python
+# Upload a local jsonl file (skip if you already have a Dataset registered)
+data_id = project_client.datasets.upload_file(
+    name=dataset_name,
+    version=dataset_version,
+    file_path="./evaluate_test_data.jsonl",
+).id
 ```
 
-To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
+To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
 
 To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md#evaluating-other-agents).
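If you don't already have `evaluate_test_data.jsonl`, a minimal illustrative way to produce one in the single-turn `query`/`response` shape described above (file name and rows are placeholders):

```python
import json

# Illustrative single-turn test data in the query/response JSONL shape
rows = [
    {"query": "What is the capital of France?", "response": "Paris."},
    {"query": "What atoms compose water?", "response": "Hydrogen and oxygen."},
]

with open("evaluate_test_data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```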
-
 
-We provide two ways to register your data in Azure AI project required for evaluations in the cloud:
-
-- Uploading new datasets to your Project:
-
-  - **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result.
+## Specifying evaluators
 
 ```python
-data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
-```
-
-  - **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
-
-- Specifying existing datasets uploaded to your Project:
+from azure.ai.projects.models import (
+    EvaluatorConfiguration,
+    EvaluatorIds,
+)
 
-  - **From SDK**: if you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
+# Built-in evaluator configurations
+evaluators = {
+    "relevance": EvaluatorConfiguration(
+        id=EvaluatorIds.RELEVANCE.value,
+        init_params={"deployment_name": model_deployment_name},
+        data_mapping={
+            "query": "${data.query}",
+            "response": "${data.response}",
+        },
+    ),
+    "violence": EvaluatorConfiguration(
+        id=EvaluatorIds.VIOLENCE.value,
+        init_params={"azure_ai_project": endpoint},
+    ),
+    "bleu_score": EvaluatorConfiguration(
+        id=EvaluatorIds.BLEU_SCORE.value,
+    ),
+}
+```
 
-  - **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID as in the format previously.
+## Submit evaluation in the cloud
 
-## Specifying evaluators from Evaluator library
+Finally, submit the remote evaluation run:
 
-We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for Cloud evaluation. We provide two ways to specify registered evaluators:
+```python
+from azure.ai.projects.models import (
+    Evaluation,
+    InputDataset
+)
 
-### Specifying built-in evaluators
+# Create an evaluation with the dataset and evaluators specified
+evaluation = Evaluation(
+    display_name="Cloud evaluation",
+    description="Evaluation of dataset",
+    data=InputDataset(id=data_id),
+    evaluators=evaluators,
+)
 
-- **From SDK**: Use built-in evaluator `id` property supported by `azure-ai-evaluation` SDK:
+# Run the evaluation
+evaluation_response = project_client.evaluations.create(
+    evaluation,
+    headers={
+        "model-endpoint": model_endpoint,
+        "api-key": model_api_key,
+    },
+)
 
-```python
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-print("F1 Score evaluator id:", F1ScoreEvaluator.id)
+print("Created evaluation:", evaluation_response.name)
+print("Status:", evaluation_response.status)
 ```
 
-- **From UI**: Follows these steps to fetch evaluator IDs after they're registered to your project:
-  - Select **Evaluation** tab in your Azure AI project;
-  - Select Evaluator library;
-  - Select your evaluators of choice by comparing the descriptions;
-  - Copy its "Asset ID" which will be your evaluator ID, for example, `azureml://registries/azureml/models/Groundedness-Evaluator/versions/1`.
+## Specifying custom evaluators
+
+> [!NOTE]
+> Azure AI Foundry Projects aren't supported for this feature. Use an Azure AI Hub Project instead.
 
-### Specifying custom evaluators
+### Code-based custom evaluators
 
-- For code-based custom evaluators, register them to your Azure AI project and fetch the evaluator IDs as in this example:
+Register your custom evaluators to your Azure AI Hub project and fetch the evaluator IDs:
 
 ```python
 from azure.ai.ml import MLClient
 from azure.ai.ml.entities import Model
 from promptflow.client import PFClient
 
-
 # Define ml_client to register custom evaluator
 ml_client = MLClient(
     subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
@@ -163,7 +170,6 @@ ml_client = MLClient(
     credential=DefaultAzureCredential()
 )
 
-
 # Load evaluator from module
 from answer_len.answer_length import AnswerLengthEvaluator
 
@@ -190,7 +196,9 @@ print("Versioned evaluator id:", registered_evaluator.id)
 
 After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab in your Azure AI project.
 
-- For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md#prompt-based-evaluators):
+### Prompt-based custom evaluators
+
+Follow the example to register a custom `FriendlinessEvaluator` built as described in [Prompt-based evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md#prompt-based-evaluators):
 
 ```python
 # Import your prompt-based custom evaluator
@@ -236,89 +244,6 @@ print("Versioned evaluator id:", registered_evaluator.id)
 
 After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project.
 
-## Submit a cloud evaluation
-
-Putting the previous code altogether, you can now submit a cloud evaluation with Azure AI Projects SDK client library via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
-
-```python
-import os, time
-from azure.ai.projects import AIProjectClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-
-# Load your Azure OpenAI config
-deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
-api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
-
-# Create an Azure AI Client from a connection string. Avaiable on project overview page on Azure AI project UI.
-project_client = AIProjectClient.from_connection_string(
-    credential=DefaultAzureCredential(),
-    conn_str="<connection_string>"
-)
-
-# Construct dataset ID per the instruction previously
-data_id = "<dataset-id>"
-
-default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
-
-# Use the same model_config for your evaluator (or use different ones if needed)
-model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)
-
-# select the list of evaluators you care about
-evaluators = {
-    # Note the evaluator configuration key must follow a naming convention
-    # the string must start with a letter with only alphanumeric characters
-    # and underscores. Take "f1_score" as example: "f1score" or "f1_evaluator"
-    # will also be acceptable, but "f1-score-eval" or "1score" will result in errors.
-    "f1_score": EvaluatorConfiguration(
-        id=F1ScoreEvaluator.id,
-    ),
-    "relevance": EvaluatorConfiguration(
-        id=RelevanceEvaluator.id,
-        init_params={
-            "model_config": model_config
-        },
-    ),
-    "violence": EvaluatorConfiguration(
-        id=ViolenceEvaluator.id,
-        init_params={
-            "azure_ai_project": project_client.scope
-        },
-    ),
-    "friendliness": EvaluatorConfiguration(
-        id="<custom_evaluator_id>",
-        init_params={
-            "model_config": model_config
-        }
-    )
-}
-
-# Create an evaluation
-evaluation = Evaluation(
-    display_name="Cloud evaluation",
-    description="Evaluation of dataset",
-    data=Dataset(id=data_id),
-    evaluators=evaluators
-)
-
-# Create evaluation
-evaluation_response = project_client.evaluations.create(
-    evaluation=evaluation,
-)
-
-# Get evaluation result
-get_evaluation_response = project_client.evaluations.get(evaluation_response.id)
-
-print("----------------------------------------------------------------")
-print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
-print("Evaluation status: ", get_evaluation_response.status)
-print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
-print("----------------------------------------------------------------")
-```
-
-Following the URI, you will be redirected to Foundry to view your evaluation results in your Azure AI project and debug your application. Using reason fields and pass/fail, you will be able to better assess the quality and safety performance of your applications. You can run and compare multiple runs to test for regression or improvements.
-
 ## Related content
 
 - [Evaluate your Generative AI applications locally](./evaluate-sdk.md)
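Putting the added pieces together, a condensed end-to-end sketch of the new flow follows. It only uses calls that appear in this change, except for the final status check, where the name-based `evaluations.get` call and the 30-second wait are assumptions; see the REST reference linked earlier for the exact shape:

```python
import os
import time

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import Evaluation, EvaluatorConfiguration, EvaluatorIds, InputDataset

endpoint = os.environ["PROJECT_ENDPOINT"]
model_endpoint = os.environ["MODEL_ENDPOINT"]
model_api_key = os.environ["MODEL_API_KEY"]
model_deployment_name = os.environ["MODEL_DEPLOYMENT_NAME"]

project_client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())

# Upload the test data and reference it by dataset ID
data_id = project_client.datasets.upload_file(
    name="dataset-test",
    version="1.0",
    file_path="./evaluate_test_data.jsonl",
).id

evaluators = {
    "relevance": EvaluatorConfiguration(
        id=EvaluatorIds.RELEVANCE.value,
        init_params={"deployment_name": model_deployment_name},
        data_mapping={"query": "${data.query}", "response": "${data.response}"},
    ),
}

evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of dataset",
    data=InputDataset(id=data_id),
    evaluators=evaluators,
)

evaluation_response = project_client.evaluations.create(
    evaluation,
    headers={"model-endpoint": model_endpoint, "api-key": model_api_key},
)

# Check on the run later; the name-based get call is an assumption based on the
# removed example's evaluations.get usage, so confirm it against the REST reference.
time.sleep(30)
latest = project_client.evaluations.get(evaluation_response.name)
print("Status:", latest.status)
```

And for the `answer_len.answer_length` module imported in the custom-evaluator hunk above, a minimal code-based evaluator shape (illustrative, following the custom-evaluators guide linked in this article) could look like:

```python
# answer_len/answer_length.py - illustrative code-based custom evaluator
class AnswerLengthEvaluator:
    def __init__(self):
        pass

    def __call__(self, *, answer: str, **kwargs):
        return {"answer_length": len(answer)}
```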
