
Commit 69467c9

Merge pull request #3091 from lgayhardt/eval0225
Split eval sdk and cloud eval docs
2 parents 6f8e2f0 + 6a4b543 commit 69467c9

File tree

5 files changed: +327 −302 lines changed


articles/ai-studio/.openpublishing.redirection.ai-studio.json

Lines changed: 10 additions & 0 deletions
```diff
@@ -289,6 +289,16 @@
       "source_path_from_root": "/articles/ai-studio/concepts/related-content.md",
       "redirect_url": "/azure/ai-studio/concepts/what-are-ai-services",
       "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-studio/how-to/develop/evaluate-sdk.md#cloud-evaluation-preview-on-test-datasets",
+      "redirect_url": "/azure/ai-studio/how-to/develop/cloud-evaluation",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-studio/how-to/develop/evaluate-sdk.md#cloud-evaluation-on-test-datasets",
+      "redirect_url": "/azure/ai-studio/how-to/develop/cloud-evaluation",
+      "redirect_document_id": false
     }
   ]
 }
```
Lines changed: 284 additions & 0 deletions
@@ -0,0 +1,284 @@
---
title: Cloud evaluation with Azure AI Projects SDK
titleSuffix: Azure AI Foundry
description: This article provides instructions on how to evaluate a Generative AI application in the cloud.
manager: scottpolly
ms.service: azure-ai-foundry
ms.custom:
- references_regions
- ignite-2024
ms.topic: how-to
ms.date: 02/21/2025
ms.reviewer: changliu2
ms.author: lagayhar
author: lgayhardt
---

# Evaluate your Generative AI application in the cloud with Azure AI Projects SDK (preview)

[!INCLUDE [feature-preview](../../includes/feature-preview.md)]

While the Azure AI Evaluation SDK supports running evaluations locally on your own machine, you might want to delegate the job to the cloud. For example, after you've run local evaluations on small test data to assess your generative AI application prototypes, you move into pre-deployment testing and need to run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure and lets you integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.

In this article, you learn how to run cloud evaluation (preview) in pre-deployment testing on a test dataset. When you use the Azure AI Projects SDK, evaluation results are automatically logged into your Azure AI project for better observability. This feature supports all Microsoft-curated [built-in evaluators](./evaluate-sdk.md#built-in-evaluators) and your own [custom evaluators](./evaluate-sdk.md#custom-evaluators), which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scoped role-based access control (RBAC).

## Prerequisites

- An Azure AI project in the same [regions](./evaluate-sdk.md#region-support) as the risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create an Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
- An Azure OpenAI deployment with a GPT model that supports `chat completion`, for example `gpt-4`.
- The `Connection String` for your Azure AI project, used to create the `AIProjectClient` object. You can get the **Project connection string** under **Project details** on the project's **Overview** page.
- Make sure you're signed in to your Azure subscription by running `az login`.

### Installation instructions

1. Create a **virtual Python environment of your choice**. For example, to create one by using conda, run the following commands:

    ```bash
    conda create -n cloud-evaluation
    conda activate cloud-evaluation
    ```

2. Install the required packages by running the following command:

    ```bash
    pip install azure-identity azure-ai-projects azure-ai-ml
    ```

Optionally, you can `pip install azure-ai-evaluation` if you want a code-first experience for fetching the evaluator IDs of built-in evaluators in code.

Now you can define a client and a deployment that will be used to run your evaluations in the cloud:

```python
import os, time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

# Load your Azure OpenAI config
deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
api_version = os.environ.get("AZURE_OPENAI_API_VERSION")

# Create an Azure AI client from a connection string, available on the Azure AI project Overview page
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<connection_string>"
)
```

## Uploading evaluation data

There are two ways to register the data required for cloud evaluations in your Azure AI project:

1. **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result:

    ```python
    data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
    ```

    **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.

2. Given existing datasets already uploaded to your project:

    - **From SDK**: If you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`. For a code example of assembling this ID, see the sketch after this list.

    - **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID in the format shown previously.

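The following is a minimal sketch of assembling that dataset ID string in code. The placeholder values are ones you supply from your own subscription and project:

```python
# Minimal sketch: assemble the dataset ID from placeholder values you supply.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
dataset_version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{dataset_version}"
)
```
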
## Specifying evaluators from Evaluator library

Built-in evaluators are registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation. There are two ways to specify registered evaluators:

### Specifying built-in evaluators

- **From SDK**: Use the built-in evaluator `id` property supported by the `azure-ai-evaluation` SDK:

  ```python
  from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
  print("F1 Score evaluator id:", F1ScoreEvaluator.id)
  ```

- **From UI**: Follow these steps to fetch evaluator IDs after they're registered to your project:

  1. Select the **Evaluation** tab in your Azure AI project.
  2. Select **Evaluator library**.
  3. Select your evaluators of choice by comparing the descriptions.
  4. Copy the evaluator's **Asset ID**, which is your evaluator ID, for example, `azureml://registries/azureml/models/Groundedness-Evaluator/versions/1`.

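If you fetch an ID from the UI, you can plug the copied asset ID in directly wherever an evaluator ID is expected. The following is a minimal sketch, assuming the example Groundedness asset ID above and a `model_config` like the one created in the cloud evaluation example later in this article:

```python
from azure.ai.projects.models import EvaluatorConfiguration

# Sketch only: the asset ID is the example copied from the UI steps above;
# model_config is assumed to be defined as shown later in this article.
groundedness = EvaluatorConfiguration(
    id="azureml://registries/azureml/models/Groundedness-Evaluator/versions/1",
    init_params={"model_config": model_config},
)
```
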
### Specifying custom evaluators

- For code-based custom evaluators, register them to your Azure AI project and fetch the evaluator IDs as in this example:

  ```python
  from azure.ai.ml import MLClient
  from azure.ai.ml.entities import Model
  from promptflow.client import PFClient

  # Define ml_client to register the custom evaluator
  ml_client = MLClient(
      subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
      resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
      workspace_name=os.environ["AZURE_PROJECT_NAME"],
      credential=DefaultAzureCredential()
  )

  # Load the evaluator from its module
  from answer_len.answer_length import AnswerLengthEvaluator

  # Convert it to an evaluation flow and save it locally
  pf_client = PFClient()
  local_path = "answer_len_local"
  pf_client.flows.save(entry=AnswerLengthEvaluator, path=local_path)

  # Specify the evaluator name to appear in the Evaluator library
  evaluator_name = "AnswerLenEvaluator"

  # Finally, register the evaluator to the Evaluator library
  custom_evaluator = Model(
      path=local_path,
      name=evaluator_name,
      description="Evaluator calculating answer length.",
  )
  registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
  print("Registered evaluator id:", registered_evaluator.id)

  # Registered evaluators are versioned. You can always reference any available version.
  versioned_evaluator = ml_client.evaluators.get(evaluator_name, version=1)
  print("Versioned evaluator id:", versioned_evaluator.id)
  ```

  After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.

- For prompt-based custom evaluators, use this snippet to register them. For example, let's register the `FriendlinessEvaluator` built as described in [Prompt-based evaluators](./evaluate-sdk.md#prompt-based-evaluators):

  ```python
  # Import your prompt-based custom evaluator
  from friendliness.friend import FriendlinessEvaluator

  # Define your deployment
  model_config = dict(
      azure_endpoint=os.environ.get("AZURE_ENDPOINT"),
      azure_deployment=os.environ.get("AZURE_DEPLOYMENT_NAME"),
      api_version=os.environ.get("AZURE_API_VERSION"),
      api_key=os.environ.get("AZURE_API_KEY"),
      type="azure_openai"
  )

  # Define ml_client to register the custom evaluator
  ml_client = MLClient(
      subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
      resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
      workspace_name=os.environ["AZURE_PROJECT_NAME"],
      credential=DefaultAzureCredential()
  )

  # Convert the evaluator to an evaluation flow and save it locally
  local_path = "friendliness_local"
  pf_client = PFClient()
  pf_client.flows.save(entry=FriendlinessEvaluator, path=local_path)

  # Specify the evaluator name to appear in the Evaluator library
  evaluator_name = "FriendlinessEvaluator"

  # Register the evaluator to the Evaluator library
  custom_evaluator = Model(
      path=local_path,
      name=evaluator_name,
      description="Prompt-based evaluator measuring response friendliness.",
  )
  registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
  print("Registered evaluator id:", registered_evaluator.id)

  # Registered evaluators are versioned. You can always reference any available version.
  versioned_evaluator = ml_client.evaluators.get(evaluator_name, version=1)
  print("Versioned evaluator id:", versioned_evaluator.id)
  ```

  After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.

## Cloud evaluation (preview) with Azure AI Projects SDK

You can now submit a cloud evaluation with the Azure AI Projects SDK via a Python API. The following example specifies an NLP evaluator (F1 score), an AI-assisted quality evaluator (Relevance), a safety evaluator (Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):

```python
import os, time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

# Load your Azure OpenAI config
deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
api_version = os.environ.get("AZURE_OPENAI_API_VERSION")

# Create an Azure AI client from a connection string, available on the project's Overview page in the Azure AI project UI
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<connection_string>"
)

# Construct the dataset ID per the instructions above
data_id = "<dataset-id>"

default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)

# Use the same model_config for your evaluators (or use different ones if needed)
model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)

# Create an evaluation
evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of dataset",
    data=Dataset(id=data_id),
    evaluators={
        # Note: the evaluator configuration key must follow a naming convention:
        # it must start with a letter and contain only alphanumeric characters
        # and underscores. Taking "f1_score" as an example, "f1score" or "f1_evaluator"
        # are also acceptable, but "f1-score-eval" or "1score" will result in errors.
        "f1_score": EvaluatorConfiguration(
            id=F1ScoreEvaluator.id,
        ),
        "relevance": EvaluatorConfiguration(
            id=RelevanceEvaluator.id,
            init_params={
                "model_config": model_config
            },
        ),
        "violence": EvaluatorConfiguration(
            id=ViolenceEvaluator.id,
            init_params={
                "azure_ai_project": project_client.scope
            },
        ),
        "friendliness": EvaluatorConfiguration(
            id="<custom_evaluator_id>",
            init_params={
                "model_config": model_config
            }
        )
    },
)

# Create the evaluation
evaluation_response = project_client.evaluations.create(
    evaluation=evaluation,
)

# Get the evaluation
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("----------------------------------------------------------------")
print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
print("Evaluation status: ", get_evaluation_response.status)
print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
print("----------------------------------------------------------------")
```
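
Cloud evaluation runs asynchronously, so the run might still be in progress when the snippet above returns. The following is a minimal polling sketch that reuses the `time` import from the example; the terminal status strings are an assumption, so confirm the values that `status` actually reports in your project:

```python
# Sketch only: poll until the evaluation reaches a terminal state.
# The status strings below are assumptions; inspect the status values
# reported by your project to confirm them.
while True:
    get_evaluation_response = project_client.evaluations.get(evaluation_response.id)
    print("Evaluation status:", get_evaluation_response.status)
    if get_evaluation_response.status in ("Completed", "Failed", "Canceled"):
        break
    time.sleep(30)
```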

Now you can use the URI to view your evaluation results in your Azure AI project and better assess the quality and safety performance of your applications.

## Related content

- [Evaluate your Generative AI applications locally](./evaluate-sdk.md)
- [Evaluate your Generative AI applications online](https://aka.ms/GenAIMonitoringDoc)
- [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
- [View your evaluation results in Azure AI project](../../how-to/evaluate-results.md)
- [Get started building a chat app using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
- [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)
