While the Azure AI Evaluation SDK supports running evaluations locally on your own machine, you might want to delegate the job remotely to the cloud. For example, after you've run local evaluations on small test data to help assess your generative AI application prototypes, you move into pre-deployment testing and need to run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure, and enables you to integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.
In this article, you learn how to run evaluations in the cloud (preview) for pre-deployment testing on a test dataset. Using the Azure AI Projects SDK, evaluation results are automatically logged into your Azure AI project for better observability. This feature supports all Microsoft-curated [built-in evaluators](../../concepts/observability.md#what-are-evaluators) and your own [custom evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md), which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scope RBAC.
## Prerequisite setup steps for Azure AI Foundry projects
- An Azure AI Foundry project in the same supported [regions](../../concepts/evaluation-evaluators/risk-safety-evaluators.md#azure-ai-foundry-project-configuration-and-region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create an Azure AI Foundry project](../create-projects.md?tabs=ai-studio) to create one.
- An Azure OpenAI deployment with a GPT model that supports `chat completion`, for example `gpt-4`.
- The endpoint for your Azure AI Foundry project, used to create the `AIProjectClient` object. You can find it under **Project details** on the project's **Overview** page.
- Make sure you're first logged into your Azure subscription by running `az login`.
If this is your first time running evaluations and logging them to your Azure AI Foundry project, you might need to do a few additional setup steps:
1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This Bicep template provisions and connects a storage account to your Foundry project with key authentication.
2. Make sure the connected storage account has access to all projects.
3. If you connected your storage account with Microsoft Entra ID, make sure to assign the **Storage Blob Data Owner** role to both your user account and the Foundry project's managed identity (MSI) in the Azure portal.
### Getting started
First, install the Azure AI Foundry SDK project client, which runs the evaluations in the cloud. Optionally create and activate a fresh Python virtual environment, install the packages (for example, `pip install azure-ai-projects azure-identity`), and then create the project client:

```python
import os

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Your Azure AI Foundry project endpoint, from the project's Overview page
endpoint = os.environ["PROJECT_ENDPOINT"]

# Create the project client (Foundry project and credentials)
project_client = AIProjectClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential(),
)
```
## Uploading evaluation data
Prepare the data according to the [input data requirements for built-in evaluators](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). For example, in text evaluation, prepare a `"./evaluate_test_data.jsonl"` file that contains single-turn data inputs like this:

```json
{"query":"What is the capital of France?","response":"Paris."}
{"query":"What atoms compose water?","response":"Hydrogen and oxygen."}
{"query":"What color is my shirt?","response":"Blue."}
```

or contains conversation data like this:
```json
{"conversation":
    {
        "messages": [
            {
                "content": "Which tent is the most waterproof?",
                "role": "user"
            },
            {
                "content": "The Alpine Explorer Tent is the most waterproof",
                "role": "assistant",
                "context": "From our product list, the Alpine Explorer Tent is the most waterproof. The Adventure Dining Table has higher weight."
            },
            {
                "content": "How much does it cost?",
                "role": "user"
            },
            {
                "content": "The Alpine Explorer Tent is $120.",
                "role": "assistant",
                "context": null
            }
        ]
    }
}
```

Then register your data in your Azure AI project by uploading the file as a dataset and fetching the resulting dataset ID:

```python
# Name and version under which to register the dataset (example values)
dataset_name = "evaluate_test_data"
dataset_version = "1"

# Upload a local jsonl file (skip if you already have a Dataset registered)
data_id = project_client.datasets.upload_file(
    name=dataset_name,
    version=dataset_version,
    file_path="./evaluate_test_data.jsonl",
).id
```
To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md#evaluating-other-agents).
We provide two ways to register your data in your Azure AI project, as required for evaluations in the cloud:

- Uploading new datasets to your project:

  - **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, as shown in the previous snippet, and fetch the dataset ID as a result.
  - **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.

- Specifying existing datasets uploaded to your project:

  - **From SDK**: If you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
  - **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID in the format shown previously.

## Specifying evaluators from the Evaluator library

We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation.

### Specifying built-in evaluators

From the SDK, reference built-in evaluators by their `id`, for example through the `EvaluatorIds` enumeration in the Azure AI Projects SDK:

```python
from azure.ai.projects.models import (
    EvaluatorConfiguration,
    EvaluatorIds,
)
```
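The `evaluators` mapping used later when submitting the run pairs a name of your choice with an evaluator configuration. Continuing from the imports above, the following is a minimal sketch only: the `EvaluatorIds.RELEVANCE` member, the `init_params` keys, and the deployment name are assumptions you should adapt to your own setup.

```python
# Hypothetical evaluator map: keys are display names you choose, values are
# configurations. The enum member and init_params below are illustrative assumptions.
evaluators = {
    "relevance": EvaluatorConfiguration(
        id=EvaluatorIds.RELEVANCE.value,
        init_params={"deployment_name": "gpt-4"},  # assumed model deployment name
    ),
}
```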
### Prompt-based custom evaluators

You can also use your own prompt-based custom evaluators, such as a `FriendlinessEvaluator` built as described in [Prompt-based evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md#prompt-based-evaluators). After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project and specify it for cloud evaluation alongside the built-in evaluators.

## Submit evaluation in the cloud

Finally, submit the remote evaluation run:

```python
from azure.ai.projects.models import (
    Evaluation,
    InputDataset
)

# Create an evaluation with the dataset and evaluators specified
evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of dataset",
    data=InputDataset(id=data_id),
    evaluators=evaluators,
)
```
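To create the run from this `evaluation` definition, a minimal sketch follows, continuing from the previous snippets; it assumes the `evaluations.create` operation on the project client, and the shape of the returned object can vary by SDK version.

```python
# Submit the evaluation run to the project (sketch; response fields vary by SDK version)
evaluation_response = project_client.evaluations.create(evaluation)

# The response describes the created run; open the reported URI (or find the run
# in the Azure AI Foundry portal) to follow progress and view results.
print(evaluation_response)
```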
Following the URI, you're redirected to Foundry to view your evaluation results in your Azure AI project and debug your application. Using the reason fields and pass/fail results, you can better assess the quality and safety performance of your applications. You can run and compare multiple runs to test for regressions or improvements.
## Related content
- [Evaluate your Generative AI applications locally](./evaluate-sdk.md)