---
title: Cloud evaluation with Azure AI Projects SDK
titleSuffix: Azure AI Foundry
description: This article provides instructions on how to evaluate a Generative AI application on the cloud.
manager: scottpolly
ms.service: azure-ai-foundry
ms.custom:
  - references_regions
  - ignite-2024
ms.topic: how-to
ms.date: 02/21/2025
ms.reviewer: changliu2
ms.author: lagayhar
author: lgayhardt
---
# Evaluate your Generative AI application on the cloud with Azure AI Projects SDK (preview)

[!INCLUDE [feature-preview](../../includes/feature-preview.md)]

While the Azure AI Evaluation SDK client supports running evaluations locally on your own machine, you might want to delegate the job to the cloud. For example, after you've run local evaluations on small test data to help assess your generative AI application prototypes, you move into pre-deployment testing and need to run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure, and enables you to integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.

In this article, you learn how to run cloud evaluation (preview) in pre-deployment testing on a test dataset. Using the Azure AI Projects SDK, your evaluation results are automatically logged into your Azure AI project for better observability. This feature supports all Microsoft curated [built-in evaluators](./evaluate-sdk.md#built-in-evaluators) and your own [custom evaluators](./evaluate-sdk.md#custom-evaluators), which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scoped role-based access control (RBAC).
| 23 | + |
| 24 | +## Prerequisites |
| 25 | + |
| 26 | +- Azure AI project in the same [regions](./evaluate-sdk.md#region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one. |
| 27 | + |
| 28 | +- Azure OpenAI Deployment with GPT model supporting `chat completion`, for example `gpt-4`. |
| 29 | +- `Connection String` for Azure AI project to easily create `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page. |
| 30 | +- Make sure you're first logged into your Azure subscription by running `az login`. |
| 31 | + |
### Installation instructions

1. Create a **virtual Python environment of your choice**. To create one using conda, run the following commands:

    ```bash
    conda create -n cloud-evaluation
    conda activate cloud-evaluation
    ```

2. Install the required packages by running the following command:

    ```bash
    pip install azure-identity azure-ai-projects azure-ai-ml
    ```

    Optionally, you can `pip install azure-ai-evaluation` if you want a code-first experience to fetch the evaluator IDs for built-in evaluators in code.

Now you can define a client and a deployment that will be used to run your evaluations in the cloud:

```python
import os, time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

# Load your Azure OpenAI config
deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
api_version = os.environ.get("AZURE_OPENAI_API_VERSION")

# Create an Azure AI Client from a connection string. Available on Azure AI project Overview page.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<connection_string>"
)
```

## Uploading evaluation data

We provide two ways to register your data in your Azure AI project, which is required for evaluations in the cloud:

1. **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result:

```python
data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
```

**From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.

2. Given existing datasets uploaded to your project:

- **From SDK**: If you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`. For a sketch of building this string in code, see the example after this list.

- **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID in the format shown previously.

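A minimal sketch of constructing the dataset ID for an existing dataset might look like the following; the placeholder values are assumptions that you replace with your own subscription, resource group, project, dataset name, and version:

```python
# Build the dataset ID for an existing dataset from its parts.
# Replace each placeholder with your own values.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
dataset_version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{dataset_version}"
)
```
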
## Specifying evaluators from Evaluator library

We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation. We provide two ways to specify registered evaluators:

### Specifying built-in evaluators

- **From SDK**: Use the `id` property of the built-in evaluators supported by the `azure-ai-evaluation` SDK:

```python
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
print("F1 Score evaluator id:", F1ScoreEvaluator.id)
```

- **From UI**: Follow these steps to fetch evaluator IDs after they're registered to your project (a sketch of using a copied ID follows this list):
  - Select the **Evaluation** tab in your Azure AI project.
  - Select **Evaluator library**.
  - Select your evaluators of choice by comparing the descriptions.
  - Copy its "Asset ID", which is your evaluator ID, for example, `azureml://registries/azureml/models/Groundedness-Evaluator/versions/1`.

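A copied asset ID can be used anywhere an evaluator ID is expected. For example, here's a minimal sketch passing the Groundedness evaluator asset ID shown above to an `EvaluatorConfiguration`; the `model_config` is an assumption, defined the same way as in the cloud evaluation example later in this article:

```python
from azure.ai.projects.models import EvaluatorConfiguration

# Asset ID copied from the Evaluator library UI (example value from the steps above)
groundedness_id = "azureml://registries/azureml/models/Groundedness-Evaluator/versions/1"

# AI-assisted quality evaluators such as groundedness take a model_config as an init parameter.
groundedness_config = EvaluatorConfiguration(
    id=groundedness_id,
    init_params={"model_config": model_config},
)
```
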
### Specifying custom evaluators

- For code-based custom evaluators, register them to your Azure AI project and fetch the evaluator IDs as in this example:

```python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from promptflow.client import PFClient

# Define ml_client to register the custom evaluator
ml_client = MLClient(
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    workspace_name=os.environ["AZURE_PROJECT_NAME"],
    credential=DefaultAzureCredential()
)

# Load the evaluator from its module
from answer_len.answer_length import AnswerLengthEvaluator

# Convert it to an evaluation flow and save it locally
pf_client = PFClient()
local_path = "answer_len_local"
pf_client.flows.save(entry=AnswerLengthEvaluator, path=local_path)

# Specify the evaluator name to appear in the Evaluator library
evaluator_name = "AnswerLenEvaluator"

# Finally, register the evaluator to the Evaluator library
custom_evaluator = Model(
    path=local_path,
    name=evaluator_name,
    description="Evaluator calculating answer length.",
)
registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
print("Registered evaluator id:", registered_evaluator.id)
# Registered evaluators are versioned. You can always reference any available version.
versioned_evaluator = ml_client.evaluators.get(evaluator_name, version=1)
print("Versioned evaluator id:", versioned_evaluator.id)
```
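
The snippet above imports `AnswerLengthEvaluator` from a local `answer_len/answer_length.py` module, built as described in [Custom evaluators](./evaluate-sdk.md#custom-evaluators). If you don't have it yet, a minimal sketch of such a code-based evaluator might look like the following; the module path, class name, and output key are assumptions chosen to match the import above:

```python
# answer_len/answer_length.py
# A minimal sketch of a code-based custom evaluator: a callable class that
# returns a dictionary of metrics for a single row of data.

class AnswerLengthEvaluator:
    def __init__(self):
        pass

    def __call__(self, *, answer: str, **kwargs):
        # Return the length of the answer as the metric for this row.
        return {"answer_length": len(answer)}
```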

After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.

- For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](./evaluate-sdk.md#prompt-based-evaluators):

```python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from promptflow.client import PFClient

# Import your prompt-based custom evaluator
from friendliness.friend import FriendlinessEvaluator

# Define your deployment
model_config = dict(
    azure_endpoint=os.environ.get("AZURE_ENDPOINT"),
    azure_deployment=os.environ.get("AZURE_DEPLOYMENT_NAME"),
    api_version=os.environ.get("AZURE_API_VERSION"),
    api_key=os.environ.get("AZURE_API_KEY"),
    type="azure_openai"
)

# Define ml_client to register the custom evaluator
ml_client = MLClient(
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    workspace_name=os.environ["AZURE_PROJECT_NAME"],
    credential=DefaultAzureCredential()
)

# Convert the evaluator to an evaluation flow and save it locally
local_path = "friendliness_local"
pf_client = PFClient()
pf_client.flows.save(entry=FriendlinessEvaluator, path=local_path)

# Specify the evaluator name to appear in the Evaluator library
evaluator_name = "FriendlinessEvaluator"

# Register the evaluator to the Evaluator library
custom_evaluator = Model(
    path=local_path,
    name=evaluator_name,
    description="Prompt-based evaluator measuring response friendliness.",
)
registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
print("Registered evaluator id:", registered_evaluator.id)
# Registered evaluators are versioned. You can always reference any available version.
versioned_evaluator = ml_client.evaluators.get(evaluator_name, version=1)
print("Versioned evaluator id:", versioned_evaluator.id)
```

After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.

## Cloud evaluation (preview) with Azure AI Projects SDK

You can now submit a cloud evaluation with the Azure AI Projects SDK via a Python API. See the following example, which specifies an NLP evaluator (F1 score), AI-assisted quality and safety evaluators (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):

```python
import os, time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

# Load your Azure OpenAI config
deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
api_version = os.environ.get("AZURE_OPENAI_API_VERSION")

# Create an Azure AI Client from a connection string. Available on the project Overview page in the Azure AI project UI.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<connection_string>"
)

# Construct the dataset ID per the instructions above
data_id = "<dataset-id>"

default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)

# Use the same model_config for your evaluator (or use different ones if needed)
model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)

# Create an evaluation
evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of dataset",
    data=Dataset(id=data_id),
    evaluators={
        # Note the evaluator configuration key must follow a naming convention:
        # the string must start with a letter and contain only alphanumeric characters
        # and underscores. Take "f1_score" as an example: "f1score" or "f1_evaluator"
        # will also be acceptable, but "f1-score-eval" or "1score" will result in errors.
        "f1_score": EvaluatorConfiguration(
            id=F1ScoreEvaluator.id,
        ),
        "relevance": EvaluatorConfiguration(
            id=RelevanceEvaluator.id,
            init_params={
                "model_config": model_config
            },
        ),
        "violence": EvaluatorConfiguration(
            id=ViolenceEvaluator.id,
            init_params={
                "azure_ai_project": project_client.scope
            },
        ),
        "friendliness": EvaluatorConfiguration(
            id="<custom_evaluator_id>",
            init_params={
                "model_config": model_config
            }
        )
    },
)

# Create evaluation
evaluation_response = project_client.evaluations.create(
    evaluation=evaluation,
)

# Get evaluation
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("----------------------------------------------------------------")
print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
print("Evaluation status: ", get_evaluation_response.status)
print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
print("----------------------------------------------------------------")
```

Now you can use the URI to view your evaluation results in your Azure AI project, to better assess the quality and safety performance of your applications.
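
The evaluation runs asynchronously in the cloud, so the status you see right after creation usually isn't final. If you want your script to wait for the run to finish before printing the results URI, a simple polling loop might look like the following sketch; the terminal status strings are assumptions, so adjust them if your project reports different values:

```python
# Poll the evaluation until it reaches a terminal state (assumed status values).
evaluation_id = evaluation_response.id

while True:
    current = project_client.evaluations.get(evaluation_id)
    print("Evaluation status:", current.status)
    if current.status in ("Completed", "Failed", "Canceled"):
        break
    time.sleep(30)  # `time` is imported at the top of the previous snippet

print("AI project URI:", current.properties["AiStudioEvaluationUri"])
```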

## Related content

- [Evaluate your Generative AI applications locally](./evaluate-sdk.md)
- [Evaluate your Generative AI applications online](https://aka.ms/GenAIMonitoringDoc)
- [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
- [View your evaluation results in Azure AI project](../../how-to/evaluate-results.md)
- [Get started building a chat app using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
- [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)