articles/ai-studio/how-to/develop/evaluate-sdk.md (17 additions, 20 deletions)
@@ -1,6 +1,6 @@
---
title: Evaluate your Generative AI application with the Azure AI Evaluation SDK
- titleSuffix: Azure AI Studio
+ titleSuffix: Azure AI project
description: This article provides instructions on how to evaluate a Generative AI application with the Azure AI Evaluation SDK.
manager: scottpolly
ms.service: azure-ai-studio
@@ -22,7 +22,7 @@ author: lgayhardt
To thoroughly assess the performance of your generative AI application against a substantial dataset, you can evaluate it in your development environment with the Azure AI Evaluation SDK. Given either a test dataset or a target, your generative AI application's generations are quantitatively measured with both mathematical metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.
- In this article, you learn how to run evaluators on a single row of data, a larger test dataset on an application target with built-in evaluators using the Azure AI evaluation SDK both locally and remotely, then track the results and evaluation logs in Azure AI Studio.
+ In this article, you learn how to run evaluators on a single row of data, on a larger test dataset, or on an application target with built-in evaluators, using the Azure AI Evaluation SDK both locally and remotely, and then track the results and evaluation logs in your Azure AI project.
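For orientation, here's a minimal sketch (the endpoint, key, and deployment values are placeholders) of running a single built-in quality evaluator on one row of data:

```python
# A minimal sketch (placeholder model details): score one query/response pair
# with a built-in AI-assisted quality evaluator.
from azure.ai.evaluation import RelevanceEvaluator

model_config = {
    "azure_endpoint": "https://<your-aoai-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-gpt-deployment>",
}

relevance_eval = RelevanceEvaluator(model_config)
score = relevance_eval(
    query="What is the capital of Japan?",
    response="The capital of Japan is Tokyo.",
)
print(score)
```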
## Getting started
@@ -182,7 +182,7 @@ Here's an example of the result:
### Risk and safety evaluators
- When you use AI-assisted risk and safety metrics, a GPT model isn't required. Instead of `model_config`, provide your `azure_ai_project` information. This accesses the Azure AI Studio safety evaluations back-end service, which provisions a GPT model specific to harms evaluation that can generate content risk severity scores and reasoning to enable the safety evaluators.
+ When you use AI-assisted risk and safety metrics, a GPT model isn't required. Instead of `model_config`, provide your `azure_ai_project` information. This accesses the Azure AI project safety evaluations back-end service, which provisions a GPT model specific to harms evaluation that can generate content risk severity scores and reasoning to enable the safety evaluators.
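As a rough illustration (the subscription, resource group, and project names are placeholders), a safety evaluator takes `azure_ai_project` and a credential instead of a `model_config`:

```python
# A minimal sketch (placeholder project details): run a content-safety evaluator
# against the Azure AI project safety evaluations back-end service.
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import ViolenceEvaluator

azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

violence_eval = ViolenceEvaluator(
    azure_ai_project=azure_ai_project,
    credential=DefaultAzureCredential(),
)
result = violence_eval(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)
```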
#### Region support
@@ -428,15 +428,15 @@ result = evaluate(
}
}
},
- # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI Studio project
+ # Optionally provide your Azure AI project information to track your evaluation results in your Azure AI project
azure_ai_project=azure_ai_project,
# Optionally provide an output path to dump a JSON of the metric summary, row-level data and metrics, and the studio URL
output_path="./myevalresults.json"
)
```
> [!TIP]
- > Get the contents of the `result.studio_url` property for a link to view your logged evaluation results in Azure AI Studio.
+ > Get the contents of the `result.studio_url` property for a link to view your logged evaluation results in your Azure AI project.
The evaluator outputs results in a dictionary that contains aggregate `metrics` and row-level data and metrics. An example of an output:
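As a rough illustration (metric names and values here are placeholders), the returned dictionary has this general shape:

```python
# A minimal sketch (placeholder values): general shape of the evaluate() result,
# with aggregate metrics, row-level data, and a link to the logged results.
result = {
    "metrics": {
        "relevance.relevance": 4.0,
        "f1_score.f1_score": 0.72,
    },
    "rows": [
        {
            "inputs.query": "What is the capital of Japan?",
            "inputs.response": "The capital of Japan is Tokyo.",
            "outputs.relevance.relevance": 4,
        },
    ],
    "studio_url": "<link-to-logged-results>",
}
```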
@@ -479,7 +479,7 @@ The evaluator outputs results in a dictionary which contains aggregate `metrics`
### Requirements for `evaluate()`
- The `evaluate()` API has a few requirements for the data format that it accepts and how it handles evaluator parameter key names so that the charts in your AI Studio evaluation results show up properly.
+ The `evaluate()` API has a few requirements for the data format that it accepts and how it handles evaluator parameter key names so that the charts of the evaluation results in your Azure AI project show up properly.
#### Data format
@@ -496,7 +496,7 @@ The `evaluate()` API only accepts data in the JSONLines format. For all built-in
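As a rough illustration (the field names are assumed from the built-in evaluators' inputs), a JSONLines dataset holds one JSON object per line and can be produced like this:

```python
# A minimal sketch (assumed field names): write a small JSONLines dataset,
# one JSON object per line, for use with evaluate().
import json

rows = [
    {"query": "What is the capital of France?", "response": "Paris.", "ground_truth": "Paris"},
    {"query": "Who wrote Hamlet?", "response": "William Shakespeare.", "ground_truth": "Shakespeare"},
]

with open("data.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```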
#### Evaluator parameter format
- When passing in your built-in evaluators, it's important to specify the right keyword mapping in the `evaluators` parameter list. The following is the keyword mapping required for the results from your built-in evaluators to show up in the UI when logged to Azure AI Studio.
+ When passing in your built-in evaluators, it's important to specify the right keyword mapping in the `evaluators` parameter list. The following is the keyword mapping required for the results from your built-in evaluators to show up in the UI when logged to your Azure AI project.
| Evaluator | keyword param |
|---------------------------|-------------------|
@@ -568,14 +568,13 @@ result = evaluate(
After local evaluations of your generative AI applications, you may want to trigger remote evaluations for pre-deployment testing, and even continuously evaluate your applications for post-deployment monitoring. The Azure AI Project SDK offers such capabilities via a Python API and supports all of the features available in local evaluations. Follow the steps below to submit your remote evaluation on your data using built-in or custom evaluators.
> [!NOTE]
- > Remote evaluations are only supported in the same [regions](#region-support) as AI-assisted risk and safety metrics.
+ > Remote evaluations are only supported in East US 2 and Sweden Central regions.
### Prerequisites
- Azure AI project in the `EastUS2` region. If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
- Azure OpenAI deployment with a GPT model supporting `chat completion`, for example `gpt-4`.
- `Connection String` for the Azure AI project, to easily create an `AIProjectClient` object (see the sketch after this list). You can get the **Project connection string** under **Project details** from the project's **Overview** page.
- Make sure you are first logged into your Azure subscription by running `az login`.
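As a rough illustration (the connection string value is a placeholder), the project connection string can be used to create the client:

```python
# A minimal sketch (placeholder connection string): create an AIProjectClient
# from the project connection string shown on the project's Overview page.
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<region>.api.azureml.ms;<subscription-id>;<resource-group>;<project-name>",
)
```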
### Installation Instructions
@@ -618,15 +617,12 @@ We provide two ways to register your data in Azure AI project required for remot
**From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
2. Given existing datasets uploaded to your Project:
- **From SDK**: If you already know the dataset name you created, construct the dataset id in this format (see the sketch after this list): `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
- **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset id as in the format above.
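As a rough illustration (all names are placeholders), the dataset id can be assembled from those values:

```python
# A minimal sketch (placeholder names): assemble the dataset id for an
# existing dataset already uploaded to your Azure AI project.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{version}"
)
```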
### Specifying evaluators from Evaluator library
We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for remote evaluation. We provide two ways to specify registered evaluators:
After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
- - For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](#Prompt-based-evaluators):
+ - For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](#prompt-based-evaluators):
After logging your custom evaluator to your AI Studio project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
### Remote evaluation with Azure AI Project SDK
@@ -753,7 +749,7 @@ from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEv
# Note the evaluator configuration key must follow a naming convention: it must start with a letter and contain only alphanumeric characters and underscores. Take "f1_score" as an example: "f1score" or "f1_evaluator" will also be acceptable, but "f1-score-eval" or "1score" will result in errors.
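# As a rough illustration (hypothetical evaluator instances), keys like these are valid or invalid:
#   evaluators = {
#       "f1_score": F1ScoreEvaluator(),                 # valid
#       "relevance": RelevanceEvaluator(model_config),  # valid
#       # "f1-score-eval": ...   invalid: contains a hyphen
#       # "1score": ...          invalid: starts with a digit
#   }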