articles/ai-studio/how-to/develop/evaluate-sdk.md
@@ -22,7 +22,7 @@ author: lgayhardt
To thoroughly assess the performance of your generative AI application when applied to a substantial dataset, you can evaluate it in your development environment with the Azure AI evaluation SDK. Given either a test dataset or a target, your generative AI application's generations are quantitatively measured with both mathematical-based metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.
In this article, you learn how to run evaluators on a single row of data or on a larger test dataset against an application target with built-in evaluators, using the Azure AI evaluation SDK both locally and remotely in the cloud, and then track the results and evaluation logs in your Azure AI project.
## Getting started
@@ -122,7 +122,7 @@ For evaluators that support conversations, you can provide `conversation` as inp
}
```
Our evaluators understand that the first turn of the conversation provides valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated per turn and results are aggregated over all turns for a conversation score.
> [!NOTE]
> In the second turn, even if `context` is `null` or a missing key, it's interpreted as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
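For illustration, here's a rough sketch of a two-turn conversation in the query-response format described above. The keys follow the conversation shape the evaluators expect, but the content values are made up for this example rather than taken from the article's samples:

```python
# Illustrative sketch only: a two-turn conversation for a conversation-supporting evaluator.
conversation = {
    "messages": [
        # Turn 1: query from user, context and response from assistant.
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer Tent is the most waterproof.",
            "context": "From the product catalog: the Alpine Explorer Tent has a 3000 mm rainfly rating.",
        },
        # Turn 2: no "context" key here, so it's treated as an empty string (see the note above).
        {"role": "user", "content": "How much does it cost?"},
        {"role": "assistant", "content": "The Alpine Explorer Tent costs $120."},
    ]
}
```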
@@ -386,7 +386,7 @@ print(answer_length)
The result:
```python
{"answer_length":27}
```
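For reference, a custom evaluator that returns a result of this shape can be as simple as a callable class. The following is a minimal sketch; the class and parameter names are assumptions and may differ from the example that produced the output above:

```python
class AnswerLengthEvaluator:
    """Minimal illustrative custom evaluator: returns the length of the answer."""

    def __call__(self, *, answer: str, **kwargs):
        # Return a dict so the metric shows up as a named key in the results.
        return {"answer_length": len(answer)}


# Usage sketch: evaluating a single answer.
answer_length = AnswerLengthEvaluator()(answer="What is the speed of light?")
print(answer_length)  # {'answer_length': 27}
```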
@@ -484,25 +484,33 @@ print(friendliness_score)
Here's the result:
```python
{
'score': 1,
'reason': 'The response is hostile and unapologetic, lacking warmth or approachability.'
}
```
## Local evaluation on test datasets using `evaluate()`
After you spot-check your built-in or custom evaluators on a single row of data, you can combine multiple evaluators with the `evaluate()` API on an entire test dataset.
### Prerequisites
If you want to enable logging and tracing to your Azure AI project for evaluation results, follow these steps:
1. Make sure you're first logged in by running `az login`.
2. Install the following sub-package:
```python
pip install azure-ai-evaluation[remote]
```
3. Make sure you have the [Identity-based access](../secure-data-playground.md#prerequisites) setting for the storage account in your Azure AI hub. To find your storage account, go to the **Overview** page of your Azure AI hub and select **Storage**.
4. Make sure you have the `Storage Blob Data Contributor` role for the storage account.
### Local evaluation on datasets
To ensure that `evaluate()` can correctly parse the data, you must specify column mapping to map the columns from the dataset to the keywords that the evaluators accept. In this case, we specify the data mapping for `query`, `response`, and `context`.
```python
@@ -659,41 +667,41 @@ result = evaluate(
```
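The full `evaluate()` example isn't shown in this excerpt. As a rough sketch under stated assumptions (the dataset file name, model configuration, and column mappings below are placeholders rather than the article's exact values), such a call can look like this:

```python
import os

from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Assumed model configuration for the AI-assisted evaluator (placeholder environment variables).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],
}

relevance_eval = RelevanceEvaluator(model_config)

# Each evaluator gets a column mapping from dataset columns to the
# query/context/response keywords it expects.
result = evaluate(
    data="data.jsonl",  # placeholder JSONL test dataset
    evaluators={"relevance": relevance_eval},
    evaluator_config={
        "relevance": {
            "column_mapping": {
                "query": "${data.query}",
                "context": "${data.context}",
                "response": "${data.response}",
            }
        }
    },
    output_path="./evaluation_results.json",
)
```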
## Cloud evaluation on test datasets
After local evaluations of your generative AI applications, you might want to run evaluations in the cloud for pre-deployment testing, and [continuously evaluate](https://aka.ms/GenAIMonitoringDoc) your applications for post-deployment monitoring. The Azure AI Projects SDK offers such capabilities via a Python API and supports almost all of the features available in local evaluations. Follow the steps below to submit your evaluation to the cloud on your data using built-in or custom evaluators.
### Prerequisites
- An Azure AI project in the same [region](#region-support) as the risk and safety evaluators. If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
> [!NOTE]
> Cloud evaluations don't support `ContentSafetyEvaluator` and `QAEvaluator`.
- An Azure OpenAI deployment with a GPT model supporting `chat completion`, for example `gpt-4`.
674
682
-`Connection String` for Azure AI project to easily create `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page.
- Make sure you're first logged into your Azure subscription by running `az login`.
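As a minimal sketch of the connection-string prerequisite (the connection string value below is a placeholder you replace with your own), creating the project client can look like this:

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder: paste the Project connection string from your project's Overview page.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-project-connection-string>",
)
```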
### Installation Instructions
1. Create a **virtual Python environment of your choice**. To create one using conda, run the following command:
```bash
conda create -n cloud-evaluation
conda activate cloud-evaluation
```
2. Install the required packages by running the following command:
**From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
2. Given existing datasets uploaded to your Project:
- **From SDK**: If you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>` (a small sketch of assembling this ID follows this list).
- **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID in the format above.
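For example, a small sketch of assembling the dataset ID string from values you already know (all values below are placeholders you substitute with your own):

```python
# Placeholder values; substitute your own subscription, resource group,
# project, dataset name, and version.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{version}"
)
```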
### Specifying evaluators from Evaluator library
We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation. We provide two ways to specify registered evaluators:
#### Specifying built-in evaluators
- **From SDK**: Use the built-in evaluator's `id` property supported by the `azure-ai-evaluation` SDK:
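The code for this bullet isn't shown in this excerpt. Purely as an illustrative sketch (assuming the built-in evaluator classes expose an `id` property, as the bullet implies), referencing a registered built-in evaluator could look like:

```python
from azure.ai.evaluation import F1ScoreEvaluator

# Assumed usage: the id property resolves to the evaluator's registered asset ID
# in the Evaluator library, which cloud evaluation accepts.
print("F1 score evaluator id:", F1ScoreEvaluator.id)
```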
After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
### Cloud evaluation with Azure AI Projects SDK
You can submit a cloud evaluation with the Azure AI Projects SDK via a Python API. See the following example to submit a cloud evaluation of your dataset using an NLP evaluator (F1 score), an AI-assisted quality evaluator (Relevance), a safety evaluator (Violence), and a custom evaluator. Putting it all together:
```python
import os, time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator