
Commit 906aef7

Merge pull request #1603 from changliu2/ignite2024
Ignite2024: remote eval section changes
2 parents f4ef9df + 8173954

File tree

1 file changed: +39 −31 lines changed


articles/ai-studio/how-to/develop/evaluate-sdk.md

Lines changed: 39 additions & 31 deletions
@@ -22,7 +22,7 @@ author: lgayhardt
 
 To thoroughly assess the performance of your generative AI application when applied to a substantial dataset, you can evaluate a Generative AI application in your development environment with the Azure AI evaluation SDK. Given either a test dataset or a target, your generative AI application generations are quantitatively measured with both mathematical based metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.
 
-In this article, you learn how to run evaluators on a single row of data, a larger test dataset on an application target with built-in evaluators using the Azure AI evaluation SDK both locally and remotely, then track the results and evaluation logs in Azure AI project.
+In this article, you learn how to run evaluators on a single row of data or on a larger test dataset with an application target, using built-in evaluators with the Azure AI evaluation SDK both locally and remotely in the cloud, and then track the results and evaluation logs in your Azure AI project.
 
 ## Getting started
 

@@ -122,7 +122,7 @@ For evaluators that support conversations, you can provide `conversation` as inp
 }
 ```
 
-Our evaluators will understand that the first turn of the conversation provides valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated per turn and results are aggregated over all turns for a conversation score.
+Our evaluators understand that the first turn of the conversation provides valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated per turn and results are aggregated over all turns for a conversation score.
 
 > [!NOTE]
 > Note that in the second turn, even if `context` is `null` or a missing key, it will be interpreted as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
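To make the conversation format concrete, here's a minimal sketch (not taken from the article) that scores a multi-turn `conversation` with a built-in AI-assisted evaluator; the `model_config` keys, endpoint, key, and deployment values are placeholders and reflect typical `azure-ai-evaluation` usage, not the article's exact code.

```python
# Minimal sketch: scoring a conversation with a built-in AI-assisted evaluator.
from azure.ai.evaluation import GroundednessEvaluator

# Placeholder model configuration -- substitute your own values.
model_config = {
    "azure_endpoint": "<your-azure-openai-endpoint>",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-gpt-deployment>",
}

# One conversation: the user turn supplies `query`; the assistant turn
# supplies `response` and `context` in query-response form.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer Tent is the most waterproof.",
            "context": "The Alpine Explorer Tent has a 3000 mm waterproof rating.",
        },
    ]
}

groundedness_eval = GroundednessEvaluator(model_config)
# Each turn is scored separately, then per-turn results are aggregated
# into a single conversation-level score.
result = groundedness_eval(conversation=conversation)
print(result)
```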
@@ -386,7 +386,7 @@ print(answer_length)
 
 The result:
 
-```JSON
+```python
 {"answer_length":27}
 ```
 
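For context on the `{"answer_length":27}` output above, a code-based custom evaluator is just a callable that returns a dictionary of scores. Here's a minimal sketch under that assumption; the class name and keyword argument are illustrative, not the article's exact code.

```python
# Minimal sketch of a code-based custom evaluator: any callable that
# returns a dict of scores works with the evaluation SDK.
class AnswerLengthEvaluator:
    def __call__(self, *, response: str, **kwargs):
        # The score is simply the number of characters in the response.
        return {"answer_length": len(response)}

answer_length = AnswerLengthEvaluator()(response="What is the speed of light?")
print(answer_length)  # {'answer_length': 27}
```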

@@ -484,25 +484,33 @@ print(friendliness_score)
 
 Here's the result:
 
-```JSON
+```python
 {
     'score': 1,
     'reason': 'The response is hostile and unapologetic, lacking warmth or approachability.'
 }
 ```
 
-## Batch evaluation on test datasets using `evaluate()`
+## Local evaluation on test datasets using `evaluate()`
 
 After you spot-check your built-in or custom evaluators on a single row of data, you can combine multiple evaluators with the `evaluate()` API on an entire test dataset.
 
-Before running `evaluate()`, to ensure that you can enable logging and tracing to your Azure AI project, make sure you are first logged in by running `az login`.
 
-Then install the following sub-package:
+### Prerequisites
+
+If you want to enable logging and tracing to your Azure AI project for evaluation results, follow these steps:
+
+1. Make sure you're first logged in by running `az login`.
+2. Install the following sub-package:
 
 ```python
 pip install azure-ai-evaluation[remote]
 ```
+3. Make sure you have the [Identity-based access](../secure-data-playground.md#prerequisites) setting for the storage account in your Azure AI hub. To find your storage, go to the Overview page of your Azure AI hub and select Storage.
+
+4. Make sure you have the `Storage Blob Data Contributor` role for the storage account.
 
+### Local evaluation on datasets
 In order to ensure the `evaluate()` can correctly parse the data, you must specify column mapping to map the column from the dataset to key words that are accepted by the evaluators. In this case, we specify the data mapping for `query`, `response`, and `context`.
 
 ```python
@@ -659,41 +667,41 @@ result = evaluate(
 
 ```
 
-## Remote evaluation
+## Cloud evaluation on test datasets
 
-After local evaluations of your generative AI applications, you may want to trigger remote evaluations for pre-deployment testing and even [continuously evaluate](https://aka.ms/GenAIMonitoringDoc) your applications for post-deployment monitoring. Azure AI Project SDK offers such capabilities via a Python API and supports all of the features available in local evaluations. Follow the steps below to submit your remote evaluation on your data using built-in or custom evaluators.
+After local evaluations of your generative AI applications, you might want to run evaluations in the cloud for pre-deployment testing, and [continuously evaluate](https://aka.ms/GenAIMonitoringDoc) your applications for post-deployment monitoring. The Azure AI Projects SDK offers such capabilities via a Python API and supports almost all of the features available in local evaluations. Follow the steps below to submit your evaluation to the cloud on your data using built-in or custom evaluators.
 
 
 ### Prerequisites
 - Azure AI project in the same [regions](#region-support) as risk and safety evaluators. If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
 
 > [!NOTE]
-> Remote evaluations do not support `Groundedness-Pro-Evaluator`, `Retrieval-Evaluator`, `Protected-Material-Evaluator`, `Indirect-Attack-Evaluator`, `ContentSafetyEvaluator`, and `QAEvaluator`.
+> Cloud evaluations do not support `ContentSafetyEvaluator` and `QAEvaluator`.
 
 - Azure OpenAI Deployment with GPT model supporting `chat completion`, for example `gpt-4`.
 - `Connection String` for Azure AI project to easily create `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page.
-- Make sure you are first logged into your Azure subscription by running `az login`.
+- Make sure you're first logged into your Azure subscription by running `az login`.
 
 ### Installation Instructions
 
 1. Create a **virtual Python environment of your choice**. To create one using conda, run the following command:
 ```bash
-conda create -n remote-evaluation
-conda activate remote-evaluation
+conda create -n cloud-evaluation
+conda activate cloud-evaluation
 ```
 2. Install the required packages by running the following command:
 ```bash
 pip install azure-identity azure-ai-projects azure-ai-ml
 ```
-Optionally you can `pip install azure-ai-evaluation` if you want a code-first experience to fetch evaluator id for built-in evaluators in code.
+Optionally, you can `pip install azure-ai-evaluation` if you want a code-first experience to fetch evaluator IDs for built-in evaluators in code.
 
-Now you can define a client and a deployment which will be used to run your remote evaluations:
+Now you can define a client and a deployment that will be used to run your evaluations in the cloud:
 ```python
 
 import os, time
-from azure.ai.project import AIProjectClient
+from azure.ai.projects import AIProjectClient
 from azure.identity import DefaultAzureCredential
-from azure.ai.project.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
+from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
 from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
 
 # Load your Azure OpenAI config
@@ -708,21 +716,21 @@ project_client = AIProjectClient.from_connection_string(
 ```
 
 ### Uploading evaluation data
-We provide two ways to register your data in Azure AI project required for remote evaluations:
-1. **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset id as a result:
+We provide two ways to register your data in your Azure AI project, as required for evaluations in the cloud:
+1. **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result:
 ```python
 data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
 ```
 **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
 
 2. Given existing datasets uploaded to your Project:
-- **From SDK**: if you already know the dataset name you created, construct the dataset id in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
+- **From SDK**: if you already know the dataset name you created, construct the dataset ID in this format (see the sketch after this list): `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
 
-- **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset id as in the format above.
+- **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID as in the format above.
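As a quick illustration of the ID format above, here's a minimal sketch that assembles the dataset ID from placeholder values; every value is illustrative and should be replaced with your own.

```python
# Minimal sketch: building the dataset ID for an existing dataset version.
# Every value below is a placeholder -- fill in your own.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{version}"
)
print(data_id)
```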
 
 
 ### Specifying evaluators from Evaluator library
-We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for remote evaluation. We provide two ways to specify registered evaluators:
+We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation. We provide two ways to specify registered evaluators:
 
 #### Specifying built-in evaluators
 - **From SDK**: Use built-in evaluator `id` property supported by `azure-ai-evaluation` SDK:
@@ -734,7 +742,7 @@ print("F1 Score evaluator id:", F1ScoreEvaluator.id)
 - **From UI**: Follow these steps to fetch evaluator IDs after they're registered to your project:
   - Select the **Evaluation** tab in your Azure AI project;
   - Select Evaluator library;
-  - Select your evaluator(s) of choice by comparing the descriptions;
+  - Select your evaluators of choice by comparing the descriptions;
   - Copy its "Asset ID", which will be your evaluator ID, for example, `azureml://registries/azureml/models/Groundedness-Evaluator/versions/1`.
 
 #### Specifying custom evaluators
@@ -832,15 +840,15 @@ print("Versioned evaluator id:", registered_evaluator.id)
 After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project.
 
 
-### Remote evaluation with Azure AI Project SDK
+### Cloud evaluation with Azure AI Projects SDK
 
-You can submit a remote evaluation with Azure AI Project SDK via a Python API. See the following example to submit a remote evaluation of your dataset using an NLP evaluator (F1 score), an AI-assisted quality evaluator (Relevance), a safety evaluator (Violence) and a custom evaluator. Putting it altogether:
+You can submit a cloud evaluation with the Azure AI Projects SDK via a Python API. See the following example to submit a cloud evaluation of your dataset using an NLP evaluator (F1 score), an AI-assisted quality evaluator (Relevance), a safety evaluator (Violence), and a custom evaluator. Putting it all together:
 
 ```python
 import os, time
-from azure.ai.project import AIProjectClient
+from azure.ai.projects import AIProjectClient
 from azure.identity import DefaultAzureCredential
-from azure.ai.project.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
+from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
 from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
 
 # Load your Azure OpenAI config
@@ -853,7 +861,7 @@ project_client = AIProjectClient.from_connection_string(
 conn_str="<connection_string>"
 )
 
-# Construct dataset id per the instruction
+# Construct dataset ID per the instruction
 data_id = "<dataset-id>"
 
 default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
@@ -863,7 +871,7 @@ model_config = default_connection.to_evaluator_model_config(deployment_name=depl
 
 # Create an evaluation
 evaluation = Evaluation(
-    display_name="Remote Evaluation",
+    display_name="Cloud evaluation",
     description="Evaluation of dataset",
     data=Dataset(id=data_id),
     evaluators={
@@ -910,7 +918,7 @@ print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluation
 print("----------------------------------------------------------------")
 ```
 
-Now we can run the evaluation we just instantiated above remotely.
+Now we can run the cloud evaluation we just instantiated above.
 
 ```python
 evaluation = client.evaluations.create(
@@ -933,4 +941,4 @@ evaluation = client.evaluations.create(
 - [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
 - [View your evaluation results in Azure AI project](../../how-to/evaluate-results.md)
 - [Get started building a chat app using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
-- [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)
+- [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)
