
Commit c4d6fdd

break local and cloud eval into two docs
1 parent bfe8d43 commit c4d6fdd

File tree

1 file changed (+7, -278)

articles/ai-studio/how-to/develop/evaluate-sdk.md

Lines changed: 7 additions & 278 deletions
@@ -19,7 +19,7 @@ author: lgayhardt
[!INCLUDE [feature-preview](../../includes/feature-preview.md)]

> [!NOTE]
-> Evaluation with the prompt flow SDK has been retired and replaced with Azure AI Evaluation SDK.
+> Evaluation with the prompt flow SDK has been retired and replaced with the Azure AI Evaluation SDK client library for Python. See the [API reference documentation](https://aka.ms/azureaieval-python-ref) for more details, including input data requirements.

To thoroughly assess the performance of your generative AI application when applied to a substantial dataset, you can evaluate it in your development environment with the Azure AI evaluation SDK. Given either a test dataset or a target, your generative AI application's generations are quantitatively measured with both mathematics-based metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.
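For instance, a minimal local run with the `evaluate` API and one built-in NLP evaluator might look like the following sketch; the data file name is an assumption for illustration, and AI-assisted evaluators follow the same pattern but additionally take a judge model configuration (see the setup section below):

```python
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

# Run a local evaluation over a JSONL test dataset with an NLP evaluator
# that needs no judge model (it compares response and ground_truth directly).
result = evaluate(
    data="evaluate_test_data.jsonl",  # assumed file of response/ground_truth records
    evaluators={"f1_score": F1ScoreEvaluator()},
)
print(result["metrics"])
```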

@@ -51,9 +51,6 @@ For more in-depth information on each evaluator definition and how it's calculat

Built-in quality and safety metrics take in query and response pairs, along with additional information for specific evaluators.

-> [!TIP]
-> For more information about inputs and outputs, see the [Azure Python reference documentation](https://aka.ms/azureaieval-python-ref).

### Data requirements for built-in evaluators

Built-in evaluators can accept *either* query and response pairs or a list of conversations:
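As an illustration of the query and response shape, one record of the JSONL test data could be produced like this; the field names (`query`, `context`, `response`, `ground_truth`) are the typical built-in evaluator inputs and the values are made up:

```python
import json

# Append one query/response record to a JSONL test data file.
record = {
    "query": "Which tent is the most waterproof?",
    "context": "The Alpine Explorer Tent has the highest rainfly waterproof rating.",
    "response": "The Alpine Explorer Tent is the most waterproof.",
    "ground_truth": "The Alpine Explorer Tent.",
}
with open("evaluate_test_data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```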
@@ -214,9 +211,11 @@ You can use our built-in AI-assisted and NLP quality evaluators to assess the pe

#### Set up

-1. For AI-assisted quality evaluators except for `GroundednessProEvaluator`, you must specify a GPT model to act as a judge to score the evaluation data. Choose a deployment with either GPT-3.5, GPT-4, GPT-4o or GPT-4-mini model for your calculations and set it as your `model_config`. We support both Azure OpenAI or OpenAI model configuration schema. We recommend using GPT models that don't have the `(preview)` suffix for the best performance and parseable responses with our evaluators.
+1. For AI-assisted quality evaluators except for `GroundednessProEvaluator`, you must specify a GPT model (`gpt-35-turbo`, `gpt-4`, `gpt-4-turbo`, `gpt-4o`, or `gpt-4o-mini`) in your `model_config` to act as a judge that scores the evaluation data. We support both the Azure OpenAI and OpenAI model configuration schemas. We recommend using GPT models that don't have the `(preview)` suffix for the best performance and parseable responses with our evaluators.

> [!NOTE]
+> We strongly recommend replacing `gpt-3.5-turbo` with `gpt-4o-mini` as your evaluator model; according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo), the latter is cheaper, more capable, and just as fast.
> Make sure that you have at least the `Cognitive Services OpenAI User` role for the Azure OpenAI resource to make inference calls with the API key. For more permissions, see [permissions for the Azure OpenAI resource](../../../ai-services/openai/how-to/role-based-access-control.md#summary).

2. For `GroundednessProEvaluator`, instead of a GPT deployment in `model_config`, you must provide your `azure_ai_project` information. This accesses the backend evaluation service of your Azure AI project.
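As a sketch of what these two configurations can look like in code (the environment variable names and the judge deployment are assumptions):

```python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import RelevanceEvaluator, GroundednessProEvaluator

# Judge model configuration (Azure OpenAI schema) for AI-assisted quality evaluators.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": os.environ["AZURE_OPENAI_DEPLOYMENT"],  # for example, a gpt-4o-mini deployment
}
relevance_eval = RelevanceEvaluator(model_config)

# GroundednessProEvaluator takes your Azure AI project details instead of a judge deployment.
azure_ai_project = {
    "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
    "resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],
    "project_name": os.environ["AZURE_PROJECT_NAME"],
}
groundedness_pro_eval = GroundednessProEvaluator(
    azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
)
```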
@@ -738,284 +737,14 @@ result = evaluate(

```

-## Cloud evaluation (preview) on test datasets
-
-After local evaluations of your generative AI applications, you might want to run evaluations in the cloud for pre-deployment testing, and [continuously evaluate](https://aka.ms/GenAIMonitoringDoc) your applications for post-deployment monitoring. The Azure AI Projects SDK offers such capabilities via a Python API and supports almost all of the features available in local evaluations. Follow the steps below to submit your evaluation to the cloud on your data using built-in or custom evaluators.
-
-### Prerequisites
-
-- Azure AI project in the same [regions](#region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
-
-> [!NOTE]
-> Cloud evaluations do not support `ContentSafetyEvaluator` and `QAEvaluator`.
-
-- Azure OpenAI deployment with a GPT model supporting `chat completion`, for example `gpt-4`.
-- `Connection String` for the Azure AI project to easily create an `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page.
-- Make sure you're first logged into your Azure subscription by running `az login`.
-
-### Installation Instructions
-
-1. Create a **virtual Python environment of your choice**. To create one using conda, run the following command:
-
-```bash
-conda create -n cloud-evaluation
-conda activate cloud-evaluation
-```
-
-2. Install the required packages by running the following command:
-
-```bash
-pip install azure-identity azure-ai-projects azure-ai-ml
-```
-
-Optionally, you can `pip install azure-ai-evaluation` if you want a code-first experience for fetching evaluator IDs of built-in evaluators in code.
-
-Now you can define a client and a deployment, which will be used to run your evaluations in the cloud:
-
-```python
-
-import os, time
-from azure.ai.projects import AIProjectClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-
-# Load your Azure OpenAI config
-deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
-api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
-
-# Create an Azure AI client from a connection string, available on the Azure AI project Overview page.
-project_client = AIProjectClient.from_connection_string(
-    credential=DefaultAzureCredential(),
-    conn_str="<connection_string>"
-)
-```
-
-### Uploading evaluation data
-
-We provide two ways to register your data in the Azure AI project, as required for evaluations in the cloud:
-
-1. **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result:
-
-```python
-data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
-```
-
-**From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
-
-2. Given existing datasets uploaded to your project:
-
-    - **From SDK**: If you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
-
-    - **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID as in the format above.
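For illustration, the dataset ID for an existing dataset can be assembled from those parts like this (all values are placeholders):

```python
# Assemble the dataset ID for an already uploaded dataset (placeholder values).
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "evaluate_test_data"
dataset_version = 1

data_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{dataset_version}"
)
```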

-### Specifying evaluators from the Evaluator library
-
-We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation. We provide two ways to specify registered evaluators:
-
-#### Specifying built-in evaluators
-
-- **From SDK**: Use the built-in evaluator `id` property supported by the `azure-ai-evaluation` SDK:
-
-```python
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-print("F1 Score evaluator id:", F1ScoreEvaluator.id)
-```
-
-- **From UI**: Follow these steps to fetch evaluator IDs after they're registered to your project:
-  - Select the **Evaluation** tab in your Azure AI project;
-  - Select the Evaluator library;
-  - Select your evaluators of choice by comparing the descriptions;
-  - Copy its "Asset ID", which will be your evaluator ID, for example, `azureml://registries/azureml/models/Groundedness-Evaluator/versions/1`.
-
-#### Specifying custom evaluators
-
-- For code-based custom evaluators, register them to your Azure AI project and fetch the evaluator IDs with the following:
-
-```python
-from azure.ai.ml import MLClient
-from azure.ai.ml.entities import Model
-from promptflow.client import PFClient
-
-# Define ml_client to register the custom evaluator
-ml_client = MLClient(
-    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
-    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
-    workspace_name=os.environ["AZURE_PROJECT_NAME"],
-    credential=DefaultAzureCredential()
-)
-
-# Load the evaluator from its module
-from answer_len.answer_length import AnswerLengthEvaluator
-
-# Then convert it to an evaluation flow and save it locally
-pf_client = PFClient()
-local_path = "answer_len_local"
-pf_client.flows.save(entry=AnswerLengthEvaluator, path=local_path)
-
-# Specify the evaluator name to appear in the Evaluator library
-evaluator_name = "AnswerLenEvaluator"
-
-# Finally, register the evaluator to the Evaluator library
-custom_evaluator = Model(
-    path=local_path,
-    name=evaluator_name,
-    description="Evaluator calculating answer length.",
-)
-registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
-print("Registered evaluator id:", registered_evaluator.id)
-# Registered evaluators have versioning. You can always reference any available version.
-versioned_evaluator = ml_client.evaluators.get(evaluator_name, version=1)
-print("Versioned evaluator id:", versioned_evaluator.id)
-```
-
-After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab in your Azure AI project.
-
-- For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](#prompt-based-evaluators):
-
-```python
-# Import your prompt-based custom evaluator
-from friendliness.friend import FriendlinessEvaluator
-
-# Define your deployment
-model_config = dict(
-    azure_endpoint=os.environ.get("AZURE_ENDPOINT"),
-    azure_deployment=os.environ.get("AZURE_DEPLOYMENT_NAME"),
-    api_version=os.environ.get("AZURE_API_VERSION"),
-    api_key=os.environ.get("AZURE_API_KEY"),
-    type="azure_openai"
-)
-
-# Define ml_client to register the custom evaluator
-ml_client = MLClient(
-    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
-    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
-    workspace_name=os.environ["AZURE_PROJECT_NAME"],
-    credential=DefaultAzureCredential()
-)
-
-# Convert the evaluator to an evaluation flow and save it locally
-local_path = "friendliness_local"
-pf_client = PFClient()
-pf_client.flows.save(entry=FriendlinessEvaluator, path=local_path)
-
-# Specify the evaluator name to appear in the Evaluator library
-evaluator_name = "FriendlinessEvaluator"
-
-# Register the evaluator to the Evaluator library
-custom_evaluator = Model(
-    path=local_path,
-    name=evaluator_name,
-    description="prompt-based evaluator measuring response friendliness.",
-)
-registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
-print("Registered evaluator id:", registered_evaluator.id)
-# Registered evaluators have versioning. You can always reference any available version.
-versioned_evaluator = ml_client.evaluators.get(evaluator_name, version=1)
-print("Versioned evaluator id:", versioned_evaluator.id)
-```
-
-After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
-
-### Cloud evaluation (preview) with Azure AI Projects SDK
-
-You can submit a cloud evaluation with the Azure AI Projects SDK via a Python API. See the following example, which submits a cloud evaluation of your dataset using an NLP evaluator (F1 score), an AI-assisted quality evaluator (Relevance), a safety evaluator (Violence), and a custom evaluator. Putting it all together:
-
-```python
-import os, time
-from azure.ai.projects import AIProjectClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-
-# Load your Azure OpenAI config
-deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
-api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
-
-# Create an Azure AI client from a connection string, available on the project Overview page in the Azure AI project UI.
-project_client = AIProjectClient.from_connection_string(
-    credential=DefaultAzureCredential(),
-    conn_str="<connection_string>"
-)
-
-# Construct the dataset ID per the instructions above
-data_id = "<dataset-id>"
-
-default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
-
-# Use the same model_config for your evaluators (or use different ones if needed)
-model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)
-
-# Create an evaluation
-evaluation = Evaluation(
-    display_name="Cloud evaluation",
-    description="Evaluation of dataset",
-    data=Dataset(id=data_id),
-    evaluators={
-        # Note: the evaluator configuration key must follow a naming convention.
-        # The string must start with a letter and contain only alphanumeric characters
-        # and underscores. Take "f1_score" as an example: "f1score" or "f1_evaluator"
-        # will also be acceptable, but "f1-score-eval" or "1score" will result in errors.
-        "f1_score": EvaluatorConfiguration(
-            id=F1ScoreEvaluator.id,
-        ),
-        "relevance": EvaluatorConfiguration(
-            id=RelevanceEvaluator.id,
-            init_params={
-                "model_config": model_config
-            },
-        ),
-        "violence": EvaluatorConfiguration(
-            id=ViolenceEvaluator.id,
-            init_params={
-                "azure_ai_project": project_client.scope
-            },
-        ),
-        "friendliness": EvaluatorConfiguration(
-            id="<custom_evaluator_id>",
-            init_params={
-                "model_config": model_config
-            }
-        )
-    },
-)
-
-# Create the evaluation
-evaluation_response = project_client.evaluations.create(
-    evaluation=evaluation,
-)
-
-# Get the evaluation
-get_evaluation_response = project_client.evaluations.get(evaluation_response.id)
-
-print("----------------------------------------------------------------")
-print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
-print("Evaluation status: ", get_evaluation_response.status)
-print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
-print("----------------------------------------------------------------")
-```
-
-Now we can run the cloud evaluation we just instantiated above.
-
-```python
-evaluation = client.evaluations.create(
-    evaluation=evaluation,
-    subscription_id=subscription_id,
-    resource_group_name=resource_group_name,
-    workspace_name=workspace_name,
-    headers={
-        "x-azureml-token": DefaultAzureCredential().get_token("https://ml.azure.com/.default").token,
-    }
-)
-```
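If you want a script to wait for the submitted run, one possible polling loop building on the snippet above is sketched here; the terminal status strings and polling interval are assumptions, not part of the original article:

```python
import time

# Poll the evaluation created above until it reaches a terminal state.
terminal_states = {"Completed", "Failed", "Canceled"}  # assumed status values
while True:
    get_evaluation_response = project_client.evaluations.get(evaluation_response.id)
    if get_evaluation_response.status in terminal_states:
        break
    time.sleep(30)

print("Final evaluation status:", get_evaluation_response.status)
```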

## Related content

-- [Azure Python reference documentation](https://aka.ms/azureaieval-python-ref)
-- [Azure AI Evaluation SDK Troubleshooting guide](https://aka.ms/azureaieval-tsg)
+- [Azure AI Evaluation Python SDK client reference documentation](https://aka.ms/azureaieval-python-ref)
+- [Azure AI Evaluation SDK client Troubleshooting guide](https://aka.ms/azureaieval-tsg)
- [Learn more about the evaluation metrics](../../concepts/evaluation-metrics-built-in.md)
+- [Evaluate your Generative AI applications remotely on the cloud](./cloud-evaluation.md)
- [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
- [View your evaluation results in Azure AI project](../../how-to/evaluate-results.md)
- [Get started building a chat app using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
