Commit a8d5aa6

Merge pull request #5046 from lgayhardt/evalfixes052502
Build: cloud eval updates
2 parents 54943d5 + 69fd182 commit a8d5aa6

File tree

1 file changed: +104 -179 lines changed


articles/ai-foundry/how-to/develop/cloud-evaluation.md

Lines changed: 104 additions & 179 deletions
@@ -1,5 +1,5 @@
 ---
-title: Cloud evaluation with Azure AI Projects SDK
+title: Cloud evaluation with Azure AI Foundry SDK
 titleSuffix: Azure AI Foundry
 description: This article provides instructions on how to evaluate a Generative AI application on the cloud.
 manager: scottpolly
@@ -8,153 +8,160 @@ ms.custom:
 - references_regions
 - ignite-2024
 ms.topic: how-to
-ms.date: 02/21/2025
+ms.date: 05/19/2025
 ms.reviewer: changliu2
 ms.author: lagayhar
 author: lgayhardt
 ---
-# Evaluate your Generative AI application on the cloud with Azure AI Projects SDK (preview)
+# Run evaluations in the cloud using Azure AI Foundry SDK (preview)
 
 [!INCLUDE [feature-preview](../../includes/feature-preview.md)]
 
-While Azure AI Evaluation SDK client supports running evaluations locally on your own machine, you might want to delegate the job remotely to the cloud. For example, after you ran local evaluations on small test data to help assess your generative AI application prototypes, now you move into pre-deployment testing and need run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure, and enables you to integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.
+While the Azure AI Evaluation SDK supports running evaluations locally on your own machine, you might want to delegate the job to the cloud. For example, after you run local evaluations on small test data to assess your generative AI application prototypes, you move into pre-deployment testing and need to run evaluations on a large dataset. Cloud evaluation frees you from managing your local compute infrastructure, and it enables you to integrate evaluations as tests into your CI/CD pipelines. After deployment, you might want to [continuously evaluate](../online-evaluation.md) your applications for post-deployment monitoring.
 
-In this article, you learn how to run cloud evaluation (preview) in pre-deployment testing on a test dataset. Using the Azure AI Projects SDK, you'll have evaluation results automatically logged into your Azure AI project for better observability. This feature supports all Microsoft curated [built-in evaluators](../../concepts/observability.md#what-are-evaluators) and your own [custom evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md) which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scope RBAC.
+In this article, you learn how to run evaluations in the cloud (preview) in pre-deployment testing on a test dataset. Using the Azure AI Projects SDK, evaluation results are automatically logged into your Azure AI project for better observability. This feature supports all Microsoft-curated [built-in evaluators](../../concepts/observability.md#what-are-evaluators) and your own [custom evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md), which can be located in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) and have the same project-scope RBAC.
 
-## Prerequisites
+## Prerequisite setup steps for Azure AI Foundry Projects
 
-- Azure AI project in the same [regions](../../concepts/evaluation-evaluators/risk-safety-evaluators.md#azure-ai-foundry-project-configuration-and-region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
+- Azure AI Foundry project in the same supported [regions](../../concepts/evaluation-evaluators/risk-safety-evaluators.md#azure-ai-foundry-project-configuration-and-region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI Foundry project](../create-projects.md?tabs=ai-studio) to create one.
 
 - Azure OpenAI Deployment with GPT model supporting `chat completion`, for example `gpt-4`.
-- `Connection String` for Azure AI project to easily create `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page.
 - Make sure you're first logged into your Azure subscription by running `az login`.
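For reference, the `DefaultAzureCredential` used later in this article picks up your `az login` session automatically. A minimal sketch to confirm a token can be acquired before going further (the management scope below is only an example):

```python
from azure.identity import DefaultAzureCredential

# Confirm that a credential is available (for example, from `az login`)
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default")
print("Token acquired, expires on:", token.expires_on)
```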
 
-### Installation Instructions
+If this is your first time running evaluations and logging them to your Azure AI Foundry project, you might need to do a few additional setup steps.
 
-1. Create a **virtual Python environment of you choice**. To create one using conda, run the following command:
+1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This Bicep template provisions and connects a storage account to your Foundry project with key authentication.
+2. Make sure the connected storage account has access to all projects.
+3. If you connected your storage account with Microsoft Entra ID, make sure to assign Storage Blob Data Owner permissions to both your account and the Foundry project's managed identity (MSI) in the Azure portal.
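If you want to confirm the storage connection from step 3 before running an evaluation, a quick check along these lines can help. It assumes the `azure-storage-blob` package is installed and uses a placeholder account URL:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder URL for the storage account connected to your project
account_url = "https://<storage-account-name>.blob.core.windows.net"

service = BlobServiceClient(account_url=account_url, credential=DefaultAzureCredential())
print("Containers visible to this identity:", [c.name for c in service.list_containers()])
```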
 
-```bash
-conda create -n cloud-evaluation
-conda activate cloud-evaluation
-```
+### Getting started
 
-2. Install the required packages by running the following command:
+First, install the Azure AI Foundry SDK's project client, which runs evaluations in the cloud:
 
-```bash
-pip install azure-identity azure-ai-projects azure-ai-ml
-```
-
-Optionally you can use `pip install azure-ai-evaluation` if you want a code-first experience to fetch evaluator ID for built-in evaluators in code.
+```bash
+pip install azure-ai-projects azure-identity
+```
 
-Now you can define a client and a deployment which will be used to run your evaluations in the cloud:
+> [!NOTE]
+> For more detailed information, see the [REST API Reference Documentation](/rest/api/aifoundry/aiprojects/evaluations).
+Then, set your environment variables for your Azure AI Foundry resources:
 
 ```python
+import os
 
-import os, time
-from azure.ai.projects import AIProjectClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
+# Required environment variables
+endpoint = os.environ["PROJECT_ENDPOINT"]  # https://<account>.services.ai.azure.com/api/projects/<project>
+model_endpoint = os.environ["MODEL_ENDPOINT"]  # https://<account>.services.ai.azure.com
+model_api_key = os.environ["MODEL_API_KEY"]
+model_deployment_name = os.environ["MODEL_DEPLOYMENT_NAME"]  # e.g. gpt-4o-mini
 
-# Load your Azure OpenAI config
-deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
-api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
+# Optional – reuse an existing dataset
+dataset_name = os.environ.get("DATASET_NAME", "dataset-test")
+dataset_version = os.environ.get("DATASET_VERSION", "1.0")
+```
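The added snippet reads four required variables; a small illustrative pre-flight check (the names are taken from the snippet above, the check itself isn't part of the SDK) can fail fast when one is missing:

```python
import os

# Illustrative check: stop early if a required variable isn't set
required = ["PROJECT_ENDPOINT", "MODEL_ENDPOINT", "MODEL_API_KEY", "MODEL_DEPLOYMENT_NAME"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```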
+
+Now you can define a client which is used to run your evaluations in the cloud:
 
-# Create an Azure AI Client from a connection string. Available on Azure AI project Overview page.
-project_client = AIProjectClient.from_connection_string(
+```python
+import os
+from azure.identity import DefaultAzureCredential
+from azure.ai.projects import AIProjectClient
+
+# Create the project client (Foundry project and credentials)
+project_client = AIProjectClient(
+    endpoint=endpoint,
     credential=DefaultAzureCredential(),
-    conn_str="<connection_string>"
 )
 ```
 
 ## Uploading evaluation data
 
-Prepare the data according to the [input data requirements for built-in evaluators](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). For example in text evaluation, prepare a `"./evaluate_test_data.jsonl"` file that contains single-turn data inputs like this:
-```json
-{"query":"What is the capital of France?","response":"Paris."}
-{"query":"What atoms compose water?","response":"Hydrogen and oxygen."}
-{"query":"What color is my shirt?","response":"Blue."}
-```
-or contains conversation data like this:
-```json
-{"conversation":
-    {
-        "messages": [
-            {
-                "content": "Which tent is the most waterproof?",
-                "role": "user"
-            },
-            {
-                "content": "The Alpine Explorer Tent is the most waterproof",
-                "role": "assistant",
-                "context": "From the our product list the alpine explorer tent is the most waterproof. The Adventure Dining Table has higher weight."
-            },
-            {
-                "content": "How much does it cost?",
-                "role": "user"
-            },
-            {
-                "content": "The Alpine Explorer Tent is $120.",
-                "role": "assistant",
-                "context": null
-            }
-        ]
-    }
-}
+```python
+# Upload a local jsonl file (skip if you already have a Dataset registered)
+data_id = project_client.datasets.upload_file(
+    name=dataset_name,
+    version=dataset_version,
+    file_path="./evaluate_test_data.jsonl",
+).id
 ```
 
-To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
+To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
 
 To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md#evaluating-other-agents).
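If you don't already have `evaluate_test_data.jsonl`, a minimal illustrative way to produce one in the single-turn `query`/`response` shape described above (file name and rows are placeholders):

```python
import json

# Illustrative single-turn test data in the query/response JSONL shape
rows = [
    {"query": "What is the capital of France?", "response": "Paris."},
    {"query": "What atoms compose water?", "response": "Hydrogen and oxygen."},
]

with open("evaluate_test_data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```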
-
 
-We provide two ways to register your data in Azure AI project required for evaluations in the cloud:
-
-- Uploading new datasets to your Project:
-
-  - **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result.
+## Specifying evaluators
 
 ```python
-data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
-```
-
-  - **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
-
-- Specifying existing datasets uploaded to your Project:
+from azure.ai.projects.models import (
+    EvaluatorConfiguration,
+    EvaluatorIds,
+)
 
-  - **From SDK**: if you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
+# Built-in evaluator configurations
+evaluators = {
+    "relevance": EvaluatorConfiguration(
+        id=EvaluatorIds.RELEVANCE.value,
+        init_params={"deployment_name": model_deployment_name},
+        data_mapping={
+            "query": "${data.query}",
+            "response": "${data.response}",
+        },
+    ),
+    "violence": EvaluatorConfiguration(
+        id=EvaluatorIds.VIOLENCE.value,
+        init_params={"azure_ai_project": endpoint},
+    ),
+    "bleu_score": EvaluatorConfiguration(
+        id=EvaluatorIds.BLEU_SCORE.value,
+    ),
+}
+```
 
-  - **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID as in the format previously.
+## Submit evaluation in the cloud
 
-## Specifying evaluators from Evaluator library
+Finally, submit the remote evaluation run:
 
-We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for Cloud evaluation. We provide two ways to specify registered evaluators:
+```python
+from azure.ai.projects.models import (
+    Evaluation,
+    InputDataset
+)
 
-### Specifying built-in evaluators
+# Create an evaluation with the dataset and evaluators specified
+evaluation = Evaluation(
+    display_name="Cloud evaluation",
+    description="Evaluation of dataset",
+    data=InputDataset(id=data_id),
+    evaluators=evaluators,
+)
 
-- **From SDK**: Use built-in evaluator `id` property supported by `azure-ai-evaluation` SDK:
+# Run the evaluation
+evaluation_response = project_client.evaluations.create(
+    evaluation,
+    headers={
+        "model-endpoint": model_endpoint,
+        "api-key": model_api_key,
+    },
+)
 
-```python
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-print("F1 Score evaluator id:", F1ScoreEvaluator.id)
+print("Created evaluation:", evaluation_response.name)
+print("Status:", evaluation_response.status)
 ```
 
-- **From UI**: Follows these steps to fetch evaluator IDs after they're registered to your project:
-  - Select **Evaluation** tab in your Azure AI project;
-  - Select Evaluator library;
-  - Select your evaluators of choice by comparing the descriptions;
-  - Copy its "Asset ID" which will be your evaluator ID, for example, `azureml://registries/azureml/models/Groundedness-Evaluator/versions/1`.
+## Specifying custom evaluators
+
+> [!NOTE]
+> Azure AI Foundry Projects aren't supported for this feature. Use an Azure AI Hub Project instead.
 
-### Specifying custom evaluators
+### Code-based custom evaluators
 
-- For code-based custom evaluators, register them to your Azure AI project and fetch the evaluator IDs as in this example:
+Register your custom evaluators to your Azure AI Hub project and fetch the evaluator IDs:
 
 ```python
 from azure.ai.ml import MLClient
 from azure.ai.ml.entities import Model
 from promptflow.client import PFClient
 
-
 # Define ml_client to register custom evaluator
 ml_client = MLClient(
     subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
@@ -163,7 +170,6 @@ ml_client = MLClient(
     credential=DefaultAzureCredential()
 )
 
-
 # Load evaluator from module
 from answer_len.answer_length import AnswerLengthEvaluator
 
@@ -190,7 +196,9 @@ print("Versioned evaluator id:", registered_evaluator.id)
 
 After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab in your Azure AI project.
 
-- For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md#prompt-based-evaluators):
+### Prompt-based custom evaluators
+
+Follow the example to register a custom `FriendlinessEvaluator` built as described in [Prompt-based evaluators](../../concepts/evaluation-evaluators/custom-evaluators.md#prompt-based-evaluators):
 
 ```python
 # Import your prompt-based custom evaluator
@@ -236,89 +244,6 @@ print("Versioned evaluator id:", registered_evaluator.id)
 
 After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project.
 
-## Submit a cloud evaluation
-
-Putting the previous code altogether, you can now submit a cloud evaluation with Azure AI Projects SDK client library via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
-
-```python
-import os, time
-from azure.ai.projects import AIProjectClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
-from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
-
-# Load your Azure OpenAI config
-deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
-api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
-
-# Create an Azure AI Client from a connection string. Avaiable on project overview page on Azure AI project UI.
-project_client = AIProjectClient.from_connection_string(
-    credential=DefaultAzureCredential(),
-    conn_str="<connection_string>"
-)
-
-# Construct dataset ID per the instruction previously
-data_id = "<dataset-id>"
-
-default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
-
-# Use the same model_config for your evaluator (or use different ones if needed)
-model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)
-
-# select the list of evaluators you care about
-evaluators = {
-    # Note the evaluator configuration key must follow a naming convention
-    # the string must start with a letter with only alphanumeric characters
-    # and underscores. Take "f1_score" as example: "f1score" or "f1_evaluator"
-    # will also be acceptable, but "f1-score-eval" or "1score" will result in errors.
-    "f1_score": EvaluatorConfiguration(
-        id=F1ScoreEvaluator.id,
-    ),
-    "relevance": EvaluatorConfiguration(
-        id=RelevanceEvaluator.id,
-        init_params={
-            "model_config": model_config
-        },
-    ),
-    "violence": EvaluatorConfiguration(
-        id=ViolenceEvaluator.id,
-        init_params={
-            "azure_ai_project": project_client.scope
-        },
-    ),
-    "friendliness": EvaluatorConfiguration(
-        id="<custom_evaluator_id>",
-        init_params={
-            "model_config": model_config
-        }
-    )
-}
-
-# Create an evaluation
-evaluation = Evaluation(
-    display_name="Cloud evaluation",
-    description="Evaluation of dataset",
-    data=Dataset(id=data_id),
-    evaluators=evaluators
-)
-
-# Create evaluation
-evaluation_response = project_client.evaluations.create(
-    evaluation=evaluation,
-)
-
-# Get evaluation result
-get_evaluation_response = project_client.evaluations.get(evaluation_response.id)
-
-print("----------------------------------------------------------------")
-print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
-print("Evaluation status: ", get_evaluation_response.status)
-print("AI project URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
-print("----------------------------------------------------------------")
-```
-
-Following the URI, you will be redirected to Foundry to view your evaluation results in your Azure AI project and debug your application. Using reason fields and pass/fail, you will be able to better assess the quality and safety performance of your applications. You can run and compare multiple runs to test for regression or improvements.
-
 ## Related content
 
 - [Evaluate your Generative AI applications locally](./evaluate-sdk.md)
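Putting the added pieces together, a condensed end-to-end sketch of the new flow follows. It only uses calls that appear in this change, except for the final status check, where the name-based `evaluations.get` call and the 30-second wait are assumptions; see the REST reference linked earlier for the exact shape:

```python
import os
import time

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import Evaluation, EvaluatorConfiguration, EvaluatorIds, InputDataset

endpoint = os.environ["PROJECT_ENDPOINT"]
model_endpoint = os.environ["MODEL_ENDPOINT"]
model_api_key = os.environ["MODEL_API_KEY"]
model_deployment_name = os.environ["MODEL_DEPLOYMENT_NAME"]

project_client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())

# Upload the test data and reference it by dataset ID
data_id = project_client.datasets.upload_file(
    name="dataset-test",
    version="1.0",
    file_path="./evaluate_test_data.jsonl",
).id

evaluators = {
    "relevance": EvaluatorConfiguration(
        id=EvaluatorIds.RELEVANCE.value,
        init_params={"deployment_name": model_deployment_name},
        data_mapping={"query": "${data.query}", "response": "${data.response}"},
    ),
}

evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of dataset",
    data=InputDataset(id=data_id),
    evaluators=evaluators,
)

evaluation_response = project_client.evaluations.create(
    evaluation,
    headers={"model-endpoint": model_endpoint, "api-key": model_api_key},
)

# Check on the run later; the name-based get call is an assumption based on the
# removed example's evaluations.get usage, so confirm it against the REST reference.
time.sleep(30)
latest = project_client.evaluations.get(evaluation_response.name)
print("Status:", latest.status)
```

And for the `answer_len.answer_length` module imported in the custom-evaluator hunk above, a minimal code-based evaluator shape (illustrative, following the custom-evaluators guide linked in this article) could look like:

```python
# answer_len/answer_length.py - illustrative code-based custom evaluator
class AnswerLengthEvaluator:
    def __init__(self):
        pass

    def __call__(self, *, answer: str, **kwargs):
        return {"answer_length": len(answer)}
```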
