> Evaluation with the prompt flow SDK has been retired and replaced with Azure AI Evaluation SDK client library for Python. See the [API Reference Documentation](https://aka.ms/azureaieval-python-ref) for more details including input data requirements.
To thoroughly assess the performance of your generative AI application against a substantial dataset, you can evaluate it in your development environment with the Azure AI evaluation SDK. Given either a test dataset or a target, your application's generations are quantitatively measured with both mathematical-based metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.
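For example, a minimal local run with a single built-in, mathematical-based evaluator might look like the following sketch. It assumes a local JSONL test dataset named `data.jsonl` whose rows contain `response` and `ground_truth` columns; substitute your own file and columns.

```python
from azure.ai.evaluation import evaluate, F1ScoreEvaluator

# F1 score is a mathematical-based metric, so no judge model is required.
f1_eval = F1ScoreEvaluator()

# Score every row of the (assumed) local test dataset.
result = evaluate(
    data="data.jsonl",
    evaluators={"f1_score": f1_eval},
)
print(result["metrics"])
```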
Built-in quality and safety metrics take in query and response pairs, along with additional information for specific evaluators.
> [!TIP]
> For more information about inputs and outputs, see the [Azure Python reference documentation](https://aka.ms/azureaieval-python-ref).
### Data requirements for built-in evaluators
Built-in evaluators can accept *either* query and response pairs or a list of conversations:
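As an illustration, here's a hedged sketch of both input shapes. It assumes `evaluator` is an already-constructed built-in evaluator and that the field values are placeholders:

```python
# Single-turn: pass one query and response pair (plus any extra fields the evaluator needs).
single_turn_result = evaluator(
    query="Which tent is the most waterproof?",
    response="The Alpine Explorer Tent is the most waterproof.",
    context="From the product list, the Alpine Explorer tent is the most waterproof.",
)

# Multi-turn: pass a conversation, a dict with a "messages" list;
# assistant turns can carry their retrieval "context".
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer Tent is the most waterproof.",
            "context": "From the product list, the Alpine Explorer tent is the most waterproof.",
        },
    ]
}
conversation_result = evaluator(conversation=conversation)
```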
You can use our built-in AI-assisted and NLP quality evaluators to assess the performance and quality of your generative AI application.

#### Set up
1. For AI-assisted quality evaluators except for `GroundednessProEvaluator`, you must specify a GPT model (`gpt-35-turbo`, `gpt-4`, `gpt-4-turbo`, `gpt-4o`, or `gpt-4o-mini`) in your `model_config` to act as a judge that scores the evaluation data. We support both the Azure OpenAI and OpenAI model configuration schemas; a configuration sketch is shown after this list. We recommend using GPT models that don't have the `(preview)` suffix for the best performance and parseable responses with our evaluators.
> [!NOTE]
> We strongly recommend replacing `gpt-3.5-turbo` with `gpt-4o-mini` as your evaluator model; the latter is cheaper, more capable, and just as fast according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
> Make sure that you have at least the `Cognitive Services OpenAI User` role for the Azure OpenAI resource to make inference calls with an API key. To learn more about permissions, see [role-based access control for Azure OpenAI resources](../../../ai-services/openai/how-to/role-based-access-control.md#summary).
2. For `GroundednessProEvaluator`, instead of a GPT deployment in `model_config`, you must provide your `azure_ai_project` information. This accesses the backend evaluation service of your Azure AI project.
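A minimal sketch of both configurations follows; the environment variable names are assumptions, so substitute your own values.

```python
import os
from azure.ai.evaluation import AzureOpenAIModelConfiguration

# Judge model configuration for AI-assisted quality evaluators (assumed env vars).
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # e.g. a gpt-4o-mini deployment
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

# Azure AI project information for GroundednessProEvaluator (assumed values).
azure_ai_project = {
    "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
    "resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],
    "project_name": os.environ["AZURE_PROJECT_NAME"],
}
```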
## Cloud evaluation (preview) on test datasets
After local evaluations of your generative AI applications, you might want to run evaluations in the cloud for pre-deployment testing, and [continuously evaluate](https://aka.ms/GenAIMonitoringDoc) your applications for post-deployment monitoring. The Azure AI Projects SDK offers such capabilities via a Python API and supports almost all of the features available in local evaluations. Follow the steps below to submit your evaluation to the cloud on your data using built-in or custom evaluators.
### Prerequisites
- Azure AI project in the same [regions](#region-support) as risk and safety evaluators (preview). If you don't have an existing project, follow the guide [How to create Azure AI project](../create-projects.md?tabs=ai-studio) to create one.
> [!NOTE]
> Cloud evaluations do not support `ContentSafetyEvaluator` and `QAEvaluator`.
- Azure OpenAI deployment with a GPT model supporting `chat completion`, for example `gpt-4`.
- `Connection String` for the Azure AI project to easily create an `AIProjectClient` object. You can get the **Project connection string** under **Project details** from the project's **Overview** page. A client construction sketch follows this list.
- Make sure you're first logged into your Azure subscription by running `az login`.
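For reference, a minimal sketch of creating the `AIProjectClient` from the project connection string (the placeholder value is an assumption; the required packages are installed in the next section):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Authenticate with the identity from `az login` and connect to the project.
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-project-connection-string>",  # assumed placeholder
)
```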
### Installation Instructions
1. Create a **virtual Python environment of your choice**. To create one using conda, run the following command:
    ```bash
    conda create -n cloud-evaluation
    conda activate cloud-evaluation
    ```
2. Install the required packages by running the following command:
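    For example, assuming the cloud evaluation flow uses the `azure-identity`, `azure-ai-projects`, and `azure-ai-evaluation` packages (an assumption, not confirmed by this article), the command might look like:

    ```bash
    # Assumed package set; adjust to match the packages this flow actually requires.
    pip install azure-identity azure-ai-projects azure-ai-evaluation
    ```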
### Uploading evaluation data

**From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.
2. Given existing datasets uploaded to your Project:
    - **From SDK**: If you already know the name of the dataset you created, construct the dataset ID in this format (a small construction sketch follows this list): `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
    - **From UI**: If you don't know the dataset name, locate it under the **Data** tab of your Azure AI project and construct the dataset ID in the format above.
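A small sketch of constructing that ID in Python, where all values are assumed placeholders:

```python
# All values below are assumed placeholders; substitute your own.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
project_name = "<project-name>"
dataset_name = "<dataset-name>"
dataset_version = "<version-number>"

data_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{project_name}"
    f"/data/{dataset_name}/versions/{dataset_version}"
)
```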
### Specifying evaluators from Evaluator library
We provide a list of built-in evaluators registered in the [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project. You can also register custom evaluators and use them for cloud evaluation. We provide two ways to specify registered evaluators:
#### Specifying built-in evaluators
- **From SDK**: Use the built-in evaluator `id` property supported by the `azure-ai-evaluation` SDK:
```python
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

print("F1 score evaluator id:", F1ScoreEvaluator.id)  # each built-in evaluator exposes a registered `id`
```
#### Specifying custom evaluators

After registering your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
- For prompt-based custom evaluators, use this snippet to register them. For example, let's register our `FriendlinessEvaluator` built as described in [Prompt-based evaluators](#prompt-based-evaluators):
```python
# Import your prompt-based custom evaluator
from friendliness.friend import FriendlinessEvaluator
```
After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
### Cloud evaluation (preview) with Azure AI Projects SDK
You can submit a cloud evaluation with the Azure AI Projects SDK via a Python API. See the following example to submit a cloud evaluation of your dataset using an NLP evaluator (F1 score), an AI-assisted quality evaluator (Relevance), a safety evaluator (Violence), and a custom evaluator. Putting it all together:
```python
import os, time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
```