Commit 2363aa1

Additional edits.
1 parent 59725f4 commit 2363aa1

File tree

1 file changed: +13 −9 lines changed

articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 13 additions & 9 deletions
@@ -19,7 +19,9 @@ ms.custom:

 [!INCLUDE [feature-preview](../../includes/feature-preview.md)]

-You can thoroughly assess the performance of your generative AI application by applying it to a substantial dataset. Evaluate the application in your development environment with the Azure AI Evaluation SDK. When you provide either a test dataset or a target, your generative AI application outputs are quantitatively measured with both mathematical-based metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.
+You can thoroughly assess the performance of your generative AI application by applying it to a substantial dataset. Evaluate the application in your development environment with the Azure AI Evaluation SDK.
+
+When you provide either a test dataset or a target, your generative AI application outputs are quantitatively measured with both mathematical-based metrics and AI-assisted quality and safety evaluators. Built-in or custom evaluators can provide you with comprehensive insights into the application's capabilities and limitations.

 In this article, you learn how to run evaluators on a single row of data and a larger test dataset on an application target. You use built-in evaluators that use the Azure AI Evaluation SDK locally. Then, you learn to track the results and evaluation logs in an Azure AI project.

@@ -36,6 +38,8 @@ pip install azure-ai-evaluation

 ## Built-in evaluators

+Built-in quality and safety metrics accept query and response pairs, along with additional information for specific evaluators.
+
 | Category | Evaluators |
 |--------------------------|-----------------------------|
 | [General purpose](../../concepts/evaluation-evaluators/general-purpose-evaluators.md) | `CoherenceEvaluator`, `FluencyEvaluator`, `QAEvaluator` |

@@ -45,8 +49,6 @@ pip install azure-ai-evaluation

 | [Agentic](../../concepts/evaluation-evaluators/agent-evaluators.md) | `IntentResolutionEvaluator`, `ToolCallAccuracyEvaluator`, `TaskAdherenceEvaluator` |
 | [Azure OpenAI](../../concepts/evaluation-evaluators/azure-openai-graders.md) | `AzureOpenAILabelGrader`, `AzureOpenAIStringCheckGrader`, `AzureOpenAITextSimilarityGrader`, `AzureOpenAIGrader` |

-Built-in quality and safety metrics accept query and response pairs, along with additional information for specific evaluators.
-
 ### Data requirements for built-in evaluators

 Built-in evaluators can accept query and response pairs, a list of conversations in JSON Lines (JSONL) format, or both.
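
For orientation alongside this hunk (not part of the commit), here's a minimal sketch of calling one built-in evaluator on a single query-response pair; the endpoint, key, and deployment values are placeholder assumptions:

```python
# Minimal sketch, not from the article: one built-in evaluator on a single
# query-response pair. Endpoint, key, and deployment names are placeholders.
import os

from azure.ai.evaluation import GroundednessEvaluator

model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],  # assumed env var
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],          # assumed env var
    "azure_deployment": "gpt-4o",                           # placeholder deployment
}

groundedness = GroundednessEvaluator(model_config)
score = groundedness(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer tent has a rainfly rated at 3000 mm.",
    response="The Alpine Explorer tent is the most waterproof.",
)
print(score)  # a dict of metric values, such as {'groundedness': 5.0, ...}
```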
@@ -194,7 +196,9 @@ To run batch evaluations by using [local evaluation](#local-evaluation-on-test-d
 Our evaluators understand that the first turn of the conversation provides valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated per turn and results are aggregated over all turns for a conversation score.

 > [!NOTE]
-> In the second turn, even if `context` is `null` or a missing key, the evaluator interprets the turn as an empty string instead of failing with an error, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
+> In the second turn, even if `context` is `null` or a missing key, the evaluator interprets the turn as an empty string instead of failing with an error, which might lead to misleading results.
+>
+> We strongly recommend that you validate your evaluation data to comply with the data requirements.
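
As a hedged illustration of the conversation shape these lines describe (the content is invented; the second assistant turn omits `context`, which is the case the note warns about):

```python
# Invented conversation in the "messages" shape described above. The first
# assistant turn carries retrieval context; the second omits it, which the
# evaluator scores as an empty string rather than raising an error.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer tent is the most waterproof.",
            "context": "Catalog: Alpine Explorer rainfly rated at 3000 mm.",
        },
        {"role": "user", "content": "How much does it cost?"},
        {"role": "assistant", "content": "It costs $120."},  # no "context" key
    ]
}

# Reusing the evaluator from the earlier sketch, conversation mode would be:
# result = groundedness(conversation=conversation)
```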
 For conversation mode, here's an example for `GroundednessEvaluator`:

@@ -360,7 +364,7 @@ We open-source the prompts of our quality evaluators in our Evaluator Library an

 ### Composite evaluators

-Composite evaluators are built-in evaluators that combine individual quality or safety metrics. They easily provide a wide range of metrics right out of the box for both query response pairs or chat messages.
+Composite evaluators are built-in evaluators that combine individual quality or safety metrics. They provide a wide range of metrics right out of the box for both query response pairs or chat messages.

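For instance (a hedged sketch reusing the placeholder `model_config` from the earlier sketch), `QAEvaluator` from the table below returns several quality metrics from a single call:

```python
# Sketch: a composite evaluator bundles several quality metrics in one call.
# model_config is the same placeholder dictionary as in the earlier sketch.
from azure.ai.evaluation import QAEvaluator

qa_eval = QAEvaluator(model_config)
scores = qa_eval(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer tent has a rainfly rated at 3000 mm.",
    response="The Alpine Explorer tent is the most waterproof.",
    ground_truth="The Alpine Explorer tent.",
)
# scores holds one entry per bundled metric, for example groundedness,
# relevance, coherence, fluency, similarity, and f1_score.
```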
 | Composite evaluator | Contains | Description |
 |--|--|--|
@@ -373,11 +377,11 @@ After you spot-check your built-in or custom evaluators on a single row of data,

 ### Prerequisite set up steps for Azure AI Foundry projects

-If this session is your first time running evaluations and logging it to your Azure AI Foundry project, you might need to do a few other setup steps:
+If this session is your first time running evaluations and logging it to your Azure AI Foundry project, you might need to do the following setup steps:

 1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This bicep template provisions and connects a storage account to your Foundry project with key authentication.
 1. Make sure the connected storage account has access to all projects.
-1. If you connected your storage account with Microsoft Entra ID, make sure to give MSI (Microsoft Identity) permissions for **Storage Blob Data Owner** to both your account and Foundry project resource in Azure portal.
+1. If you connected your storage account with Microsoft Entra ID, make sure to give Microsoft Identity permissions for **Storage Blob Data Owner** to both your account and Foundry project resource in Azure portal.

 ### Evaluate on a dataset and log results to Azure AI Foundry

@@ -412,7 +416,7 @@ result = evaluate(
 > [!TIP]
 > Get the contents of the `result.studio_url` property for a link to view your logged evaluation results in your Azure AI project.

-The evaluator outputs results in a dictionary, which contains aggregate `metrics` and row-level data and metrics. See the following example of an output:
+The evaluator outputs results in a dictionary, which contains aggregate `metrics` and row-level data and metrics. See the following example output:

 ```python
 {'metrics': {'answer_length.value': 49.333333333333336,
@@ -515,7 +519,7 @@ result = evaluate(

 If you have a list of queries that you want to run and then evaluate, the `evaluate()` API also supports a `target` parameter. This parameter can send queries to an application to collect answers, and then run your evaluators on the resulting query and response.

-A target can be any callable class in your directory. In this example, there's a Python script `askwiki.py` with a callable class `askwiki()` that is set as our target. If you have a dataset of queries that you can send into the simple `askwiki` app, you can evaluate the groundedness of the outputs. Make sure that you specify the proper column mapping for your data in `"column_mapping"`. You can use `"default"` to specify column mapping for all evaluators.
+A target can be any callable class in your directory. In this example, there's a Python script `askwiki.py` with a callable class `askwiki()` that is set as the target. If you have a dataset of queries that you can send into the simple `askwiki` app, you can evaluate the groundedness of the outputs. Make sure that you specify the proper column mapping for your data in `"column_mapping"`. You can use `"default"` to specify column mapping for all evaluators.
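
As a hedged sketch of that pattern (the `askwiki` import path and its output keys `response` and `context` are assumptions for illustration; `model_config` is the placeholder from the earlier sketch):

```python
# Sketch of evaluating against a target. Assumes askwiki.py exposes a callable
# `askwiki` whose per-row output is a dict with "response" and "context" keys.
from askwiki import askwiki  # hypothetical import path

from azure.ai.evaluation import GroundednessEvaluator, evaluate

result = evaluate(
    data="data.jsonl",  # queries only; answers are collected from the target
    target=askwiki,     # called once per row to produce outputs
    evaluators={"groundedness": GroundednessEvaluator(model_config)},
    evaluator_config={
        "default": {  # "default" applies this column mapping to all evaluators
            "column_mapping": {
                "query": "${data.query}",
                "context": "${target.context}",    # assumed target output key
                "response": "${target.response}",  # assumed target output key
            }
        }
    },
)
```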

 Here's the content in `"data.jsonl"`:
