`articles/ai-foundry/how-to/develop/evaluate-sdk.md` (14 additions, 14 deletions)
@@ -32,7 +32,7 @@ pip install azure-ai-evaluation
 ```
 
 > [!NOTE]
-> For more detailed information, see the [API reference documentation for the Azure AI Evaluation SDK](https://aka.ms/azureaieval-python-ref).
+> For more information, see [Azure AI Evaluation client library for Python](https://aka.ms/azureaieval-python-ref).
 
 ## Built-in evaluators
 
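As context for the hunks that follow, here's a minimal sketch of calling one built-in evaluator from `azure-ai-evaluation`. The endpoint, key, and deployment values are placeholders you'd supply yourself; the configuration shape follows the Azure OpenAI schema discussed later in this diff.

```python
import os

from azure.ai.evaluation import RelevanceEvaluator

# Judge model configuration (Azure OpenAI schema); all values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

# Score a single query-response pair with an AI-assisted quality evaluator.
relevance = RelevanceEvaluator(model_config)
result = relevance(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)  # A dictionary with the score and related fields.
```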
@@ -91,11 +91,11 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 
 
 > [!NOTE]
-> AI-assisted quality evaluators except for `SimilarityEvaluator` come with a reason field. They employ techniques including chain-of-thought reasoning to generate an explanation for the score. Therefore they consume more token usage in generation as a result of improved evaluation quality. Specifically, `max_token` for evaluator generation has been set to 800 for all AI-assisted evaluators, except that it will be 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate for longer inputs.
+> AI-assisted quality evaluators, except for `SimilarityEvaluator`, come with a reason field. They employ techniques such as chain-of-thought reasoning to generate an explanation for the score, so they consume more tokens during generation in exchange for improved evaluation quality. Specifically, `max_token` for evaluator generation is set to 800 for all AI-assisted evaluators, except for `RetrievalEvaluator` (1600) and `ToolCallAccuracyEvaluator` (3000) to accommodate longer inputs.
 
 Azure OpenAI graders require a template that describes how their input columns are turned into the *real* input that the grader uses. For example, if you have two inputs called *query* and *response*, and a template formatted as `{{item.query}}`, only the query is used. Similarly, you could use something like `{{item.conversation}}` to accept a conversation input, but the system's ability to handle that depends on how you configure the rest of the grader to expect that input.
 
-For more information on data requirements for agentic evaluators, go to [Run agent evaluations locally with the Azure AI Evaluation SDK](agent-evaluate-sdk.md).
+For more information on data requirements for agentic evaluators, see [Evaluate your AI agents](agent-evaluate-sdk.md).
 
 #### Single-turn support for text
 
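To make the grader templating behavior concrete, here's a toy illustration of how a data row and a template line up. The `render` helper is a hypothetical stand-in for the SDK's real templating engine, not part of the library:

```python
# Each data row is exposed to the template as `item`; only the fields
# the template references reach the grader.
row = {
    "query": "How do I reset my password?",
    "response": "Go to Settings > Account > Reset password.",
}

template = "{{item.query}}"

def render(template: str, item: dict) -> str:
    # Minimal stand-in for the real templating engine: substitute
    # `{{item.<field>}}` references with the row's values.
    out = template
    for key, value in item.items():
        out = out.replace("{{item." + key + "}}", str(value))
    return out

print(render(template, row))  # -> "How do I reset my password?" (response unused)
```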
@@ -127,7 +127,7 @@ The evaluation test dataset can contain the following, depending on the requirem
 - **Context**: The source the generated response is based on (that is, the grounding documents).
 - **Ground truth**: The response generated by a user or human as the true answer.
 
-To see what each evaluator requires, you can learn more in the [built-in evaluators documents](/azure/ai-foundry/concepts/observability#what-are-evaluators).
+To see what each evaluator requires, see [Evaluators](/azure/ai-foundry/concepts/observability#what-are-evaluators).
 
 #### Conversation support for text
 
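As a concrete companion to those field descriptions, here's a sketch of one test-dataset row written out as JSONL. The values are invented; include only the columns your chosen evaluators require.

```python
import json

# One illustrative row of an evaluation test dataset.
row = {
    "query": "Which tent is the most waterproof?",
    "response": "The Alpine Explorer tent has the highest rainfly rating.",
    "context": "From the catalog: the Alpine Explorer tent's rainfly is rated 3000 mm.",  # grounding documents
    "ground_truth": "The Alpine Explorer tent is the most waterproof.",  # human-provided true answer
}

# Write the row as one line of a JSONL file.
with open("evaluation_data.jsonl", "w") as f:
    f.write(json.dumps(row) + "\n")
```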
@@ -192,7 +192,7 @@ To run batch evaluations by using [local evaluation](#local-evaluation-on-test-d
 Our evaluators understand that the first turn of the conversation provides valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated per turn and results are aggregated over all turns for a conversation score.
 
 > [!NOTE]
-> In the second turn, even if `context` is `null` or a missing key, it's interpreted as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
+> In the second turn, even if `context` is `null` or a missing key, the evaluator interprets it as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
 
 For conversation mode, here's an example for `GroundednessEvaluator`:
 
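The `GroundednessEvaluator` example itself falls outside this diff. As a stand-in, here's a sketch of the conversation shape the note above describes, with placeholder content:

```python
import os

from azure.ai.evaluation import GroundednessEvaluator

# Judge model configuration; values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

# A two-turn conversation; `context` rides along on assistant messages.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer tent is the most waterproof.",
            "context": "From the catalog: the Alpine Explorer tent's rainfly is rated 3000 mm.",
        },
        {"role": "user", "content": "How much does it weigh?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer tent weighs 15 lbs.",
            # No `context` key here: per the note above, it's treated as an
            # empty string rather than raising an error.
        },
    ]
}

groundedness = GroundednessEvaluator(model_config)
# Evaluated per turn, then aggregated into a conversation-level score.
print(groundedness(conversation=conversation))
```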
@@ -346,7 +346,7 @@ Currently the image and multi-modal evaluators support:
 For AI-assisted quality evaluators (except for `GroundednessProEvaluator` preview), you must specify a GPT model (`gpt-35-turbo`, `gpt-4`, `gpt-4-turbo`, `gpt-4o`, or `gpt-4o-mini`) in your `model_config`. The GPT model acts as a judge to score the evaluation data. We support both Azure OpenAI and OpenAI model configuration schemas. For the best performance and parseable responses with our evaluators, we recommend using GPT models that aren't in preview.
 
 > [!NOTE]
-> We strongly recommend that you replace `gpt-3.5-turbo` with `gpt-4o-mini` for your evaluator model, because the latter is cheaper, more capable, and just as fast, according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
+> We strongly recommend that you replace `gpt-3.5-turbo` with `gpt-4o-mini` for your evaluator model. The latter is cheaper, more capable, and just as fast, according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
 >
 > Make sure that you have at least the `Cognitive Services OpenAI User` role for the Azure OpenAI resource to make inference calls with the API key. To learn more about permissions, see [Permissions for an Azure OpenAI resource](../../../ai-services/openai/how-to/role-based-access-control.md#summary).
 
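For reference, the two supported `model_config` shapes look roughly like the sketch below. The deployment name and the specific `api_version` value are assumptions, not requirements:

```python
import os

# Azure OpenAI model configuration schema.
azure_model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",  # per the note above, preferred over gpt-3.5-turbo
    "api_version": "2024-06-01",  # assumption: any current GA API version
}

# OpenAI model configuration schema.
openai_model_config = {
    "api_key": os.environ["OPENAI_API_KEY"],
    "model": "gpt-4o-mini",
}
```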
@@ -371,7 +371,7 @@ After you spot-check your built-in or custom evaluators on a single row of data,
 
 ### Prerequisite setup steps for Azure AI Foundry projects
 
-If this is your first time running evaluations and logging it to your Azure AI Foundry project, you might need to do a few additional setup steps:
+If you're running evaluations and logging them to your Azure AI Foundry project for the first time, you might need to do a few more setup steps:
 
 1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This Bicep template provisions and connects a storage account to your Foundry project with key authentication.
 2. Make sure the connected storage account has access to all projects.
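Once those steps are done, a batch run that logs results to the project looks roughly like this sketch. The `azure_ai_project` values are placeholders, and `evaluation_data.jsonl` refers to the dataset row sketched earlier:

```python
import os

from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Judge model configuration; values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

# Placeholders for the Azure AI Foundry project that receives the logged results.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

result = evaluate(
    data="evaluation_data.jsonl",  # the JSONL dataset sketched earlier
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    azure_ai_project=azure_ai_project,  # omit to keep results local only
    output_path="./results.json",
)
```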
@@ -547,11 +547,11 @@ result = evaluate(
 
 ## Related content
 
-- [Azure AI Evaluation Python SDK client reference documentation](https://aka.ms/azureaieval-python-ref)
-- [Azure AI Evaluation SDK client troubleshooting guide](https://aka.ms/azureaieval-tsg)
-- [Learn more about the evaluation metrics](../../concepts/evaluation-metrics-built-in.md)
-- [Evaluate your generative AI applications remotely on the cloud](./cloud-evaluation.md)
-- [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
-- [View your evaluation results in an Azure AI project](../../how-to/evaluate-results.md)
-- [Get started building a chat app by using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
+- [Azure AI Evaluation client library for Python](https://aka.ms/azureaieval-python-ref)
+- [Troubleshoot AI Evaluation SDK Issues](https://aka.ms/azureaieval-tsg)
+- [Observability in generative AI](../../concepts/evaluation-metrics-built-in.md)
+- [Run evaluations in the cloud by using the Azure AI Foundry SDK](./cloud-evaluation.md)
+- [Generate synthetic and simulated data for evaluation](./simulator-interaction-data.md)
+- [See evaluation results in the Azure AI Foundry portal](../../how-to/evaluate-results.md)
+- [Get started with Azure AI Foundry](../../quickstarts/get-started-code.md)
 - [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)