
Commit ff4df10

Freshness, in progress.
1 parent 4a55643 commit ff4df10

File tree

1 file changed (+14 −14 lines)


articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 14 additions & 14 deletions
@@ -32,7 +32,7 @@ pip install azure-ai-evaluation
 ```
 
 > [!NOTE]
-> For more detailed information, see the [API reference documentation for the Azure AI Evaluation SDK](https://aka.ms/azureaieval-python-ref).
+> For more information, see [Azure AI Evaluation client library for Python](https://aka.ms/azureaieval-python-ref).
 
 
 ## Built-in evaluators

@@ -91,11 +91,11 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 
 
 > [!NOTE]
-> AI-assisted quality evaluators except for `SimilarityEvaluator` come with a reason field. They employ techniques including chain-of-thought reasoning to generate an explanation for the score. Therefore they consume more token usage in generation as a result of improved evaluation quality. Specifically, `max_token` for evaluator generation has been set to 800 for all AI-assisted evaluators, except that it will be 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate for longer inputs.
+> AI-assisted quality evaluators, except for `SimilarityEvaluator`, include a reason field. They use techniques such as chain-of-thought reasoning to generate an explanation for the score, so they consume more tokens during generation in exchange for improved evaluation quality. Specifically, `max_token` for evaluator generation is set to 800 for all AI-assisted evaluators, except 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate longer inputs.
 
 Azure OpenAI graders require a template that describes how their input columns are turned into the *real* input that the grader uses. For example, if you have two inputs called *query* and *response* and a template formatted as `{{item.query}}`, only the query is used. Similarly, you could use `{{item.conversation}}` to accept a conversation input, but the system's ability to handle that input depends on how you configure the rest of the grader to expect it.
 
-For more information on data requirements for agentic evaluators, go to [Run agent evaluations locally with the Azure AI Evaluation SDK](agent-evaluate-sdk.md).
+For more information on data requirements for agentic evaluators, see [Evaluate your AI agents](agent-evaluate-sdk.md).
 
 #### Single-turn support for text
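The template mapping described in the hunk above can be illustrated with a minimal stand-in renderer. This is a sketch only; the `render` helper and the sample row are hypothetical, and the real templating is handled by the grading service:

```python
import re

def render(template: str, item: dict) -> str:
    """Substitute {{item.<column>}} placeholders with values from one data row.
    Unknown columns become empty strings, mirroring a lenient template fill.
    Illustrative only; not the grader's actual templating engine."""
    return re.sub(
        r"\{\{item\.(\w+)\}\}",
        lambda m: str(item.get(m.group(1), "")),
        template,
    )

row = {"query": "Which tent is the most waterproof?", "response": "The Alpine Explorer tent."}

# A template of `{{item.query}}` passes only the query column to the grader.
print(render("{{item.query}}", row))  # → Which tent is the most waterproof?

# A template can reference several columns; any column it doesn't mention is unused.
print(render("Q: {{item.query}} A: {{item.response}}", row))
```

Under this model, `{{item.conversation}}` would pull a `conversation` column instead, which is why the rest of the grader must be configured to expect that shape.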

@@ -127,7 +127,7 @@ The evaluation test dataset can contain the following, depending on the requirements:
 - **Context**: The source the generated response is based on (that is, the grounding documents).
 - **Ground truth**: The response generated by a user or human as the true answer.
 
-To see what each evaluator requires, you can learn more in the [built-in evaluators documents](/azure/ai-foundry/concepts/observability#what-are-evaluators).
+To see what each evaluator requires, see [Evaluators](/azure/ai-foundry/concepts/observability#what-are-evaluators).
 
 #### Conversation support for text
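One JSONL record covering the field types listed in the hunk above might look like this. The values are hypothetical; check the linked Evaluators page for which fields a given evaluator actually requires:

```python
import json

# One evaluation-data record with the four field types described above.
row = {
    "query": "Which tent is the most waterproof?",  # input sent to the application
    "response": "The Alpine Explorer tent is the most waterproof.",  # generated answer
    "context": "Catalog: the Alpine Explorer tent has a 3000 mm waterproof rating.",  # grounding source
    "ground_truth": "The Alpine Explorer tent.",  # human-authored true answer
}

# Batch datasets are typically JSON Lines: one JSON object per line of the file.
line = json.dumps(row)
assert json.loads(line) == row
```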

@@ -192,7 +192,7 @@ To run batch evaluations by using [local evaluation](#local-evaluation-on-test-d
 Our evaluators understand that the first turn of the conversation provides a valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated turn by turn, and results are aggregated over all turns for a conversation score.
 
 > [!NOTE]
-> In the second turn, even if `context` is `null` or a missing key, it's interpreted as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
+> In the second turn, even if `context` is `null` or a missing key, the evaluator interprets it as an empty string instead of raising an error, which might lead to misleading results. We strongly recommend that you validate your evaluation data against the data requirements.
 
 For conversation mode, here's an example for `GroundednessEvaluator`:
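The turn structure and the `context` caveat from the note can be sketched as follows. The message layout mirrors the query-response description above, and `missing_context_turns` is a hypothetical pre-flight check, not part of the SDK:

```python
# A two-turn conversation: `user` messages carry the query, `assistant`
# messages carry the response, and grounding context rides on the assistant turn.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer tent is the most waterproof.",
            "context": "Catalog: Alpine Explorer tent, 3000 mm waterproof rating.",
        },
        {"role": "user", "content": "How much does it cost?"},
        # Second turn: `context` is missing. Per the note, it's treated as an
        # empty string rather than raising an error, which can skew the score.
        {"role": "assistant", "content": "It costs $120."},
    ]
}

def missing_context_turns(conv: dict) -> list:
    """Return indices of assistant messages whose `context` is absent or null,
    so the data can be fixed before evaluation. Hypothetical helper."""
    return [
        i for i, msg in enumerate(conv["messages"])
        if msg["role"] == "assistant" and not msg.get("context")
    ]

print(missing_context_turns(conversation))  # → [3]
```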

@@ -346,7 +346,7 @@ Currently the image and multi-modal evaluators support:
 For AI-assisted quality evaluators (except for `GroundednessProEvaluator`, which is in preview), you must specify a GPT model (`gpt-35-turbo`, `gpt-4`, `gpt-4-turbo`, `gpt-4o`, or `gpt-4o-mini`) in your `model_config`. The GPT model acts as a judge to score the evaluation data. We support both Azure OpenAI and OpenAI model configuration schemas. For the best performance and parseable responses with our evaluators, we recommend using GPT models that aren't in preview.
 
 > [!NOTE]
-> We strongly recommend that you replace `gpt-3.5-turbo` with `gpt-4o-mini` for your evaluator model, because the latter is cheaper, more capable, and just as fast, according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
+> We strongly recommend that you replace `gpt-3.5-turbo` with `gpt-4o-mini` for your evaluator model. The latter is cheaper, more capable, and just as fast, according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
 >
 > Make sure that you have at least the `Cognitive Services OpenAI User` role for the Azure OpenAI resource to make inference calls with the API key. To learn more about permissions, see [Permissions for an Azure OpenAI resource](../../../ai-services/openai/how-to/role-based-access-control.md#summary).
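For reference, a `model_config` using the Azure OpenAI schema typically looks like the sketch below. The endpoint, key, and API version are placeholders, and the exact field set should be verified against the SDK reference linked earlier:

```python
import os

# Azure OpenAI judge-model configuration (placeholder values). The deployment
# follows the note's recommendation of a non-preview model such as gpt-4o-mini.
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", "https://<resource>.openai.azure.com"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY", "<api-key>"),
    "azure_deployment": "gpt-4o-mini",
    "api_version": "2024-06-01",
}

# The config is then passed to an AI-assisted evaluator, for example:
#   from azure.ai.evaluation import GroundednessEvaluator
#   groundedness = GroundednessEvaluator(model_config)
```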
@@ -371,7 +371,7 @@ After you spot-check your built-in or custom evaluators on a single row of data,
 
 ### Prerequisite setup steps for Azure AI Foundry projects
 
-If this is your first time running evaluations and logging it to your Azure AI Foundry project, you might need to do a few additional setup steps:
+If you're running evaluations and logging them to your Azure AI Foundry project for the first time, you might need to complete a few more setup steps:
 
 1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This Bicep template provisions and connects a storage account to your Foundry project with key authentication.
 2. Make sure that the connected storage account has access to all projects.
@@ -547,11 +547,11 @@ result = evaluate(
 
 ## Related content
 
-- [Azure AI Evaluation Python SDK client reference documentation](https://aka.ms/azureaieval-python-ref)
-- [Azure AI Evaluation SDK client troubleshooting guide](https://aka.ms/azureaieval-tsg)
-- [Learn more about the evaluation metrics](../../concepts/evaluation-metrics-built-in.md)
-- [Evaluate your generative AI applications remotely on the cloud](./cloud-evaluation.md)
-- [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
-- [View your evaluation results in an Azure AI project](../../how-to/evaluate-results.md)
-- [Get started building a chat app by using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
+- [Azure AI Evaluation client library for Python](https://aka.ms/azureaieval-python-ref)
+- [Troubleshoot AI Evaluation SDK issues](https://aka.ms/azureaieval-tsg)
+- [Observability in generative AI](../../concepts/evaluation-metrics-built-in.md)
+- [Run evaluations in the cloud by using the Azure AI Foundry SDK](./cloud-evaluation.md)
+- [Generate synthetic and simulated data for evaluation](./simulator-interaction-data.md)
+- [See evaluation results in the Azure AI Foundry portal](../../how-to/evaluate-results.md)
+- [Get started with Azure AI Foundry](../../quickstarts/get-started-code.md)
 - [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)
