`articles/ai-foundry/how-to/develop/evaluate-sdk.md` (14 additions, 14 deletions)
@@ -32,7 +32,7 @@ pip install azure-ai-evaluation
 ```
 
 > [!NOTE]
-> For more detailed information, see the [API reference documentation for the Azure AI Evaluation SDK](https://aka.ms/azureaieval-python-ref).
+> For more information, see [Azure AI Evaluation client library for Python](https://aka.ms/azureaieval-python-ref).
 
 ## Built-in evaluators
 
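As context for the hunks that follow, here's a minimal sketch of calling one built-in evaluator from `azure-ai-evaluation`. The endpoint, key, and deployment values are placeholders you'd supply yourself; the configuration shape follows the Azure OpenAI schema discussed later in this diff.

```python
import os

from azure.ai.evaluation import RelevanceEvaluator

# Judge model configuration (Azure OpenAI schema); all values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

# Score a single query-response pair with an AI-assisted quality evaluator.
relevance = RelevanceEvaluator(model_config)
result = relevance(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)  # A dictionary with the score and related fields.
```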
@@ -91,11 +91,11 @@ Built-in evaluators can accept query and response pairs, a list of conversations
 
 
 > [!NOTE]
-> AI-assisted quality evaluators except for `SimilarityEvaluator` come with a reason field. They employ techniques including chain-of-thought reasoning to generate an explanation for the score. Therefore they consume more token usage in generation as a result of improved evaluation quality. Specifically, `max_token` for evaluator generation has been set to 800 for all AI-assisted evaluators, except that it will be 1600 for `RetrievalEvaluator` and 3000 for `ToolCallAccuracyEvaluator` to accommodate for longer inputs.
+> AI-assisted quality evaluators, except for `SimilarityEvaluator`, come with a reason field. They employ techniques such as chain-of-thought reasoning to generate an explanation for the score, so they consume more tokens during generation in exchange for improved evaluation quality. Specifically, `max_token` for evaluator generation is set to 800 for all AI-assisted evaluators, except for `RetrievalEvaluator` (1600) and `ToolCallAccuracyEvaluator` (3000) to accommodate longer inputs.
 
 Azure OpenAI graders require a template that describes how their input columns are turned into the *real* input that the grader uses. For example, if you have two inputs called *query* and *response*, and a template formatted as `{{item.query}}`, only the query is used. Similarly, you could use something like `{{item.conversation}}` to accept a conversation input, but the system's ability to handle that depends on how you configure the rest of the grader to expect that input.
 
-For more information on data requirements for agentic evaluators, go to [Run agent evaluations locally with the Azure AI Evaluation SDK](agent-evaluate-sdk.md).
+For more information on data requirements for agentic evaluators, see [Evaluate your AI agents](agent-evaluate-sdk.md).
 
 #### Single-turn support for text
 
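To make the grader templating behavior concrete, here's a toy illustration of how a data row and a template line up. The `render` helper is a hypothetical stand-in for the SDK's real templating engine, not part of the library:

```python
# Each data row is exposed to the template as `item`; only the fields
# the template references reach the grader.
row = {
    "query": "How do I reset my password?",
    "response": "Go to Settings > Account > Reset password.",
}

template = "{{item.query}}"

def render(template: str, item: dict) -> str:
    # Minimal stand-in for the real templating engine: substitute
    # `{{item.<field>}}` references with the row's values.
    out = template
    for key, value in item.items():
        out = out.replace("{{item." + key + "}}", str(value))
    return out

print(render(template, row))  # -> "How do I reset my password?" (response unused)
```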
@@ -127,7 +127,7 @@ The evaluation test dataset can contain the following, depending on the requirem
 - **Context**: The source the generated response is based on (that is, the grounding documents).
 - **Ground truth**: The response generated by a user or human as the true answer.
 
-To see what each evaluator requires, you can learn more in the [built-in evaluators documents](/azure/ai-foundry/concepts/observability#what-are-evaluators).
+To see what each evaluator requires, see [Evaluators](/azure/ai-foundry/concepts/observability#what-are-evaluators).
 
 #### Conversation support for text
 
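As a concrete companion to those field descriptions, here's a sketch of one test-dataset row written out as JSONL. The values are invented; include only the columns your chosen evaluators require.

```python
import json

# One illustrative row of an evaluation test dataset.
row = {
    "query": "Which tent is the most waterproof?",
    "response": "The Alpine Explorer tent has the highest rainfly rating.",
    "context": "From the catalog: the Alpine Explorer tent's rainfly is rated 3000 mm.",  # grounding documents
    "ground_truth": "The Alpine Explorer tent is the most waterproof.",  # human-provided true answer
}

# Write the row as one line of a JSONL file.
with open("evaluation_data.jsonl", "w") as f:
    f.write(json.dumps(row) + "\n")
```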
@@ -192,7 +192,7 @@ To run batch evaluations by using [local evaluation](#local-evaluation-on-test-d
 Our evaluators understand that the first turn of the conversation provides valid `query` from `user`, `context` from `assistant`, and `response` from `assistant` in the query-response format. Conversations are then evaluated per turn and results are aggregated over all turns for a conversation score.
 
 > [!NOTE]
-> In the second turn, even if `context` is `null` or a missing key, it's interpreted as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
+> In the second turn, even if `context` is `null` or a missing key, the evaluator interprets it as an empty string instead of erroring out, which might lead to misleading results. We strongly recommend that you validate your evaluation data to comply with the data requirements.
 
 For conversation mode, here's an example for `GroundednessEvaluator`:
 
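The `GroundednessEvaluator` example itself falls outside this diff. As a stand-in, here's a sketch of the conversation shape the note above describes, with placeholder content:

```python
import os

from azure.ai.evaluation import GroundednessEvaluator

# Judge model configuration; values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

# A two-turn conversation; `context` rides along on assistant messages.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer tent is the most waterproof.",
            "context": "From the catalog: the Alpine Explorer tent's rainfly is rated 3000 mm.",
        },
        {"role": "user", "content": "How much does it weigh?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer tent weighs 15 lbs.",
            # No `context` key here: per the note above, it's treated as an
            # empty string rather than raising an error.
        },
    ]
}

groundedness = GroundednessEvaluator(model_config)
# Evaluated per turn, then aggregated into a conversation-level score.
print(groundedness(conversation=conversation))
```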
@@ -346,7 +346,7 @@ Currently the image and multi-modal evaluators support:
 For AI-assisted quality evaluators (except for `GroundednessProEvaluator` preview), you must specify a GPT model (`gpt-35-turbo`, `gpt-4`, `gpt-4-turbo`, `gpt-4o`, or `gpt-4o-mini`) in your `model_config`. The GPT model acts as a judge to score the evaluation data. We support both Azure OpenAI and OpenAI model configuration schemas. For the best performance and parseable responses with our evaluators, we recommend using GPT models that aren't in preview.
 
 > [!NOTE]
-> We strongly recommend that you replace `gpt-3.5-turbo` with `gpt-4o-mini` for your evaluator model, because the latter is cheaper, more capable, and just as fast, according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
+> We strongly recommend that you replace `gpt-3.5-turbo` with `gpt-4o-mini` for your evaluator model. The latter is cheaper, more capable, and just as fast, according to [OpenAI](https://platform.openai.com/docs/models/gpt-4#gpt-3-5-turbo).
 >
 > Make sure that you have at least the `Cognitive Services OpenAI User` role for the Azure OpenAI resource to make inference calls with the API key. To learn more about permissions, see [Permissions for an Azure OpenAI resource](../../../ai-services/openai/how-to/role-based-access-control.md#summary).
 
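For reference, the two supported `model_config` shapes look roughly like the sketch below. The deployment name and the specific `api_version` value are assumptions, not requirements:

```python
import os

# Azure OpenAI model configuration schema.
azure_model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",  # per the note above, preferred over gpt-3.5-turbo
    "api_version": "2024-06-01",  # assumption: any current GA API version
}

# OpenAI model configuration schema.
openai_model_config = {
    "api_key": os.environ["OPENAI_API_KEY"],
    "model": "gpt-4o-mini",
}
```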
@@ -371,7 +371,7 @@ After you spot-check your built-in or custom evaluators on a single row of data,
 
 ### Prerequisite setup steps for Azure AI Foundry projects
 
-If this is your first time running evaluations and logging it to your Azure AI Foundry project, you might need to do a few additional setup steps:
+If you're running evaluations and logging them to your Azure AI Foundry project for the first time, you might need to do a few more setup steps:
 
 1. [Create and connect your storage account](https://github.com/azure-ai-foundry/foundry-samples/blob/main/samples/microsoft/infrastructure-setup/01-connections/connection-storage-account.bicep) to your Azure AI Foundry project at the resource level. This Bicep template provisions and connects a storage account to your Foundry project with key authentication.
 2. Make sure the connected storage account has access to all projects.
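Once those steps are done, a batch run that logs results to the project looks roughly like this sketch. The `azure_ai_project` values are placeholders, and `evaluation_data.jsonl` refers to the dataset row sketched earlier:

```python
import os

from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Judge model configuration; values are placeholders.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

# Placeholders for the Azure AI Foundry project that receives the logged results.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

result = evaluate(
    data="evaluation_data.jsonl",  # the JSONL dataset sketched earlier
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    azure_ai_project=azure_ai_project,  # omit to keep results local only
    output_path="./results.json",
)
```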
@@ -547,11 +547,11 @@ result = evaluate(
 
 ## Related content
 
-- [Azure AI Evaluation Python SDK client reference documentation](https://aka.ms/azureaieval-python-ref)
-- [Azure AI Evaluation SDK client troubleshooting guide](https://aka.ms/azureaieval-tsg)
-- [Learn more about the evaluation metrics](../../concepts/evaluation-metrics-built-in.md)
-- [Evaluate your generative AI applications remotely on the cloud](./cloud-evaluation.md)
-- [Learn more about simulating test datasets for evaluation](./simulator-interaction-data.md)
-- [View your evaluation results in an Azure AI project](../../how-to/evaluate-results.md)
-- [Get started building a chat app by using the Azure AI Foundry SDK](../../quickstarts/get-started-code.md)
+- [Azure AI Evaluation client library for Python](https://aka.ms/azureaieval-python-ref)
+- [Troubleshoot AI Evaluation SDK Issues](https://aka.ms/azureaieval-tsg)
+- [Observability in generative AI](../../concepts/evaluation-metrics-built-in.md)
+- [Run evaluations in the cloud by using the Azure AI Foundry SDK](./cloud-evaluation.md)
+- [Generate synthetic and simulated data for evaluation](./simulator-interaction-data.md)
+- [See evaluation results in the Azure AI Foundry portal](../../how-to/evaluate-results.md)
+- [Get started with Azure AI Foundry](../../quickstarts/get-started-code.md)
 - [Get started with evaluation samples](https://aka.ms/aistudio/eval-samples)