articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md (9 additions, 9 deletions)
@@ -189,7 +189,7 @@ The result of the AI-assisted quality evaluators for a query and response pair i
 To further improve intelligibility, all evaluators accept a binary threshold (unless they output already binary outputs) and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:

 - `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
-- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user.
 - `additional_details` contains debugging information about the quality of a single agent run.

 ```json
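To make the two added keys concrete, here is a minimal sketch of the kind of result dictionary an AI-assisted evaluator could return once binarization is applied. The metric name `intent_resolution`, the sample values, and the contents of `additional_details` are illustrative assumptions, not values taken from this diff.

```python
# Illustrative only: the shape of an evaluator result after binarization.
# Metric name and values are hypothetical sample data.
result = {
    "intent_resolution": 4.0,                # raw 1-5 score from the evaluator
    "intent_resolution_result": "pass",      # "pass" because 4.0 >= the threshold
    "intent_resolution_threshold": 3,        # default threshold, overridable by the user
    "additional_details": {                  # debugging info about the agent run
        "actual_user_intent": "book a flight to Paris",
        "conversation_has_intent": True,
    },
}

# The pass/fail key is the score compared against the binarization threshold.
assert result["intent_resolution_result"] == (
    "pass"
    if result["intent_resolution"] >= result["intent_resolution_threshold"]
    else "fail"
)
```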
@@ -238,7 +238,7 @@ from azure.ai.evaluation import AIAgentConverter
 # Initialize the converter
 converter = AIAgentConverter(project_client)

-#specify a file path to save agent output (which is evaluation input data)
+#Specify a file path to save agent output (which is evaluation input data)
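The hunk above only shows the converter being initialized and the comment about saving its output; the following is a hedged sketch of how that file might be produced. The `prepare_evaluation_data` method name, and the `project_client` and `thread_id` variables, are assumed from setup steps that are not part of this diff.

```python
import os

from azure.ai.evaluation import AIAgentConverter

# project_client and thread_id are assumed to exist from earlier setup (not in this diff).
converter = AIAgentConverter(project_client)

# Specify a file path to save agent output (which is evaluation input data).
file_name = os.path.join(os.getcwd(), "evaluation_input_data.jsonl")

# Assumed helper: converts the agent thread(s) into evaluator-ready rows
# and writes them to the JSONL file above.
evaluation_data = converter.prepare_evaluation_data(thread_ids=thread_id, filename=file_name)
```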
@@ -303,7 +303,7 @@ Following the URI, you will be redirected to Foundry to view your evaluation res
 With Azure AI Evaluation SDK client library, you can seamlessly evaluate your Azure AI agents via our converter support, which enables observability and transparency into agentic workflows.


-## Evaluators with agent message support
+## Evaluating other agents

 For agents outside of Azure AI Agent Service, you can still evaluate them by preparing the right data for the evaluators of your choice.

@@ -328,7 +328,7 @@ We'll demonstrate some examples of the two data formats: simple agent data, and
 As with other [built-in AI-assisted quality evaluators](./evaluate-sdk.md#performance-and-quality-evaluators), `IntentResolutionEvaluator` and `TaskAdherenceEvaluator` output a likert score (integer 1-5; higher score is better). `ToolCallAccuracyEvaluator` outputs the passing rate of all tool calls made (a float between 0-1) based on user query. To further improve intelligibility, all evaluators accept a binary threshold and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:

 - `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
-- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user.
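The paragraph in the hunk above says the default binarization threshold can be overridden; below is a minimal sketch of doing so, assuming the evaluator accepts a `threshold` keyword argument and that `model_config` was built earlier (neither appears in this diff).

```python
from azure.ai.evaluation import IntentResolutionEvaluator

# model_config is assumed to be a model configuration created earlier (not in this diff).
# The threshold keyword is an assumption: scores at or above it map to "pass".
intent_resolution = IntentResolutionEvaluator(model_config=model_config, threshold=4)

result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="The Eiffel Tower is open from 9:00 AM to 11:00 PM.",
)
print(result["intent_resolution_result"], result["intent_resolution_threshold"])
```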
articles/ai-foundry/how-to/develop/cloud-evaluation.md (5 additions, 5 deletions)
@@ -103,9 +103,9 @@ or contain conversation data like this:
 }
 ```

-For more details on input data formats, refer to [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
+To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).

-For agent evaluation, refer to [evaluator support for agent messages](./agent-evaluate-sdk.md#evaluators-with-agent-message-support).
+To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md/#evaluating-other-agents).


 We provide two ways to register your data in Azure AI project required for evaluations in the cloud:
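The context line above introduces the two ways to register data for cloud evaluation; as a hedged sketch of the upload route, the snippet below assumes the preview `AIProjectClient.upload_file` helper and an existing `project_client`, both of which may differ by SDK version.

```python
# Assumed preview API: uploads a local JSONL dataset to the Azure AI project
# and returns an ID that cloud evaluations can reference.
data_id, _ = project_client.upload_file("./evaluation_input_data.jsonl")
print(data_id)
```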
@@ … @@ After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under **Evaluation** tab of your Azure AI project.

-## Cloud evaluation (preview) with Azure AI Projects SDK
+## Submit a cloud evaluation

-Putting the above altogether, you can now submit a cloud evaluation with Azure AI Projects SDK via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
+Putting the previous code altogether, you can now submit a cloud evaluation with Azure AI Projects SDK client library via a Python API. See the following example specifying an NLP evaluator (F1 score), AI-assisted quality and safety evaluator (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
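The paragraph added in the hunk above points at evaluator IDs for F1 score, Relevance, Violence, and a custom Friendliness evaluator; the sketch below shows roughly how such a submission could look with the preview Azure AI Projects client library. The `Evaluation`, `Dataset`, and `EvaluatorConfiguration` model names, the `init_params` contents, the `data_id` variable, and the friendliness evaluator ID are assumptions that may differ by SDK version.

```python
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

# project_client and data_id are assumed from earlier setup and data upload steps.
evaluation = Evaluation(
    display_name="Cloud evaluation",
    data=Dataset(id=data_id),
    evaluators={
        # Built-in evaluators referenced by their evaluator IDs.
        "f1_score": EvaluatorConfiguration(id=F1ScoreEvaluator.id),
        "relevance": EvaluatorConfiguration(
            id=RelevanceEvaluator.id,
            init_params={"deployment_name": "gpt-4o"},  # hypothetical deployment name
        ),
        "violence": EvaluatorConfiguration(
            id=ViolenceEvaluator.id,
            init_params={"azure_ai_project": project_client.scope},
        ),
        # Custom evaluator registered in the evaluator library (placeholder ID).
        "friendliness": EvaluatorConfiguration(id="<friendliness-evaluator-id>"),
    },
)

evaluation_response = project_client.evaluations.create(evaluation=evaluation)
print(evaluation_response.id)
```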
articles/ai-foundry/how-to/develop/evaluate-sdk.md (2 additions, 2 deletions)
@@ -161,7 +161,7 @@ conversation = {

 ```

-To run batch evaluations using [local evaluation](#local-evaluation-on-test-datasets-using-evaluate) or [upload your dataset to run cloud evaluation](./cloud-evaluation.md#uploading-evaluation-data), you will need to represent the dataset in `.jsonl` format. The above conversation is equivalent to a line of dataset as following in a `.jsonl` file:
+To run batch evaluations using [local evaluation](#local-evaluation-on-test-datasets-using-evaluate) or [upload your dataset to run cloud evaluation](./cloud-evaluation.md#uploading-evaluation-data), you will need to represent the dataset in `.jsonl` format. The previous conversation is equivalent to a line of dataset as following in a `.jsonl` file:

 ```json
 {"conversation":
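As a small illustration of the `.jsonl` representation described above, the snippet below writes a conversation dictionary as a single line of a JSONL file; the file name and the trimmed-down conversation content are made up for the example.

```python
import json

# A trimmed-down conversation object; a real one would carry the full message content.
conversation = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris is the capital of France."},
    ]
}

# Each line of a .jsonl dataset is one standalone JSON object.
with open("evaluation_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps({"conversation": conversation}) + "\n")
```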
@@ -441,7 +441,7 @@ The result of the AI-assisted quality evaluators for a query and response pair i
 To further improve intelligibility, all evaluators accept a binary threshold (unless they output already binary outputs) and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:

 - `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
-- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user.