articles/machine-learning/prompt-flow/how-to-bulk-test-evaluate-flow.md
+15 −15 (15 additions & 15 deletions)
@@ -17,9 +17,9 @@ ms.date: 10/28/2024
# Submit a batch run to evaluate a flow

-A batch run executes your prompt flow with a large dataset and generates outputs for each data row. To evaluate how well the prompt flow performs with a large dataset, you can submit a batch run and use evaluation methods to generate performance scores and metrics.
+A batch run executes a prompt flow with a large dataset and generates outputs for each data row. To evaluate how well your prompt flow performs with a large dataset, you can submit a batch run and use evaluation methods to generate performance scores and metrics.

-After the batch flow completes, the evaluation methods automatically execute to calculate the scores and metrics. You can use the evaluation metrics to compare the output of your flow with your performance criteria and goals.
+After the batch flow completes, the evaluation methods automatically execute to calculate the scores and metrics. You can use the evaluation metrics to assess the output of your flow against your performance criteria and goals.

This article describes how to submit a batch run and use an evaluation method to measure the quality of your flow output. You learn how to view the evaluation result and metrics, and how to start a new round of evaluation with a different method or subset of variants.
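If you prefer working in code to the studio UI, the open-source promptflow Python SDK exposes the same batch run and evaluation concepts. The following is a minimal sketch, not the studio workflow itself; the flow folders, dataset file, and column names are placeholders, and the `PFClient` import path can differ between SDK versions.

```python
# Minimal sketch: submit a batch run and an evaluation run with the promptflow SDK.
# All paths and column names below are placeholders for your own flow and data.
from promptflow import PFClient  # newer releases may use: from promptflow.client import PFClient

pf = PFClient()

# Execute the flow once per row of a JSON Lines dataset.
base_run = pf.run(
    flow="./web-classification",            # folder containing flow.dag.yaml
    data="./data/urls.jsonl",                # one JSON object per line
    column_mapping={"url": "${data.url}"},   # map dataset columns to flow inputs
)

# Run an evaluation flow against the batch run outputs to calculate metrics.
eval_run = pf.run(
    flow="./eval-classification-accuracy",
    data="./data/urls.jsonl",
    run=base_run,                            # link the evaluation to the batch run
    column_mapping={
        "groundtruth": "${data.answer}",          # label column from the dataset
        "prediction": "${run.outputs.category}",  # output produced by the batch run
    },
)

print(pf.get_metrics(eval_run))
```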
@@ -58,7 +58,7 @@ To submit a batch run, you select the dataset to test your flow with. You can al
:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-evaluation-selection.png" alt-text="Screenshot of evaluation settings where you can select built-in evaluation method." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-evaluation-selection.png":::

-1. On the **Configure evaluation** screen, specify the sources of required inputs for the evaluation. For example, the ground truth column might come from a dataset. By default, evaluation uses the same dataset as the overall batch run. However, if the corresponding labels or target ground truth values are in a different dataset, you can use that one.
+1. Next, on the **Configure evaluation** screen, specify the sources of required inputs for the evaluation. For example, the ground truth column might come from a dataset. By default, evaluation uses the same dataset as the overall batch run. However, if the corresponding labels or target ground truth values are in a different dataset, you can use that one.

> [!NOTE]
> If your evaluation method doesn't require data from a dataset, dataset selection is an optional configuration that doesn't affect evaluation results. You don't need to select a dataset, or reference any dataset columns in the input mapping section.
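In SDK terms, this input mapping corresponds to the `column_mapping` of the evaluation run: `${data.<column>}` reads from the dataset you select, and `${run.outputs.<name>}` reads from the batch run output. A hedged sketch that continues the earlier example and assumes the ground truth lives in a separate `labels.jsonl` file:

```python
# Sketch: evaluation inputs mapped from a separate ground-truth dataset.
# File names and column names are placeholders.
eval_run = pf.run(
    flow="./eval-classification-accuracy",
    data="./data/labels.jsonl",                   # different dataset that holds the labels
    run=base_run,                                 # reuse the outputs of the completed batch run
    column_mapping={
        "groundtruth": "${data.answer}",          # column from labels.jsonl
        "prediction": "${run.outputs.category}",  # column from the batch run output
    },
)
```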
@@ -70,7 +70,7 @@ To submit a batch run, you select the dataset to test your flow with. You can al
-1. Some evaluation methods require Large Language Models (LLMs) like GPT-4 or GPT-3 or need other connections to consume credentials or keys. For those methods, you must enter the connection data in the **Connection** section at the bottom of this screen to be able to use the evaluation flow. For more information, see [Set up a connection](get-started-prompt-flow.md#set-up-a-connection).
+1. Some evaluation methods require Large Language Models (LLMs) like GPT-4 or GPT-3, or need other connections to consume credentials or keys. For those methods, you must enter the connection data in the **Connection** section at the bottom of this screen to be able to use the evaluation flow. For more information, see [Set up a connection](get-started-prompt-flow.md#set-up-a-connection).

:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-evaluation-connection.png" alt-text="Screenshot of connection where you can configure the connection for evaluation method." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-evaluation-connection.png":::
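For code-first setups, a connection can also be registered with the promptflow SDK before the evaluation is submitted. This is only a sketch under assumptions: the entity and operation names come from the open-source SDK and can vary by version, and all values are placeholders.

```python
# Sketch (assumed API surface): register an Azure OpenAI connection for an
# LLM-based evaluation flow. Replace the placeholder values with your own.
from promptflow import PFClient
from promptflow.entities import AzureOpenAIConnection

pf = PFClient()
connection = AzureOpenAIConnection(
    name="my_azure_open_ai_connection",
    api_key="<api-key>",                                   # placeholder
    api_base="https://<your-resource>.openai.azure.com/",  # placeholder
)
pf.connections.create_or_update(connection)
```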
@@ -88,7 +88,7 @@ You can find the list of submitted batch runs on the **Runs** tab in the Azure M
:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-list.png" alt-text="Screenshot of prompt flow run list page where you find batch runs." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-list.png":::

-On the **Visualize outputs** screen, the **Runs & metrics** section shows overall results for the batch run and the evaluation run. The **Outputs** section shows the run inputs and outputs line by line in a results table that also includes line ID, **Run**, **Status**, and **System metrics**.
+On the **Visualize outputs** screen, the **Runs & metrics** section shows overall results for the batch run and the evaluation run. The **Outputs** section shows the run inputs line by line in a results table that also includes line ID, **Run**, **Status**, and **System metrics**.

:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-output.png" alt-text="Screenshot of batch run result page on the outputs tab where you check batch run outputs." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-output.png":::
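The same line-by-line outputs and aggregate metrics can also be pulled into code. A short sketch, reusing the `base_run` and `eval_run` objects from the earlier example:

```python
# Inspect a finished batch run and its evaluation run from code.
details = pf.get_details(base_run)   # pandas DataFrame with line-by-line inputs and outputs
print(details.head())

print(pf.get_metrics(eval_run))      # aggregate metrics calculated by the evaluation run

# Generate an HTML visualization comparable to the Visualize outputs screen.
pf.visualize([base_run, eval_run])
```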
@@ -100,13 +100,13 @@ You can find the list of submitted batch runs on the **Runs** tab in the Azure M
:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-output-new-evaluation.png" alt-text="Screenshot of the Trace view with expanded steps and details." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-output-new-evaluation.png":::

-You can also view evaluation run results from the prompt flow page you tested. Under **View batch runs**, select **View batch runs** to see the list of batch runs for the flow, or select **View latest batch run outputs** to see the outputs for the latest run.
+You can also view evaluation run results from the prompt flow you tested. Under **View batch runs**, select **View batch runs** to see the list of batch runs for the flow, or select **View latest batch run outputs** to see the outputs for the latest run.

:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-history.png" alt-text="Screenshot of Web Classification with the view bulk runs button selected." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-history.png":::

In the batch run list, select a batch run name to open the flow page for that run.

-On the flow page for an evaluation run, select **View outputs** to see details for the flow. You can also **Clone** the flow to create a new flow, or **Deploy** it as an online endpoint.
+On the flow page for an evaluation run, select **View outputs** or **Details** to see details for the flow. You can also **Clone** the flow to create a new flow, or **Deploy** it as an online endpoint.

:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-history-list.png" alt-text="Screenshot of batch run runs showing the history." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-history-list.png":::
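If you know a run's name, you can also pick it up from code. A hedged sketch, assuming the open-source SDK's run operations; the run name is a placeholder:

```python
# Sketch (assumed operations): fetch a batch run by name and follow its logs.
run = pf.runs.get("<batch-run-name>")   # placeholder run name
pf.stream(run)                          # stream the run's logs to the console
print(pf.get_details(run).head())       # line-by-line results for that run
```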
@@ -124,7 +124,7 @@ On the **Details** screen:
:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-snapshot.png" alt-text="Screenshot of batch run snapshot." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-snapshot.png":::

-### Start a new evaluation round on the same run
+### Start a new evaluation round for the same run

You can run a new evaluation round to calculate metrics for a completed batch run without running the flow again. This process saves the cost of rerunning your flow and is helpful in the following scenarios:
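In code, this corresponds to submitting another evaluation run that points at the existing batch run: only the evaluation flow executes, and the original flow isn't rerun. A sketch that continues the earlier example; the evaluation flow path and mapping are placeholders:

```python
# Sketch: a new evaluation round over the same completed batch run.
second_eval = pf.run(
    flow="./another-evaluation-flow",             # a different evaluation method (placeholder)
    data="./data/urls.jsonl",
    run=base_run,                                 # reuse the existing batch run; the flow isn't rerun
    column_mapping={
        "groundtruth": "${data.answer}",
        "prediction": "${run.outputs.category}",
    },
)
print(pf.get_metrics(second_eval))
```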
@@ -140,15 +140,15 @@ The new run appears in the prompt flow **Run** list, and you can select more tha
If you modify your flow to improve its performance, you can submit multiple batch runs to compare the performance of the different flow versions. You can also compare the metrics calculated by different evaluation methods to see which method is more suitable for your flow.

-To check your flow batch run history, select **View batch runs** at the top of your flow page. You can select each run to check the detail. You can also select multiple runs and select **Visualize outputs** to compare the metrics and the outputs of those runs.
+To check your flow batch run history, select **View batch runs** at the top of your flow page. You can select each run to check the details. You can also select multiple runs and select **Visualize outputs** to compare the metrics and the outputs of those runs.

:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-compare.png" alt-text="Screenshot of metrics compare of multiple batch runs." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-compare.png":::
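A code-first equivalent for reviewing run history and comparing several runs side by side might look like the following sketch; the attribute names on the returned run objects are assumptions:

```python
# Sketch: list batch run history and compare several runs in one visualization.
for r in pf.runs.list():
    print(r.name, r.status)           # attribute names assumed from the open-source SDK

# Compare the metrics and outputs of multiple runs, similar to Visualize outputs.
pf.visualize([base_run, eval_run, second_eval])
```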
## Understand built-in evaluation metrics

Azure Machine Learning prompt flow provides several built-in evaluation methods to help you measure the performance of your flow output. Each evaluation method calculates different metrics. The following table describes the available built-in evaluation methods.

| Classification Accuracy Evaluation | Accuracy | Measures the performance of a classification system by comparing its outputs to ground truth | No | prediction, ground truth | In the range [0, 1] |
| QnA Groundedness Evaluation | Groundedness | Measures how grounded the model's predicted answers are in the input source. Even if the LLM responses are accurate, they're ungrounded if they're not verifiable against source. | Yes | question, answer, context (no ground truth) | 1 to 5, with 1 = worst and 5 = best |
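To make the metric definitions concrete, the following sketch shows the general pattern an accuracy-style evaluation flow follows: a per-line grading node plus an aggregation node that logs the run-level metric. It mirrors, but is not, the built-in Classification Accuracy Evaluation flow.

```python
# Sketch of an accuracy-style evaluation flow: grade each line, then aggregate.
from typing import List
from promptflow import tool, log_metric

@tool
def grade(groundtruth: str, prediction: str) -> str:
    # Per-line score: "Correct" when the prediction matches the ground truth.
    return "Correct" if groundtruth.strip().lower() == prediction.strip().lower() else "Incorrect"

@tool
def aggregate(grades: List[str]) -> float:
    # Aggregation node: runs once over all lines and logs the run-level metric.
    accuracy = round(grades.count("Correct") / len(grades), 2) if grades else 0.0
    log_metric("accuracy", accuracy)
    return accuracy
```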
@@ -165,22 +165,22 @@ If your run fails, check the output and log data and debug any flow failure. To
### Prompt engineering

-Prompt construction can be difficult. To learn about prompt construction concepts, see [Introduction to prompt engineering](/azure/cognitive-services/openai/concepts/prompt-engineering). To learn how to construct a prompt that can help achieve your goals, see [Prompt engineering techniques](/azure/cognitive-services/openai/concepts/advanced-prompt-engineering).
+Prompt construction can be difficult. To learn about prompt construction concepts, see [Overview of prompts](/ai-builder/prompts-overview). To learn how to construct a prompt that can help achieve your goals, see [Prompt engineering techniques](/azure/cognitive-services/openai/concepts/prompt-engineering).

### System message

-You can use the system message, sometimes referred to as a metaprompt or [system prompt](/azure/cognitive-services/openai/concepts/advanced-prompt-engineering#meta-prompts), to guide an AI system's behavior and improve system performance. To learn how to improve your flow performance with system messages, see [System message framework and template recommendations for Large Language Models (LLMs)](/azure/cognitive-services/openai/concepts/system-message).
+You can use the system message, sometimes referred to as a metaprompt or [system prompt](/azure/cognitive-services/openai/concepts/advanced-prompt-engineering), to guide an AI system's behavior and improve system performance. To learn how to improve your flow performance with system messages, see [System messages step-by-step authoring](/azure/cognitive-services/openai/concepts/system-message#step-by-step-authoring-best-practices).

### Golden datasets

Creating a copilot that uses LLMs typically involves grounding the model in reality by using source datasets. A *golden dataset* helps ensure that the LLMs provide the most accurate and useful responses to customer queries.

A golden dataset is a collection of realistic customer questions and expertly crafted answers that serve as a quality assurance tool for the LLMs your copilot uses. Golden datasets aren't used to train an LLM or inject context into an LLM prompt, but to assess the quality of the answers the LLM generates.

-If your scenario involves a copilot, or you're building your own copilot, see [Producing Golden Datasets: Guidance for creating Golden Datasets used for Copilot quality assurance](https://aka.ms/copilot-golden-dataset-guide) for detailed guidance and best practices.
+If your scenario involves a copilot, or you're building your own copilot, see [Producing Golden Datasets](https://aka.ms/copilot-golden-dataset-guide) for detailed guidance and best practices.

## Related content

-- [Develop a customized evaluation flow](how-to-develop-an-evaluation-flow.md#use-a-customized-evaluation-flow)
+- [Develop a customized evaluation flow](how-to-develop-an-evaluation-flow.md#develop-an-evaluation-flow)
- [Tune prompts using variants](how-to-tune-prompts-using-variants.md)
-- [Deploy a flow](how-to-deploy-for-real-time-inference.md)
+- [Deploy a flow as a managed online endpoint for real-time inference](how-to-deploy-for-real-time-inference.md)