Commit 04ef5ee

Commit message: edits
1 parent b88246a commit 04ef5ee

17 files changed: +51 -58 lines changed

articles/machine-learning/prompt-flow/how-to-bulk-test-evaluate-flow.md

Lines changed: 42 additions & 49 deletions
Large diffs are not rendered by default.

articles/machine-learning/prompt-flow/how-to-develop-an-evaluation-flow.md

Lines changed: 9 additions & 9 deletions
@@ -11,7 +11,7 @@ ms.topic: how-to
 author: lgayhardt
 ms.author: lagayhar
 ms.reviewer: ziqiwang
-ms.date: 10/25/2024
+ms.date: 10/28/2024
 ---

 # Evaluation flows and metrics
@@ -54,23 +54,23 @@ The outputs of an evaluation are results that show the performance of the flow b

 #### Output scores

-Prompt flows process one row of data at a time and generate an output record. Evaluation flows likewise can calculate scores for each row of data, so you can check how a flow performs on each individual data point.
+A prompt flow processes one row of data at a time and generates an output record. Evaluation flows likewise can calculate scores for each row of data, so you can check how a flow performs on each individual data point.

 You can record the scores for each data instance as evaluation flow outputs by specifying them in the output section of the evaluation flow. The authoring experience is the same as defining a standard flow output.

 :::image type="content" source="./media/how-to-develop-an-evaluation-flow/eval-output.png" alt-text="Screenshot of the outputs section showing a name and value.":::

-You can view the individual scores in the **Outputs** tab when you select **View outputs**, the same as when you check the outputs of a standard flow batch run. You can append these instance-level scores to the output of the flow being tested.
+You can view the individual scores in the **Outputs** tab when you select **View outputs**, the same as when you check the outputs of a standard flow batch run. You can append these instance-level scores to the output of the flow you tested.

 #### Aggregation and metrics logging

-The evaluation flow also provides an overall assessment for the run. To distinguish them from individual output scores, values for evaluating overall run performance are called *metrics*.
+The evaluation flow also provides an overall assessment for the run. To distinguish the overall results from individual output scores, these overall run performance values are called *metrics*.

 To calculate an overall assessment value based on individual scores, select the **Aggregation** checkbox on a Python node in an evaluation flow to turn it into a *reduce* node. The node then takes in the inputs as a list and processes them as a batch.

-:::image type="content" source="./media/how-to-develop-an-evaluation-flow/set-as-aggregation.png" alt-text="Screenshot of the Python node heading pointing to an unchecked checked box.":::
+:::image type="content" source="./media/how-to-develop-an-evaluation-flow/set-as-aggregation.png" alt-text="Screenshot of the Python node heading with the Aggregation checkbox selected.":::

-By using aggregation, you can calculate and process all the scores of each flow output and compute an overall result by using each score output. For example, to calculate the accuracy of a classification flow, you can calculate the accuracy of each score output and then calculate the average accuracy of all the score outputs. Then, you can log the average accuracy as a metric by using `promptflow_sdk.log_metric()`. Metrics must be numerical, such as `float` or `int`. String type metrics logging isn't supported.
+By using aggregation, you can calculate and process all the scores of each flow output and compute an overall result by using each score. For example, to calculate the accuracy of a classification flow, you can calculate the accuracy of each score output and then calculate the average accuracy of all the score outputs. Then, you can log the average accuracy as a metric by using `promptflow_sdk.log_metric()`. Metrics must be numerical, such as `float` or `int`. String type metrics logging isn't supported.

 The following code snippet is an example of calculating overall accuracy by averaging the accuracy score `grades` of all data points. The overall accuracy is logged as a metric by using `promptflow_sdk.log_metric()`.

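As context for the snippet whose tail appears in the next hunk, here is a minimal sketch of the per-row scoring node that the **Output scores** section describes. The `groundtruth` and `prediction` input names and the `Correct`/`Incorrect` grading convention are illustrative assumptions, not part of the article:

```python
from promptflow import tool


@tool
def grade(groundtruth: str, prediction: str) -> str:
    # Runs once per row of data; expose the return value in the flow's
    # output section to record it as an instance-level score.
    return "Correct" if groundtruth.strip().lower() == prediction.strip().lower() else "Incorrect"
```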
@@ -87,7 +87,7 @@ def calculate_accuracy(grades: List[str]): # Receive a list of grades from a pre
     return accuracy
 ```

-Because you call this function in the Python node, you don't need to assign it elsewhere, and you can view the metrics later. After you use this evaluation method in a batch run, you can view the metric showing overall performance by selecting the **Metrics** tab when you **View outputs**.
+Because you call this function in the Python node, you don't need to assign it elsewhere, and you can view the metrics later. After you use this evaluation method in a batch run, you can view the metric showing overall performance by selecting the **Metrics** tab when you view outputs.

 :::image type="content" source="./media/how-to-develop-an-evaluation-flow/evaluation-metrics-bulk.png" alt-text="Screenshot of the metrics tab that shows the metrics logged by log metrics." lightbox = "./media/how-to-develop-an-evaluation-flow/evaluation-metrics-bulk.png":::

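The hunk above shows only the tail of the article's snippet. A hedged reconstruction of the full aggregation node, assuming `grades` holds the `Correct`/`Incorrect` strings from the sketch above and that `log_metric` is the entry point exported by the `promptflow` package:

```python
from typing import List

from promptflow import log_metric, tool


@tool
def calculate_accuracy(grades: List[str]):  # Receive a list of grades from a previous node
    # With the Aggregation checkbox selected, this node runs once per batch
    # and receives every instance-level grade as a single list.
    accuracy = round(grades.count("Correct") / len(grades), 2)
    # Metrics must be numeric; this value appears on the Metrics tab.
    log_metric("accuracy", accuracy)
    return accuracy
```

Dividing the count of `Correct` grades by the total averages a 0/1 correctness indicator, which matches the averaging that the paragraph above describes.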
@@ -153,15 +153,15 @@ After you create your own evaluation flow and metrics, you can use the flow to a
 :::image type="content" source="./media/how-to-develop-an-evaluation-flow/evaluate-button.png" alt-text="Screenshot of evaluation button.":::


-1. In the **Batch run & Evaluate** wizard, complete the **Basic settings** and **Batch run settings** to load the dataset for testing and configure the input mapping. For more information, see [Submit batch run and evaluate a flow in prompt flow](how-to-bulk-test-evaluate-flow.md#submit-batch-run-and-evaluate-a-flow).
+1. In the **Batch run & Evaluate** wizard, complete the **Basic settings** and **Batch run settings** to load the dataset for testing and configure the input mapping. For more information, see [Submit batch run and evaluate a flow](how-to-bulk-test-evaluate-flow.md).

 1. In the **Select evaluation** step, you can select one or more of your customized evaluations or built-in evaluations to run. **Customized evaluation** lists all the evaluation flows that you created, cloned, or customized. Evaluation flows created by others in the same project don't appear in this section.

 :::image type="content" source="./media/how-to-develop-an-evaluation-flow/select-customized-evaluation.png" alt-text="Screenshot of selecting customized evaluation." lightbox = "./media/how-to-develop-an-evaluation-flow/select-customized-evaluation.png":::

 1. On the **Configure evaluation** screen, specify the sources of any input data needed for the evaluation method. For example, the ground truth column might come from a dataset. If your evaluation method doesn't require data from a dataset, you don't need to select a dataset or reference any dataset columns in the input mapping section.

-In the **Evaluation input mapping** section, you can indicate the sources of required inputs for the evaluation. If the data source is from your run output, set the source as `${run.outputs.[OutputName]}`. If the data is from your test dataset, set the source as `${data.[ColumnName]}`. Any descriptions you set for the data inputs also appear here. For more information, see [Submit batch run and evaluate a flow in prompt flow](how-to-bulk-test-evaluate-flow.md#submit-batch-run-and-evaluate-a-flow).
+In the **Evaluation input mapping** section, you can indicate the sources of required inputs for the evaluation. If the data source is from your run output, set the source as `${run.outputs.[OutputName]}`. If the data is from your test dataset, set the source as `${data.[ColumnName]}`. Any descriptions you set for the data inputs also appear here. For more information, see [Submit batch run and evaluate a flow](how-to-bulk-test-evaluate-flow.md).

 :::image type="content" source="./media/how-to-develop-an-evaluation-flow/bulk-test-evaluation-input-mapping.png" alt-text="Screenshot of evaluation input mapping." lightbox = "./media/how-to-develop-an-evaluation-flow/bulk-test-evaluation-input-mapping.png":::

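The same `${run.outputs.[OutputName]}` and `${data.[ColumnName]}` mapping can also be expressed through the prompt flow Python SDK. A sketch, assuming hypothetical flow paths, a test dataset with an `answer` column, and a base run whose flow outputs a `category` field:

```python
from promptflow import PFClient

pf = PFClient()

# Batch run of the flow being tested (hypothetical paths).
base_run = pf.run(
    flow="./my-standard-flow",
    data="./test-dataset.jsonl",
)

# Evaluation run: each evaluation input maps to a dataset column
# or to an output of the base run.
eval_run = pf.run(
    flow="./my-evaluation-flow",
    data="./test-dataset.jsonl",
    run=base_run,
    column_mapping={
        "groundtruth": "${data.answer}",          # from the test dataset
        "prediction": "${run.outputs.category}",  # from the base run's output
    },
)
```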
Binary image files changed (diffs not rendered): -40.7 KB, 462 Bytes, 979 Bytes, 607 Bytes, -28.4 KB, -9.29 KB, 595 Bytes
