articles/machine-learning/prompt-flow/how-to-develop-an-evaluation-flow.md
6 additions & 6 deletions
@@ -1,7 +1,7 @@
 ---
 title: Evaluation flow and metrics in prompt flow
 titleSuffix: Azure Machine Learning
-description: Use Azure Machine Learning studio to create or customize evaluation flows and metrics tailored to your tasks and objectives, and use a batch run as a prompt flow evaluation method.
+description: Use Azure Machine Learning studio to create or customize evaluation flows and metrics, and use a batch run as a prompt flow evaluation method.
 services: machine-learning
 ms.service: azure-machine-learning
 ms.subservice: prompt-flow
@@ -11,18 +11,18 @@ ms.topic: how-to
 author: lgayhardt
 ms.author: lagayhar
 ms.reviewer: ziqiwang
-ms.date: 10/23/2024
+ms.date: 10/24/2024
 ---

 # Evaluation flows and metrics

-Evaluation flows are a special type of prompt flows that calculate metrics to assess how well the outputs of a run meet specific criteria and goals. You can create or customize evaluation flows and metrics tailored to your tasks and objectives, and use them to evaluate other prompt flows. This article explains evaluation flows, how to develop and customize them, and how to use them in prompt flow batch runs to evaluate performance.
+Evaluation flows are a special type of prompt flows that calculate metrics to assess how well the outputs of a run meet specific criteria and goals. You can create or customize evaluation flows and metrics tailored to your tasks and objectives, and use them to evaluate other prompt flows. This article explains evaluation flows, how to develop and customize them, and how to use them in prompt flow batch runs to evaluate flow performance.

 ## Understand evaluation flows

 A prompt flow is a sequence of nodes that process input and generate output. Evaluation flows consume required inputs and produce corresponding outputs that are usually scores or metrics. Evaluation flows differ from standard flows in their authoring experience and usage.

-Evaluation flows usually run after the run they're testing by receiving its outputs and using the outputs to calculate scores and metrics. Evaluation flows log metrics by using the promptflow SDK `log_metric()` function.
+Evaluation flows usually run after the run they're testing by receiving its outputs and using the outputs to calculate scores and metrics. Evaluation flows log metrics by using the prompt flow SDK `log_metric()` function.

 The outputs of the evaluation flow are results that measure the performance of the flow being tested. Evaluation flows can have an aggregation node that calculates the overall performance of the flow being tested over the test dataset.
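For context on the `log_metric()` call and the aggregation node mentioned in this hunk, here's a minimal sketch of an aggregation node. The `aggregate` function name and the `grades` input are illustrative assumptions, not part of this diff:

```python
from typing import List

from promptflow import log_metric, tool


@tool
def aggregate(grades: List[str]):
    # Overall accuracy of the tested flow across the test dataset.
    accuracy = round(grades.count("Correct") / len(grades), 2)
    # log_metric records the value so it surfaces with the batch run's metrics.
    log_metric("accuracy", accuracy)
    return accuracy
```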
@@ -109,7 +109,7 @@ For example, in the built-in Classification Accuracy Evaluation flow, the `grade

 If you use the evaluation flow template, you calculate this score in the **line_process** Python node. You can also replace the **line_process** python node with an LLM node to use an LLM to calculate the score, or use multiple nodes to perform the calculation.

-:::image type="content" source="./media/how-to-develop-an-evaluation-flow/line-process.png" alt-text="Screenshot of line process node in the template." lightbox = "./media/how-to-develop-an-evaluation-flow/line-process.png":::
+:::image type="content" source="./media/how-to-develop-an-evaluation-flow/line-process.png" alt-text="Screenshot of line process node in the template." lightbox="./media/how-to-develop-an-evaluation-flow/line-process.png":::

 You specify the outputs of this node as the outputs of the evaluation flow, which indicates that the outputs are the scores calculated for each data sample. You can also output reasoning for more information, and it's the same experience as defining outputs in standard flow.

 ### Calculate and log metrics
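To illustrate the per-line scoring this hunk describes, here's a hedged sketch of a **line_process**-style Python node; the `groundtruth` and `prediction` input names are assumptions for illustration:

```python
from promptflow import tool


@tool
def line_process(groundtruth: str, prediction: str) -> str:
    # Score a single data sample: "Correct" if the prediction matches
    # the ground truth (case-insensitive), "Incorrect" otherwise.
    return "Correct" if groundtruth.lower() == prediction.lower() else "Incorrect"
```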
@@ -158,7 +158,7 @@ After you create your own evaluation flow and metrics, you can use the flow to a
-1. On the **Configure evaluation** screen, specify the sources of any input data that's needed for the evaluation method. For example, the ground truth column might come from a dataset.
+1. On the **Configure evaluation** screen, specify the sources of any input data needed for the evaluation method. For example, the ground truth column might come from a dataset.

    In the **Evaluation input mapping** section, you can indicate the sources of the required inputs for the evaluation. If the data source is from your run output, set the source as `${run.outputs.[OutputName]}`. If the data is from your test dataset, set the source as `${data.[ColumnName]}`. Any descriptions set for the data inputs also appear here. For more information, see [Submit batch run and evaluate a flow in prompt flow](how-to-bulk-test-evaluate-flow.md#submit-batch-run-and-evaluate-a-flow).
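The same `${run.outputs.[OutputName]}` / `${data.[ColumnName]}` mapping can also be expressed outside the studio UI. A sketch using the open-source prompt flow SDK's `column_mapping` parameter, where the flow path, data file, run name, and column names are all placeholders:

```python
from promptflow.client import PFClient

pf = PFClient()
eval_run = pf.run(
    flow="./classification-accuracy-eval",  # evaluation flow folder (placeholder)
    data="./test-data.jsonl",               # test dataset (placeholder)
    run="my-batch-run",                     # batch run being evaluated (placeholder)
    column_mapping={
        "groundtruth": "${data.answer}",          # from the test dataset
        "prediction": "${run.outputs.category}",  # from the run's outputs
    },
)
```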