articles/machine-learning/prompt-flow/how-to-evaluate-semantic-kernel.md
> [!IMPORTANT]
> Prior to developing the flow, it's essential to install the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) in the runtime environment of your executor.
To learn more, see [Customize environment for runtime](./how-to-customize-environment-runtime.md).
> [!IMPORTANT]
> The approach to consuming OpenAI or Azure OpenAI in Semantic Kernel is to obtain the keys you specified in environment variables or stored in a `.env` file.
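As a minimal sketch of that pattern, the helper below reads OpenAI or Azure OpenAI settings from environment variables. The variable names here are illustrative assumptions; check the Semantic Kernel quick-start for the exact names your kernel version expects.

```python
import os


def load_openai_settings(env=os.environ):
    """Read OpenAI / Azure OpenAI settings from environment variables.

    The variable names are illustrative; align them with what your
    Semantic Kernel version actually reads (e.g. from a .env file
    loaded into the environment).
    """
    settings = {
        "api_key": env.get("OPENAI_API_KEY"),
        # The two entries below apply to Azure OpenAI only.
        "endpoint": env.get("AZURE_OPENAI_ENDPOINT"),
        "deployment": env.get("AZURE_OPENAI_DEPLOYMENT"),
    }
    if not settings["api_key"]:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return settings
```

Because the executor resolves these values at run time, the same flow code works locally and in the runtime environment, as long as the variables (or the `.env` file) are present.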
For example, we can create a flow with a Semantic Kernel planner that solves math problems. Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/create-a-prompt-flow-with-semantic-kernel) for the steps to create a simple Prompt flow with Semantic Kernel at its core.
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png" alt-text="Screenshot of creating a flow with semantic kernel planner." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png":::
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/set-connection-in-python.png" alt-text="Screenshot of setting custom connection in python node." lightbox = "./media/how-to-evaluate-semantic-kernel/set-connection-in-python.png":::
Select the connection object in the node input, and set the model name (for OpenAI) or deployment name (for Azure OpenAI).
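Inside the Python node, the flow code then resolves those inputs before calling the model. The sketch below uses a plain mapping as a stand-in for the connection object; the real Prompt flow connection may expose these values as attributes or secrets instead, so treat the field names as assumptions.

```python
def resolve_model_settings(connection: dict) -> dict:
    """Pull the API key and model/deployment name from a connection-like
    mapping. A stand-in for the real Prompt flow connection object, whose
    exact shape differs; the field names here are illustrative."""
    api_key = connection.get("api_key")
    # OpenAI identifies the target by model name; Azure OpenAI by
    # deployment name. Accept either field.
    model = connection.get("model") or connection.get("deployment")
    if not api_key or not model:
        raise ValueError("connection must provide api_key and model/deployment")
    return {"api_key": api_key, "model": model}
```

Validating the connection inputs up front like this makes a misconfigured node fail fast with a clear error instead of deep inside the planner run.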
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/set-key-model.png" alt-text="Screenshot of setting model and key in node input." lightbox = "./media/how-to-evaluate-semantic-kernel/set-key-model.png":::
### Batch testing your plugins and planners
Instead of manually testing different scenarios one by one, you can now automatically run large batches of tests using Prompt flow and benchmark data.
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png" alt-text="Screenshot of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png":::
Once the flow has passed the single test run in the previous step, you can create a batch test in Prompt flow by following these steps:
1. Create benchmark data in a *jsonl* file that contains a list of JSON objects, each with the input and the correct ground truth.
1. Select *Batch run* to create a batch test.
1. Complete the batch run settings, especially the data part.
1. Submit the run without evaluation (for this specific batch test, the *Evaluation step* can be skipped).
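To illustrate step 1, the snippet below writes a small benchmark *jsonl* file for math word problems: one JSON object per line, holding a flow input and its ground truth. The field names `question` and `answer` are assumptions for illustration; match them to your flow's actual input and output names.

```python
import json

# Each line of the jsonl file is one JSON object pairing a flow input
# with the correct ground truth. Field names are illustrative.
benchmark = [
    {"question": "What is 12 * 7?", "answer": "84"},
    {"question": "A train travels 60 km in 1.5 hours. What is its speed in km/h?",
     "answer": "40"},
]

with open("math_benchmark.jsonl", "w", encoding="utf-8") as f:
    for row in benchmark:
        f.write(json.dumps(row) + "\n")

# Read it back the way a batch run consumes it: one object per line.
with open("math_benchmark.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
```

In the batch run settings, you then map each jsonl field to the corresponding flow input.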
In our [Running batches with Prompt flow](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/running-batches-with-prompt-flow?tabs=gpt-35-turbo) documentation, we demonstrate how you can use this functionality to run batch tests on a planner that uses a math plugin. By defining a set of word problems, we can quickly test any changes we make to our plugins or planners and catch regressions early and often.
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png" alt-text="Screenshot of data of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png":::
In your workspace, you can go to the **Run list** in Prompt flow, select the **Details** button, and then select the **Output** tab to view the batch run result.
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/run.png" alt-text="Screenshot of the run list." lightbox = "./media/how-to-evaluate-semantic-kernel/run.png":::
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/run-detail.png" alt-text="Screenshot of the run detail." lightbox = "./media/how-to-evaluate-semantic-kernel/run-detail.png":::
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/run-output.png" alt-text="Screenshot of the run output." lightbox = "./media/how-to-evaluate-semantic-kernel/run-output.png":::
### Evaluating the accuracy
Once a batch run completes, you need an easy way to determine whether the test results are adequate. This information can then be used to develop accuracy scores, which can be improved incrementally.
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-batch-run-with-prompt-flow.png" alt-text="Screenshot of evaluating batch run with prompt flow." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-batch-run-with-prompt-flow.png":::
Evaluation flows in Prompt flow enable this functionality. Using the sample evaluation flows offered by Prompt flow, you can assess various metrics such as **classification accuracy**, **perceived intelligence**, **groundedness**, and more.
Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo) for Semantic Kernel to learn how to use the [math accuracy evaluation flow](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-accuracy-maths-to-code) to test how well your planner solves word problems.
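At its core, an accuracy-style evaluation compares each batch-run answer against its ground truth and reports the fraction that match. The stand-alone sketch below shows that idea with a simple exact-match comparison; the real math accuracy flow normalizes numeric answers more carefully, so treat this as a simplified stand-in, not the flow's actual logic.

```python
def score_accuracy(predictions, ground_truths):
    """Fraction of predictions that exactly match the ground truth after
    trimming whitespace. A simplified stand-in for the sample math
    accuracy evaluation flow, which normalizes answers more carefully."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must align one-to-one")
    correct = sum(
        p.strip() == g.strip() for p, g in zip(predictions, ground_truths)
    )
    return correct / len(predictions)
```

Tracking this single number across batch runs is what lets you catch a regression as soon as a plugin or planner change lowers it.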
To check the metrics, go back to the batch run detail page, select the **Details** button, select the **Output** tab, and then select the evaluation run name in the dropdown list to view the evaluation result.