articles/machine-learning/prompt-flow/how-to-evaluate-semantic-kernel.md
+21 -29 (21 additions & 29 deletions)
@@ -19,10 +19,9 @@ ms.date: 09/15/2023
 
 In the rapidly evolving landscape of AI orchestration, a comprehensive evaluation of your plugins and planners is paramount for optimal performance. This article introduces how to evaluate your **Semantic Kernel** [plugins](/semantic-kernel/ai-orchestration/plugins) and [planners](/semantic-kernel/ai-orchestration/planners) with prompt flow. Furthermore, you can learn the seamless integration story between prompt flow and Semantic Kernel.
 
-
 The integration of Semantic Kernel with prompt flow is a significant milestone.
-* It allows you to harness the powerful AI orchestration capabilities of Semantic Kernel to enhance the efficiency and effectiveness of your prompt flow.
-* More importantly, it enables you to utilize prompt flow's powerful evaluation and experiment management to assess the quality of your Semantic Kernel plugins and planners comprehensively.
+- It allows you to harness the powerful AI orchestration capabilities of Semantic Kernel to enhance the efficiency and effectiveness of your prompt flow.
+- More importantly, it enables you to utilize prompt flow's powerful evaluation and experiment management to assess the quality of your Semantic Kernel plugins and planners comprehensively.
 
 ## What is Semantic Kernel?
 
@@ -34,7 +33,7 @@ As you build plugins and add them to planners, it’s important to make sure the
 
 Previously, testing plugins and planners was a manual, time-consuming process. Until now, you can automate this with prompt flow.
 
-In our comprehensive updated documentation, we provide guidance step by step:
+In this section we will:
 1. Create a flow with Semantic Kernel.
 1. Executing batch tests.
 1. Conducting evaluations to quantitatively ascertain the accuracy of your planners and plugins.
@@ -47,10 +46,9 @@ Similar to the integration of Langchain with prompt flow, Semantic Kernel, which
 
 #### Prerequisites: Set up compute session and connection
 
-> [!IMPORTANT]
-> Prior to developing the flow, it's essential to install the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) in your requirements.txt for executor.
+Prior to developing the flow, it's essential to install the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) in your requirements.txt for executor.
 
-To learn more, see [How to manage compute session](./how-to-manage-compute-session.md) for guidance.
+To learn more about compute session, see [How to manage compute session](./how-to-manage-compute-session.md) for guidance.
 
 > [!IMPORTANT]
 > The approach to consume OpenAI or Azure OpenAI in Semantic Kernel is to obtain the keys you have specified in environment variables or stored in a `.env` file.
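As a rough illustration of the key lookup described in the note above, the following sketch reads a setting from the process environment and falls back to values parsed from `.env`-style text. The helper names `parse_dotenv` and `get_setting` and the variable name `AZURE_OPENAI_API_KEY` are hypothetical, and the parsing is deliberately simplified; it is not how Semantic Kernel itself loads settings.

```python
import os


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_setting(name: str, dotenv_values: dict[str, str]) -> str:
    """Prefer the process environment, then fall back to the .env values."""
    value = os.environ.get(name, dotenv_values.get(name))
    if value is None:
        raise KeyError(f"Setting {name!r} was not found")
    return value


# Hypothetical .env content; the key name is an assumption for illustration.
sample_env = '# local secrets\nAZURE_OPENAI_API_KEY="abc"\n'
settings = parse_dotenv(sample_env)
```

The same key-value pairs are the kind of data you would store in a prompt flow connection instead of a local file.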
@@ -61,19 +59,20 @@ In prompt flow, you need to use **Connection** to store the keys. You can conver
 
 You can then utilize this custom connection to invoke your OpenAI or Azure OpenAI model within the flow.
 
-
 #### Create and develop a flow
+
 Once the setup is complete, you can conveniently convert your existing Semantic Kernel planner to a prompt flow by following the steps below:
+
 1. Create a standard flow.
 1. Select the *+ Python* icon to create a new Python node.
-1. Name it as your planner name (e.g., *math_planner*).
+1. Name it as your planner name (for example, *math_planner*).
 1. Select **+** button in *Files* tab to upload any other reference files (for example, *plugins*).
 1. Update the code in *__.py* file with your planner's code.
 1. Define the input and output of the planner node.
 1. Set the flow input and output.
-1. Click *Run* for a single test.
+1. Select *Run* for a single test.
 
-For example, we can create a flow with a Semantic Kernel planner that solves math problems. Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/create-a-prompt-flow-with-semantic-kernel) with steps necessary to create a simple prompt flow with Semantic Kernel at its core.
+For our example, we are creating a flow with a Semantic Kernel planner that solves math problems.
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png" alt-text="Screenshot of creating a flow with semantic kernel planner." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png":::
 
@@ -91,13 +90,14 @@ Instead of manually testing different scenarios one-by-one, now you can now auto
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png" alt-text="Screenshot of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png":::
 
-Once the flow has passed the single test run in the previous step, you can effortlessly create a batch test in prompt flow by adhering to the following steps:
+Once the flow has passed the single test run in the previous step, you can create a batch test in prompt flow by adhering to the following steps:
+
 1. Create benchmark data in a *jsonl* file, contains a list of JSON objects that contains the input and the correct ground truth.
-1. Click *Batch run* to create a batch test.
+1. Select *Batch run* to create a batch test.
 1. Complete the batch run settings, especially the data part.
 1. Submit run without evaluation (for this specific batch test, the *Evaluation step* can be skipped).
 
-In our [Running batches with prompt flow](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/running-batches-with-prompt-flow?tabs=gpt-35-turbo), we demonstrate how you can use this functionality to run batch tests on a planner that uses a math plugin. By defining a bunch of word problems, we can quickly test any changes we make to our plugins or planners so we can catch regressions early and often.
+You can use batches with prompt flow to run batch tests on a planner that uses a math plugin. By defining a bunch of word problems, we can quickly test any changes we make to our plugins or planners so we can catch regressions early and often.
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png" alt-text="Screenshot of data of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png":::
 
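The benchmark file mentioned in the batch test steps could be produced with a short script like the one below. This is only a sketch: the field names `question` and `ground_truth`, the sample word problems, and the helper names are assumptions for illustration, so match them to the input names your own flow defines.

```python
import json

# Hypothetical word problems; the field names are assumptions and must
# match the input and ground-truth columns your flow actually expects.
benchmark = [
    {"question": "What is 2 plus 3?", "ground_truth": 5},
    {"question": "What is 12 divided by 4?", "ground_truth": 3},
    {"question": "What is the square root of 81?", "ground_truth": 9},
]


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


def write_jsonl(path: str, records) -> None:
    """Write the records to a .jsonl file that a batch run can consume."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(records))
```

Calling `write_jsonl("benchmark.jsonl", benchmark)` would create a file you can select in the data part of the batch run settings.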
@@ -123,39 +123,35 @@ There's also the flexibility to develop **your own custom evaluators** if needed
 [...]
 Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo) for Semantic Kernel to learn more about how to use the [math accuracy evaluation flow](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-accuracy-maths-to-code) to test our planner to see how well it solves word problems.
-
 After running the evaluator, you’ll get a summary back of your metrics. Initial runs may yield less than ideal results, which can be used as a motivation for immediate improvement.
 
-To check the metrics, you can go back to the batch run detail page, click **Details** button, and then click **Output** tab, select the evaluation run name in the dropdown list to view the evaluation result.
+To check the metrics, you can go back to the batch run detail page, select the **Details** button, then select the **Output** tab, and select the evaluation run name in the dropdown list to view the evaluation result.
 [...]
-If you find that your plugins and planners aren’t performing as well as they should, there are steps you can take to make them better. In this documentation, we provide an in-depth guide on practical strategies to bolster the effectiveness of your plugins and planners. We recommend the following high-level considerations:
+If you find that your plugins and planners aren’t performing as well as they should, there are steps you can take to make them better. We recommend the following high-level considerations to bolster the effectiveness of your plugins and planners.
 
 1. Use a more advanced model like GPT-4 instead of GPT-3.5-turbo.
-1. [Improve the description of your plugins](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo#improving-the-descriptions-of-your-plugin) so they’re easier for the planner to use.
-1. [Inject additional help to the planner](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo#improving-the-descriptions-of-your-plugin) when sending the user’s ask.
+1. Improve the description of your plugins so they’re easier for the planner to use.
+1. Inject additional help to the planner when sending the user’s ask.
 
 By doing a combination of these three things, we demonstrate how you can take a failing planner and turn it into a winning one! At the end of the walkthrough, you should have a planner that can correctly answer all of the benchmark data.
 
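To make the idea of an accuracy metric concrete, here is a minimal sketch of how planner outputs might be scored against ground truth. It is not the code of the math accuracy evaluation flow linked above; the function names and the numeric tolerance are assumptions chosen for illustration.

```python
import math


def is_correct(prediction, ground_truth, rel_tol: float = 1e-6) -> bool:
    """Treat both values as numbers and compare them within a tolerance."""
    try:
        return math.isclose(float(prediction), float(ground_truth), rel_tol=rel_tol)
    except (TypeError, ValueError):
        # Non-numeric or missing output counts as a wrong answer.
        return False


def accuracy(predictions, ground_truths) -> float:
    """Fraction of predictions that match their ground truth."""
    pairs = list(zip(predictions, ground_truths))
    if not pairs:
        return 0.0
    return sum(is_correct(p, g) for p, g in pairs) / len(pairs)
```

A planner that answers every benchmark question correctly would score 1.0 under this definition, which matches the goal stated at the end of the walkthrough.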
@@ -175,10 +171,6 @@ This will present you with a detailed table, line-by-line comparison of the resu
 
 ## Next steps
 
-> [!TIP]
-> Follow along with our documentations to get started!
-> And keep an eye out for more integrations.
-
 If you're interested in learning more about how you can use Planners in Semantic Kernel, we recommend that you read the following article:
 
 * [Learn more about planners](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/)