Commit 8069148

Merge pull request #285056 from lgayhardt/patch-283
Update how-to-evaluate-semantic-kernel.md
2 parents 42b2a06 + a51a847 commit 8069148

1 file changed

+21
-29
lines changed

articles/machine-learning/prompt-flow/how-to-evaluate-semantic-kernel.md

Lines changed: 21 additions & 29 deletions
@@ -19,10 +19,9 @@ ms.date: 09/15/2023
1919

2020
In the rapidly evolving landscape of AI orchestration, a comprehensive evaluation of your plugins and planners is paramount for optimal performance. This article introduces how to evaluate your **Semantic Kernel** [plugins](/semantic-kernel/ai-orchestration/plugins) and [planners](/semantic-kernel/ai-orchestration/planners) with prompt flow. Furthermore, you can learn the seamless integration story between prompt flow and Semantic Kernel.
2121

22-
2322
The integration of Semantic Kernel with prompt flow is a significant milestone.
24-
* It allows you to harness the powerful AI orchestration capabilities of Semantic Kernel to enhance the efficiency and effectiveness of your prompt flow.
25-
* More importantly, it enables you to utilize prompt flow's powerful evaluation and experiment management to assess the quality of your Semantic Kernel plugins and planners comprehensively.
23+
- It allows you to harness the powerful AI orchestration capabilities of Semantic Kernel to enhance the efficiency and effectiveness of your prompt flow.
24+
- More importantly, it enables you to utilize prompt flow's powerful evaluation and experiment management to assess the quality of your Semantic Kernel plugins and planners comprehensively.
2625

2726
## What is Semantic Kernel?
2827

@@ -34,7 +33,7 @@ As you build plugins and add them to planners, it’s important to make sure the
3433

3534
Previously, testing plugins and planners was a manual, time-consuming process. Now, you can automate this with prompt flow.
3635

37-
In our comprehensive updated documentation, we provide guidance step by step:
36+
In this section we will:
3837
1. Create a flow with Semantic Kernel.
3938
1. Execute batch tests.
4039
1. Conduct evaluations to quantitatively assess the accuracy of your planners and plugins.
@@ -47,10 +46,9 @@ Similar to the integration of Langchain with prompt flow, Semantic Kernel, which
4746

4847
#### Prerequisites: Set up compute session and connection
4948

50-
> [!IMPORTANT]
51-
> Prior to developing the flow, it's essential to install the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) in your requirements.txt for executor.
49+
Before you develop the flow, add the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) to the requirements.txt file for the executor.
5250

53-
To learn more, see [How to manage compute session](./how-to-manage-compute-session.md) for guidance.
51+
To learn more about compute sessions, see [How to manage compute session](./how-to-manage-compute-session.md).
5452

5553
> [!IMPORTANT]
5654
> Semantic Kernel consumes OpenAI or Azure OpenAI by reading the keys that you set in environment variables or store in a `.env` file.
@@ -61,19 +59,20 @@ In prompt flow, you need to use **Connection** to store the keys. You can conver
6159

6260
You can then utilize this custom connection to invoke your OpenAI or Azure OpenAI model within the flow.
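
As a rough illustration of moving keys from a `.env` file into connection key-value pairs, the following minimal sketch uses only the Python standard library. The variable names in the sample, the `parse_env_text` and `split_secrets` helpers, and the rule for deciding which values count as secrets are all assumptions made for this example. They aren't part of prompt flow or Semantic Kernel.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines from .env-style text into a dict."""
    pairs = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a KEY=VALUE line.
        # Drop surrounding whitespace and optional quotes around the value.
        pairs[key.strip()] = value.strip().strip('"').strip("'")
    return pairs


def split_secrets(pairs, secret_markers=("KEY", "SECRET", "TOKEN")):
    """Separate values to store as secrets from plain configuration values.

    The marker list is a hypothetical heuristic for this sketch.
    """
    secrets = {
        k: v for k, v in pairs.items()
        if any(marker in k.upper() for marker in secret_markers)
    }
    configs = {k: v for k, v in pairs.items() if k not in secrets}
    return secrets, configs


# Hypothetical .env contents; replace with your own variable names.
sample = """
# Azure OpenAI settings
AZURE_OPENAI_API_KEY="placeholder-key"
AZURE_OPENAI_ENDPOINT=https://example.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=my-deployment
"""

secrets, configs = split_secrets(parse_env_text(sample))
print("Store as secret:", sorted(secrets))
print("Store as plain value:", sorted(configs))
```

You could then enter each pair in the custom connection, marking the values in `secrets` as secret.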
6361

64-
6562
#### Create and develop a flow
63+
6664
Once the setup is complete, you can conveniently convert your existing Semantic Kernel planner to a prompt flow by following the steps below:
65+
6766
1. Create a standard flow.
6867
1. Select the *+ Python* icon to create a new Python node.
69-
1. Name it as your planner name (e.g., *math_planner*).
68+
1. Name it as your planner name (for example, *math_planner*).
7069
1. Select the **+** button in the *Files* tab to upload any other reference files (for example, *plugins*).
7170
1. Update the code in *__.py* file with your planner's code.
7271
1. Define the input and output of the planner node.
7372
1. Set the flow input and output.
74-
1. Click *Run* for a single test.
73+
1. Select *Run* for a single test.
7574

76-
For example, we can create a flow with a Semantic Kernel planner that solves math problems. Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/create-a-prompt-flow-with-semantic-kernel) with steps necessary to create a simple prompt flow with Semantic Kernel at its core.
75+
For our example, we are creating a flow with a Semantic Kernel planner that solves math problems.
7776

7877
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png" alt-text="Screenshot of creating a flow with semantic kernel planner." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png":::
7978

@@ -91,13 +90,14 @@ Instead of manually testing different scenarios one-by-one, now you can now auto
9190

9291
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png" alt-text="Screenshot of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png":::
9392

94-
Once the flow has passed the single test run in the previous step, you can effortlessly create a batch test in prompt flow by adhering to the following steps:
93+
Once the flow has passed the single test run in the previous step, you can create a batch test in prompt flow by adhering to the following steps:
94+
9595
1. Create benchmark data in a *jsonl* file that contains one JSON object per line, each with the input and the correct ground truth.
96-
1. Click *Batch run* to create a batch test.
96+
1. Select *Batch run* to create a batch test.
9797
1. Complete the batch run settings, especially the data part.
9898
1. Submit run without evaluation (for this specific batch test, the *Evaluation step* can be skipped).
9999

100-
In our [Running batches with prompt flow](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/running-batches-with-prompt-flow?tabs=gpt-35-turbo), we demonstrate how you can use this functionality to run batch tests on a planner that uses a math plugin. By defining a bunch of word problems, we can quickly test any changes we make to our plugins or planners so we can catch regressions early and often.
100+
You can use batches with prompt flow to run batch tests on a planner that uses a math plugin. By defining a bunch of word problems, we can quickly test any changes we make to our plugins or planners so we can catch regressions early and often.
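
To make the benchmark data format concrete, here's a minimal sketch that writes a few word problems to a *jsonl* file and reads them back. The field names `text` and `ground_truth`, the file name, and the sample problems are assumptions for illustration. Use the names that match your own flow inputs.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical benchmark rows: the field names are assumptions and
# should match the inputs that your flow expects.
benchmark = [
    {"text": "What is 5 plus 7?", "ground_truth": 12},
    {"text": "What is the square root of 81?", "ground_truth": 9},
    {"text": "A box holds 4 rows of 6 eggs. How many eggs are in the box?", "ground_truth": 24},
]


def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Read a jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


data_path = Path(tempfile.gettempdir()) / "math_benchmark.jsonl"
write_jsonl(benchmark, data_path)
loaded = read_jsonl(data_path)
print(f"Wrote {len(loaded)} test cases to {data_path}")
```

Reading the file back before uploading it is a cheap way to confirm that every line is valid JSON.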
101101

102102
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png" alt-text="Screenshot of data of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png":::
103103

@@ -123,39 +123,35 @@ There's also the flexibility to develop **your own custom evaluators** if needed
123123
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/my-evaluator.png" alt-text="My custom evaluation flow" lightbox = "./media/how-to-evaluate-semantic-kernel/my-evaluator.png":::
124124

125125
In prompt flow, you can quickly create an evaluation run based on a completed batch run by following the steps below:
126+
126127
1. Prepare the evaluation flow and complete a batch run.
127-
1. Click *Run* tab in home page to go to the run list.
128+
1. Select the *Run* tab on the home page to go to the run list.
128129
1. Open the previously completed batch run.
129-
1. Click *Evaluate* in the above to create an evaluation run.
130+
1. Select *Evaluate* to create an evaluation run.
130131
1. Complete the evaluation settings, especially the evaluation flow and the input mapping.
131132
1. Submit run and wait for the result.
132133

133-
134134
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/add-evaluation.png" alt-text="Screenshot showing add new evaluation." lightbox = "./media/how-to-evaluate-semantic-kernel/add-evaluation.png":::
135135

136136
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-setting.png" alt-text="Screenshot showing evaluation settings." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-setting.png":::
137137

138-
139-
Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo) for Semantic Kernel to learn more about how to use the [math accuracy evaluation flow](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-accuracy-maths-to-code) to test our planner to see how well it solves word problems.
140-
141138
After running the evaluator, you’ll get a summary back of your metrics. Initial runs may yield less than ideal results, which can be used as a motivation for immediate improvement.
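
To give a sense of what an accuracy-style metric summarizes, the following sketch compares each prediction to its ground truth and reports the fraction that match. It's a simplified illustration under stated assumptions, not the logic of any built-in evaluation flow: the `prediction` and `ground_truth` field names, the numeric tolerance, and the sample outputs are all hypothetical.

```python
import math


def is_correct(prediction, ground_truth, tolerance=1e-6):
    """Return True when both values parse as numbers and are close enough."""
    try:
        return math.isclose(float(prediction), float(ground_truth), abs_tol=tolerance)
    except (TypeError, ValueError):
        # Non-numeric answers (for example, free text or None) count as wrong.
        return False


def accuracy(results):
    """Fraction of results whose prediction matches the ground truth."""
    if not results:
        return 0.0
    correct = sum(is_correct(r["prediction"], r["ground_truth"]) for r in results)
    return correct / len(results)


# Hypothetical batch run outputs joined with their ground truth.
batch_outputs = [
    {"prediction": "12", "ground_truth": 12},
    {"prediction": "9.0", "ground_truth": 9},
    {"prediction": "I don't know", "ground_truth": 24},
    {"prediction": 23, "ground_truth": 24},
]

score = accuracy(batch_outputs)
print(f"accuracy = {score:.2f}")  # prints "accuracy = 0.50"
```

Two of the four sample answers match, so the aggregated score is 0.50. A low score like this points to the individual rows worth inspecting first.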
142139

143-
To check the metrics, you can go back to the batch run detail page, click **Details** button, and then click **Output** tab, select the evaluation run name in the dropdown list to view the evaluation result.
140+
To check the metrics, you can go back to the batch run detail page, select the **Details** button, then select the **Output** tab, and select the evaluation run name in the dropdown list to view the evaluation result.
144141

145142
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-result.png" alt-text="Screenshot showing evaluation result." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-result.png":::
146143

147144
You can check the aggregated metric in the **Metrics** tab.
148145

149146
:::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-metrics.png" alt-text="Screenshot showing evaluation metrics." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-metrics.png":::
150147

151-
152148
### Experiments for quality improvement
153149

154-
If you find that your plugins and planners aren’t performing as well as they should, there are steps you can take to make them better. In this documentation, we provide an in-depth guide on practical strategies to bolster the effectiveness of your plugins and planners. We recommend the following high-level considerations:
150+
If you find that your plugins and planners aren’t performing as well as they should, there are steps you can take to make them better. We recommend the following high-level considerations to bolster the effectiveness of your plugins and planners.
155151

156152
1. Use a more advanced model like GPT-4 instead of GPT-3.5-turbo.
157-
1. [Improve the description of your plugins](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo#improving-the-descriptions-of-your-plugin) so they’re easier for the planner to use.
158-
1. [Inject additional help to the planner](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo#improving-the-descriptions-of-your-plugin) when sending the user’s ask.
153+
1. Improve the description of your plugins so they’re easier for the planner to use.
154+
1. Inject additional help to the planner when sending the user’s ask.
159155

160156
By doing a combination of these three things, we demonstrate how you can take a failing planner and turn it into a winning one! At the end of the walkthrough, you should have a planner that can correctly answer all of the benchmark data.
161157

@@ -175,10 +171,6 @@ This will present you with a detailed table, line-by-line comparison of the resu
175171

176172
## Next steps
177173

178-
> [!TIP]
179-
> Follow along with our documentations to get started!
180-
> And keep an eye out for more integrations.
181-
182174
If you're interested in learning more about how you can use Planners in Semantic Kernel, we recommend that you read the following article:
183175

184176
* [Learn more about planners](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/)
