Merge pull request #285056 from lgayhardt/patch-283

prmerger-automator[bot] · web-flow · commit 80691485d008 · 2024-08-20T12:09:53.000Z
Update how-to-evaluate-semantic-kernel.md
diff --git a/articles/machine-learning/prompt-flow/how-to-evaluate-semantic-kernel.md b/articles/machine-learning/prompt-flow/how-to-evaluate-semantic-kernel.md
@@ -19,10 +19,9 @@ ms.date: 09/15/2023
 
 In the rapidly evolving landscape of AI orchestration, comprehensively evaluating your plugins and planners is essential for optimal performance. This article shows how to evaluate your **Semantic Kernel** [plugins](/semantic-kernel/ai-orchestration/plugins) and [planners](/semantic-kernel/ai-orchestration/planners) with prompt flow. You'll also learn how prompt flow and Semantic Kernel integrate.
 
-
 The integration of Semantic Kernel with prompt flow is a significant milestone. 
-* It allows you to harness the powerful AI orchestration capabilities of Semantic Kernel to enhance the efficiency and effectiveness of your prompt flow. 
-* More importantly, it enables you to utilize prompt flow's powerful evaluation and experiment management to assess the quality of your Semantic Kernel plugins and planners comprehensively.
+- It allows you to harness the powerful AI orchestration capabilities of Semantic Kernel to enhance the efficiency and effectiveness of your prompt flow. 
+- More importantly, it enables you to utilize prompt flow's powerful evaluation and experiment management to assess the quality of your Semantic Kernel plugins and planners comprehensively.
 
 ## What is Semantic Kernel?
 
@@ -34,7 +33,7 @@ As you build plugins and add them to planners, it’s important to make sure the
 
 Previously, testing plugins and planners was a manual, time-consuming process. Now, you can automate this with prompt flow.
 
-In our comprehensive updated documentation, we provide guidance step by step:
+In this section we will:
 1. Create a flow with Semantic Kernel.
 1. Execute batch tests.
 1. Conduct evaluations to quantitatively measure the accuracy of your planners and plugins.
@@ -47,10 +46,9 @@ Similar to the integration of Langchain with prompt flow, Semantic Kernel, which
 
 #### Prerequisites: Set up compute session and connection
 
-> [!IMPORTANT]
-> Prior to developing the flow, it's essential to install the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) in your requirements.txt for executor. 
+Before developing the flow, you must add the [Semantic Kernel package](/semantic-kernel/get-started/quick-start-guide/?toc=%2Fsemantic-kernel%2Ftoc.json&tabs=python) to your requirements.txt for the executor.
 
-To learn more, see [How to manage compute session](./how-to-manage-compute-session.md) for guidance.
+To learn more about compute sessions, see [How to manage compute session](./how-to-manage-compute-session.md).
 
 > [!IMPORTANT]
 > Semantic Kernel consumes OpenAI or Azure OpenAI by reading the keys you have specified in environment variables or stored in a `.env` file.
@@ -61,19 +59,20 @@ In prompt flow, you need to use **Connection** to store the keys. You can conver
 
 You can then utilize this custom connection to invoke your OpenAI or Azure OpenAI model within the flow.
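
One way to bridge the two approaches is to copy the values held by the connection into environment variables before the planner code runs, so code written against environment variables keeps working. The following is a minimal sketch under stated assumptions: a plain dictionary stands in for the custom connection, and the helper name `export_connection_to_env` and the variable names are hypothetical, not part of prompt flow or Semantic Kernel.

```python
import os


def export_connection_to_env(connection_values, environ=os.environ):
    """Copy key/value pairs from a connection-like mapping into an environment mapping.

    `connection_values` is a stand-in (assumption) for the keys stored in a
    custom connection. Returns the sorted list of names that were exported.
    """
    for name, value in connection_values.items():
        environ[name] = str(value)
    return sorted(connection_values)


# Hypothetical values; in a real flow these would come from the connection,
# never from literals in source code.
fake_connection = {
    "AZURE_OPENAI_API_KEY": "placeholder-key",
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid/",
}

# Use a private dict instead of os.environ so the demo has no side effects.
demo_env = {}
exported = export_connection_to_env(fake_connection, demo_env)
print(exported)
```

Passing `os.environ` (the default) instead of `demo_env` would make the values visible to any library that reads environment variables in the same process.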
 
-
 #### Create and develop a flow
+
 Once the setup is complete, you can conveniently convert your existing Semantic Kernel planner to a prompt flow by following the steps below:
+
 1. Create a standard flow.
 1. Select the *+ Python* icon to create a new Python node.
-1. Name it as your planner name (e.g., *math_planner*).
+1. Name it as your planner name (for example, *math_planner*).
 1. Select the **+** button in the *Files* tab to upload any other reference files (for example, *plugins*).
 1. Update the code in *__.py* file with your planner's code.
 1. Define the input and output of the planner node.
 1. Set the flow input and output.
-1. Click *Run* for a single test.
+1. Select *Run* for a single test.
 
-For example, we can create a flow with a Semantic Kernel planner that solves math problems. Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/create-a-prompt-flow-with-semantic-kernel) with steps necessary to create a simple prompt flow with Semantic Kernel at its core.
+For our example, we are creating a flow with a Semantic Kernel planner that solves math problems.
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png" alt-text="Screenshot of creating a flow with semantic kernel planner." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-flow.png":::
 
@@ -91,13 +90,14 @@ Instead of manually testing different scenarios one-by-one, now you can now auto
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png" alt-text="Screenshot of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/using-batch-runs-with-prompt-flow.png":::
 
-Once the flow has passed the single test run in the previous step, you can effortlessly create a batch test in prompt flow by adhering to the following steps:
+Once the flow has passed the single test run in the previous step, you can create a batch test in prompt flow by adhering to the following steps:
+
 1. Create benchmark data in a *jsonl* file that contains a list of JSON objects, each containing the input and the correct ground truth.
-1. Click *Batch run* to create a batch test.
+1. Select *Batch run* to create a batch test.
 1. Complete the batch run settings, especially the data part.
 1. Submit the run without evaluation (for this specific batch test, the *Evaluation step* can be skipped).
 
-In our [Running batches with prompt flow](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/running-batches-with-prompt-flow?tabs=gpt-35-turbo), we demonstrate how you can use this functionality to run batch tests on a planner that uses a math plugin. By defining a bunch of word problems, we can quickly test any changes we make to our plugins or planners so we can catch regressions early and often.
+You can use batch runs in prompt flow to test a planner that uses a math plugin. By defining a set of word problems, you can quickly test any changes you make to your plugins or planners and catch regressions early and often.
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png" alt-text="Screenshot of data of batch runs with prompt flow for Semantic kernel." lightbox = "./media/how-to-evaluate-semantic-kernel/semantic-kernel-test-data.png":::
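
The benchmark file described in the batch test steps can be produced with a few lines of Python. This is a minimal sketch: the field names `text` and `ground_truth` and the sample word problems are assumptions for illustration, so use whatever names your flow inputs and evaluation mapping expect.

```python
import json
import os
import tempfile

# Hypothetical benchmark rows: one JSON object per test case, holding the
# flow input and the expected answer. Field names are assumptions.
benchmark = [
    {"text": "What is 2 plus 3?", "ground_truth": 5},
    {"text": "What is 10 divided by 4?", "ground_truth": 2.5},
    {"text": "What is the square root of 81?", "ground_truth": 9},
]


def write_jsonl(rows, path):
    """Write each row as one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the sketch leaves no files behind in
# the working directory.
data_path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
write_jsonl(benchmark, data_path)
loaded = read_jsonl(data_path)
print(len(loaded), "benchmark rows written to", data_path)
```

Reading the file back before uploading it is a cheap check that every line is valid JSON.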
 
@@ -123,39 +123,35 @@ There's also the flexibility to develop **your own custom evaluators** if needed
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/my-evaluator.png" alt-text="My custom evaluation flow" lightbox = "./media/how-to-evaluate-semantic-kernel/my-evaluator.png":::
 
 In prompt flow, you can quickly create an evaluation run based on a completed batch run by following the steps below:
+
 1. Prepare the evaluation flow and complete a batch run.
-1. Click *Run* tab in home page to go to the run list.
+1. Select the *Run* tab on the home page to go to the run list.
 1. Open the previously completed batch run.
-1. Click *Evaluate* in the above to create an evaluation run.
+1. Select *Evaluate* to create an evaluation run.
 1. Complete the evaluation settings, especially the evaluation flow and the input mapping.
 1. Submit the run and wait for the result.
 
-
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/add-evaluation.png" alt-text="Screenshot showing add new evaluation." lightbox = "./media/how-to-evaluate-semantic-kernel/add-evaluation.png":::
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-setting.png" alt-text="Screenshot showing evaluation settings." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-setting.png":::
 
-
-Follow this [documentation](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo) for Semantic Kernel to learn more about how to use the [math accuracy evaluation flow](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-accuracy-maths-to-code) to test our planner to see how well it solves word problems. 
-
 After running the evaluator, you’ll get back a summary of your metrics. Initial runs may yield less-than-ideal results, which you can use as motivation for improvement.
 
-To check the metrics, you can go back to the batch run detail page, click **Details** button, and then click **Output** tab, select the evaluation run  name in the dropdown list to view the evaluation result.
+To check the metrics, go back to the batch run detail page, select the **Details** button, select the **Output** tab, and then select the evaluation run name in the dropdown list to view the evaluation result.
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-result.png" alt-text="Screenshot showing evaluation result." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-result.png":::
 
 You can check the aggregated metric in the **Metrics** tab.
 
 :::image type="content" source="./media/how-to-evaluate-semantic-kernel/evaluation-metrics.png" alt-text="Screenshot showing evaluation metrics." lightbox = "./media/how-to-evaluate-semantic-kernel/evaluation-metrics.png":::
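
To make the relationship between per-line results and the aggregated metric concrete, here is a minimal sketch of an accuracy evaluator. It is an assumption-based illustration, not the logic of any built-in evaluation flow: the function names `grade` and `aggregate`, the numeric tolerance, and the sample predictions are all hypothetical.

```python
import math


def grade(prediction, ground_truth, tolerance=1e-6):
    """Return True when the prediction parses as a number close to the ground truth."""
    try:
        return math.isclose(float(prediction), float(ground_truth), abs_tol=tolerance)
    except (TypeError, ValueError):
        # Non-numeric output (for example, free text) counts as incorrect.
        return False


def aggregate(grades):
    """Collapse per-line grades into a single accuracy metric."""
    return {"accuracy": sum(grades) / len(grades) if grades else 0.0}


# Hypothetical (prediction, ground truth) pairs from a batch run.
results = [("5", 5), ("2.5", 2.5), ("eight", 9)]
grades = [grade(prediction, truth) for prediction, truth in results]
metrics = aggregate(grades)
print(grades, metrics)
```

Keeping the per-line grade separate from the aggregation mirrors the two views in the screenshots: line-level results in the output table and a single number in the metrics view.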
 
-
 ### Experiments for quality improvement
 
-If you find that your plugins and planners aren’t performing as well as they should, there are steps you can take to make them better. In this documentation, we provide an in-depth guide on practical strategies to bolster the effectiveness of your plugins and planners. We recommend the following high-level considerations:
+If you find that your plugins and planners aren’t performing as well as they should, there are steps you can take to make them better. We recommend the following high-level considerations to bolster the effectiveness of your plugins and planners.
 
 1. Use a more advanced model like GPT-4 instead of GPT-3.5-turbo.
-1. [Improve the description of your plugins](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo#improving-the-descriptions-of-your-plugin) so they’re easier for the planner to use.
-1. [Inject additional help to the planner](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/evaluating-plugins-and-planners-with-prompt-flow?tabs=gpt-35-turbo#improving-the-descriptions-of-your-plugin) when sending the user’s ask.
+1. Improve the description of your plugins so they’re easier for the planner to use.
+1. Give the planner additional help when sending the user’s ask.
 
 By combining these three approaches, you can take a failing planner and turn it into a winning one. At the end, you should have a planner that can correctly answer all of the benchmark data.
 
@@ -175,10 +171,6 @@ This will present you with a detailed table, line-by-line comparison of the resu
 
 ## Next steps
 
-> [!TIP]
-> Follow along with our documentations to get started!
-> And keep an eye out for more integrations.
-
 If you're interested in learning more about how you can use Planners in Semantic Kernel, we recommend that you read the following article:
 
 * [Learn more about planners](/semantic-kernel/ai-orchestration/planners/evaluate-and-deploy-planners/)