
Commit 05cc80d

Merge pull request #5649 from LHL407/articles-about-evaluating-apps-with-Azure-AI-Evaluation-SDK
[AQ] edit pass: Articles about evaluating apps with the Azure AI Evaluation SDK
2 parents 97e2740 + e8a988d commit 05cc80d

8 files changed: +592 -575 lines changed

articles/ai-foundry/how-to/develop/agent-evaluate-sdk.md

Lines changed: 98 additions & 91 deletions
Large diffs are not rendered by default.

articles/ai-foundry/how-to/develop/cloud-evaluation.md

Lines changed: 72 additions & 73 deletions
Large diffs are not rendered by default.

articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 93 additions & 95 deletions
Large diffs are not rendered by default.

articles/ai-foundry/how-to/develop/simulator-interaction-data.md

Lines changed: 115 additions & 107 deletions
Large diffs are not rendered by default.

articles/ai-foundry/how-to/evaluate-generative-ai-app.md

Lines changed: 91 additions & 87 deletions
Large diffs are not rendered by default.
Lines changed: 36 additions & 39 deletions
@@ -1,7 +1,7 @@
 ---
-title: How to manually evaluate prompts in Azure AI Foundry portal playground
+title: Manually Evaluate Prompts in the Azure AI Foundry Portal Playground
 titleSuffix: Azure AI Foundry
-description: Quickly test and evaluate prompts in Azure AI Foundry portal playground.
+description: Learn how to quickly test and evaluate prompts in the Azure AI Foundry portal playground.
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom:
@@ -14,73 +14,70 @@ ms.author: lagayhar
 author: lgayhardt
 ---

-# Manually evaluate prompts in Azure AI Foundry portal playground
+# Manually evaluate prompts in the Azure AI Foundry portal playground

 [!INCLUDE [feature-preview](../includes/feature-preview.md)]

-When you get started with prompt engineering, you should test different inputs one at a time to evaluate the effectiveness of the prompt can be very time intensive. This is because it's important to check whether the content filters are working appropriately, whether the response is accurate, and more.
+As you learn prompt engineering, you should test different prompts (*inputs*) one at a time to evaluate their effectiveness. This process can be very time intensive for several reasons. You need to check to make sure the content filters work appropriately, the response is accurate, and more.

-To make this process simpler, you can utilize manual evaluation in Azure AI Foundry portal, an evaluation tool enabling you to continuously iterate and evaluate your prompt against your test data in a single interface. You can also manually rate the outputs, the model's responses, to help you gain confidence in your prompt.
+To simplify this process, you can utilize manual evaluation in Azure AI Foundry portal. This evaluation tool enables you to use a single interface to continuously iterate and evaluate your prompt against your test data. You can also manually rate the model's responses (*outputs*) to help you gain confidence in your prompt.

-Manual evaluation can help you get started to understand how well your prompt is performing and iterate on your prompt to ensure you reach your desired level of confidence.
+Manual evaluation can help you understand how your prompt is performing. You can then iterate on your prompt to ensure you reach your desired level of confidence.

-In this article you learn to:
-* Generate your manual evaluation results
-* Rate your model responses
-* Iterate on your prompt and reevaluate
-* Save and compare results
-* Evaluate with built-in metrics
+In this article, you learn to:

-## Prerequisites
+* Generate your manual evaluation results.
+* Rate your model responses.
+* Iterate on your prompt and reevaluate.
+* Save and compare results.
+* Evaluate with built-in metrics.

-To generate manual evaluation results, you need to have the following ready:
+## Prerequisites

-* A test dataset in one of these formats: csv or jsonl. If you don't have a dataset available, we also allow you to input data manually from the UI.
-
-* A deployment of one of these models: GPT 3.5 models, GPT 4 models, or Davinci models. To learn more about how to create a deployment, see [Deploy models](./deploy-models-openai.md).
+* A test dataset in one of these formats: CSV or JSON Lines (JSONL). If you don't have a dataset available, you can also manually enter data from the UI.
+* A deployment of one of these models: GPT-3.5, GPT-4, or Davinci. To learn more about how to create a deployment, see [Deploy models](./deploy-models-openai.md).

 > [!NOTE]
-> Manual evaluation is only supported for Azure OpenAI models at this time for chat and completion task types.
-
-## Generate your manual evaluation results
+> At this time, manual evaluation is only supported for Azure OpenAI models for chat and completion task types.

-From the **Playground**, select **Manual evaluation** to begin the process of manually reviewing the model responses based on your test data and prompt. Your prompt is automatically transitioned to your **Manual evaluation** and now you just need to add test data to evaluate the prompt against.
+## Generate your manual evaluation results

-This can be done manually using the text boxes in the **Input** column.
+From **Playground**, select the **Manual evaluation** option to begin the process of manually reviewing the model responses based on your test data and prompt. Your prompt is automatically transitioned to your **Manual evaluation** file. You need to add test data to evaluate the prompt against. You can do this step manually by using the text boxes in the **Input** column.

-You can also **Import Data** to choose one of your previous existing datasets in your project or upload a dataset that is in CSV or JSONL format. After loading your data, you'll be prompted to map the columns appropriately. Once you finish and select **Import**, the data is populated appropriately in the columns below.
+You can also use the **Import Data** feature to select one of the existing datasets in your project, or upload a dataset in CSV or JSONL format. After you load your data, you're prompted to map the columns appropriately. After you finish and select **Import**, the data is populated in the appropriate columns.

-:::image type="content" source="../media/evaluations/prompts/generate-manual-evaluation-results.png" alt-text="Screenshot of generating manual evaluation results." lightbox= "../media/evaluations/prompts/generate-manual-evaluation-results.png":::
+:::image type="content" source="../media/evaluations/prompts/generate-manual-evaluation-results.png" alt-text="Screenshot that shows how to generate manual evaluation results." lightbox= "../media/evaluations/prompts/generate-manual-evaluation-results.png":::

 > [!NOTE]
-> You can add as many as 50 input rows to your manual evaluation. If your test data has more than 50 input rows, we will upload the first 50 in the input column.
+> You can add as many as 50 input rows to your manual evaluation. If your test data has more than 50 input rows, only the first 50 upload to the input column.

-Now that your data is added, you can **Run** to populate the output column with the model's response.
+Now that your data is added, you can select **Run** to populate the output column with the model's response.

-## Rate your model responses
+## Rate your model's responses

-You can provide a thumb up or down rating to each response to assess the prompt output. Based on the ratings you provided, you can view these response scores in the at-a-glance summaries.
+You can rate the prompt's output by selecting a thumbs up or down for each response. Based on the ratings that you provide, you can view these response scores in the at-a-glance summaries.

-:::image type="content" source="../media/evaluations/prompts/rate-results.png" alt-text="Screenshot of response scores in the at-a-glance summaries." lightbox= "../media/evaluations/prompts/rate-results.png":::
+:::image type="content" source="../media/evaluations/prompts/rate-results.png" alt-text="Screenshot that shows response scores in the at-a-glance summaries." lightbox= "../media/evaluations/prompts/rate-results.png":::

-## Iterate on your prompt and reevaluate
+## Iterate on your prompt and reevaluate

-Based on your summary, you might want to make changes to your prompt. You can use the prompt controls above to edit your prompt setup. This can be updating the system message, changing the model, or editing the parameters.
+Based on your summary, you might want to make changes to your prompt. You can edit your prompt setup by using the prompt controls mentioned previously. You can update the system message, change the model, edit the parameters, and more.

-After making your edits, you can choose to rerun all to update the entire table or focus on rerunning specific rows that didn't meet your expectations the first time.
+After you make your edits, you can run them all again to update the entire table or run only specific rows again that didn't meet your expectations the first time.

-## Save and compare results
+## Save and compare results

-After populating your results, you can **Save results** to share progress with your team or to continue your manual evaluation from where you left off later.
+After you populate your results, you can select **Save results**. By saving your results, you can share the progress with your team or continue your manual evaluation later.

-:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.png" alt-text="Screenshot of the save results." lightbox= "../media/evaluations/prompts/save-and-compare-results.png":::
+:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.png" alt-text="Screenshot of the Save results selection." lightbox= "../media/evaluations/prompts/save-and-compare-results.png":::

-You can also compare the thumbs up and down ratings across your different manual evaluations by saving them and viewing them in the Evaluation tab under Manual evaluation.
+You can also compare the thumbs up and down ratings across your manual evaluations. Save them, and then view them on the **Evaluation** tab under **Manual evaluation**.

-## Next steps
+## Related content

 Learn more about how to evaluate your generative AI applications:
-- [Evaluate your generative AI apps with the Azure AI Foundry portal or SDK](./evaluate-generative-ai-app.md)
-- [View the evaluation results](./evaluate-results.md)
+
+* [Evaluate your generative AI apps with the Azure AI Foundry portal or SDK](./evaluate-generative-ai-app.md)
+* [View the evaluation results](./evaluate-results.md)

 Learn more about [harm mitigation techniques](../concepts/evaluation-approach-gen-ai.md).
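The prerequisites in the diff above call for a test dataset in CSV or JSON Lines (JSONL) format, capped at 50 input rows. As a minimal sketch of what such a file might look like, the following Python snippet writes a small JSONL dataset; the column names (`question`, `expected_answer`) and the file name are hypothetical examples rather than names the portal requires, since you map columns yourself during the **Import Data** step.

```python
import json

# Hypothetical test rows; the column names are illustrative only and get
# mapped to the manual evaluation's Input column during the Import step.
rows = [
    {"question": "What file formats does manual evaluation accept?",
     "expected_answer": "CSV or JSON Lines (JSONL)."},
    {"question": "How many input rows can a manual evaluation include?",
     "expected_answer": "Up to 50 rows; extra rows aren't uploaded."},
]

# JSON Lines format: one standalone JSON object per line.
with open("manual-eval-test-data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```

Keeping the file to 50 rows or fewer ensures every row loads into the input column.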

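The other files in this commit cover the Azure AI Evaluation SDK, and the article's related-content links point to programmatic evaluation with built-in metrics. The sketch below shows roughly how the same kind of test data could be scored with a built-in evaluator. It assumes the `azure-ai-evaluation` Python package, placeholder endpoint and deployment values, and a JSONL file with `query` and `response` columns; treat it as an illustration and confirm the current API against the SDK articles changed in this commit.

```python
from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Placeholder Azure OpenAI connection details (assumptions, not real values).
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-chat-deployment>",
}

# A built-in, AI-assisted evaluator that grades each response for relevance.
relevance = RelevanceEvaluator(model_config=model_config)

# Assumed input: a JSONL file whose rows contain "query" and "response"
# fields, for example exported playground prompts and model outputs.
result = evaluate(
    data="manual-eval-results.jsonl",
    evaluators={"relevance": relevance},
)

# Aggregate metrics across all rows, such as the mean relevance score.
print(result["metrics"])
```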