When you get started with prompt engineering, you should test different inputs one at a time to evaluate the effectiveness of the prompt can be very time intensive. This is because it's important to check whether the content filters are working appropriately, whether the response is accurate, and more.
21
+
As you learn prompt engineering, you should test different prompts (*inputs*) one at a time to evaluate their effectiveness. This process can be very time intensive for several reasons. You need to check to make sure the content filters work appropriately, the response is accurate, and more.
22
22
23
-
To make this process simpler, you can utilize manual evaluation in Azure AI Foundry portal, an evaluation tool enabling you to continuously iterate and evaluate your prompt against your test data in a single interface. You can also manually rate the outputs, the model's responses, to help you gain confidence in your prompt.
23
+
To simplify this process, you can use manual evaluation in the Azure AI Foundry portal. This evaluation tool gives you a single interface to continuously iterate and evaluate your prompt against your test data. You can also manually rate the model's responses (*outputs*) to help you gain confidence in your prompt.
24
24
25
-
Manual evaluation can help you get started to understand how well your prompt is performing and iterate on your prompt to ensure you reach your desired level of confidence.
25
+
Manual evaluation can help you understand how your prompt is performing. You can then iterate on your prompt to ensure you reach your desired level of confidence.
26
26
27
-
In this article you learn to:
28
-
* Generate your manual evaluation results
29
-
* Rate your model responses
30
-
* Iterate on your prompt and reevaluate
31
-
* Save and compare results
32
-
* Evaluate with built-in metrics
27
+
In this article, you learn to:
33
28
34
-
## Prerequisites
29
+
* Generate your manual evaluation results.
30
+
* Rate your model responses.
31
+
* Iterate on your prompt and reevaluate.
32
+
* Save and compare results.
33
+
* Evaluate with built-in metrics.
35
34
36
-
To generate manual evaluation results, you need to have the following ready:
35
+
## Prerequisites
37
36
38
-
* A test dataset in one of these formats: csv or jsonl. If you don't have a dataset available, we also allow you to input data manually from the UI.
39
-
40
-
* A deployment of one of these models: GPT 3.5 models, GPT 4 models, or Davinci models. To learn more about how to create a deployment, see [Deploy models](./deploy-models-openai.md).
37
+
* A test dataset in one of these formats: CSV or JSON Lines (JSONL). If you don't have a dataset available, you can also manually enter data from the UI.
38
+
* A deployment of one of these models: GPT-3.5, GPT-4, or Davinci. To learn more about how to create a deployment, see [Deploy models](./deploy-models-openai.md).
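As a minimal sketch of the JSONL format named in the prerequisites, the following Python snippet serializes a few test rows as one JSON object per line. The column names `input` and `expected_response` and the helper `to_jsonl` are hypothetical and chosen only for illustration. They aren't a required schema, because you map your own columns when you import the data.

```python
import json


def to_jsonl(rows):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


# Hypothetical test rows; the column names are assumptions, not a required schema.
rows = [
    {"input": "What is the capital of France?", "expected_response": "Paris"},
    {"input": "Summarize: The sky is blue.", "expected_response": "The sky is blue."},
]

jsonl_text = to_jsonl(rows)
print(jsonl_text, end="")
```

You can save `jsonl_text` to a file with a `.jsonl` extension and upload that file as your test dataset.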
41
39
42
40
> [!NOTE]
43
-
> Manual evaluation is only supported for Azure OpenAI models at this time for chat and completion task types.
44
-
45
-
## Generate your manual evaluation results
41
+
> At this time, manual evaluation is only supported for Azure OpenAI models for chat and completion task types.
46
42
47
-
From the **Playground**, select **Manual evaluation** to begin the process of manually reviewing the model responses based on your test data and prompt. Your prompt is automatically transitioned to your **Manual evaluation** and now you just need to add test data to evaluate the prompt against.
43
+
## Generate your manual evaluation results
48
44
49
-
This can be done manually using the text boxes in the **Input** column.
45
+
From **Playground**, select the **Manual evaluation** option to begin the process of manually reviewing the model responses based on your test data and prompt. Your prompt is automatically transitioned to your **Manual evaluation** file. You need to add test data to evaluate the prompt against. You can do this step manually by using the text boxes in the **Input** column.
50
46
51
-
You can also **Import Data** to choose one of your previous existing datasets in your project or upload a dataset that is in CSV or JSONL format. After loading your data, you'll be prompted to map the columns appropriately. Once you finish and select **Import**, the data is populated appropriately in the columns below.
47
+
You can also use the **Import Data** feature to select one of the existing datasets in your project, or upload a dataset in CSV or JSONL format. After you load your data, you're prompted to map the columns appropriately. After you finish and select **Import**, the data is populated in the appropriate columns.
:::image type="content" source="../media/evaluations/prompts/generate-manual-evaluation-results.png" alt-text="Screenshot that shows how to generate manual evaluation results." lightbox= "../media/evaluations/prompts/generate-manual-evaluation-results.png":::
54
50
55
51
> [!NOTE]
56
-
> You can add as many as 50 input rows to your manual evaluation. If your test data has more than 50 input rows, we will upload the first 50 in the input column.
52
+
> You can add as many as 50 input rows to your manual evaluation. If your test data has more than 50 input rows, only the first 50 are uploaded to the input column.
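To see in advance which rows would be kept under the 50-row limit, you might trim the dataset locally before you upload it. The following is a rough sketch only. The helper `first_rows` and the sample data are hypothetical, and the code mirrors the limit described in the note rather than the portal's actual implementation.

```python
import json
from itertools import islice

MAX_ROWS = 50  # row limit stated in the note


def first_rows(lines, limit=MAX_ROWS):
    """Parse at most `limit` non-empty JSONL lines and ignore the rest."""
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, limit)]


# Hypothetical dataset with 60 rows to show the truncation.
sample_lines = [json.dumps({"input": f"question {i}"}) for i in range(60)]
kept = first_rows(sample_lines)
print(len(kept))  # 50
```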
57
53
58
-
Now that your data is added, you can **Run** to populate the output column with the model's response.
54
+
Now that your data is added, you can select **Run** to populate the output column with the model's response.
59
55
60
-
## Rate your model responses
56
+
## Rate your model's responses
61
57
62
-
You can provide a thumb up or down rating to each response to assess the prompt output. Based on the ratings you provided, you can view these response scores in the at-a-glance summaries.
58
+
You can rate the prompt's output by selecting a thumbs up or down for each response. Based on the ratings that you provide, you can view these response scores in the at-a-glance summaries.
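If you also track ratings outside the portal, a summary of this kind can be approximated with a simple tally. The sketch below is an assumption about what such a summary might contain, not a description of how the portal computes its scores. The function `summarize_ratings` and the `"up"` and `"down"` labels are hypothetical.

```python
from collections import Counter


def summarize_ratings(ratings):
    """Count thumbs up/down and compute the share of thumbs up among rated rows.

    `ratings` holds "up", "down", or None (unrated) for each response.
    Any other value is counted as unrated.
    """
    counts = Counter(r for r in ratings if r is not None)
    rated = counts["up"] + counts["down"]
    up_share = counts["up"] / rated if rated else 0.0
    return {
        "up": counts["up"],
        "down": counts["down"],
        "unrated": len(ratings) - rated,
        "up_share": up_share,
    }


summary = summarize_ratings(["up", "up", "down", None])
print(summary)
```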
63
59
64
-
:::image type="content" source="../media/evaluations/prompts/rate-results.png" alt-text="Screenshot of response scores in the at-a-glance summaries." lightbox= "../media/evaluations/prompts/rate-results.png":::
60
+
:::image type="content" source="../media/evaluations/prompts/rate-results.png" alt-text="Screenshot that shows response scores in the at-a-glance summaries." lightbox= "../media/evaluations/prompts/rate-results.png":::
65
61
66
-
## Iterate on your prompt and reevaluate
62
+
## Iterate on your prompt and reevaluate
67
63
68
-
Based on your summary, you might want to make changes to your prompt. You can use the prompt controls above to edit your prompt setup. This can be updating the system message, changing the model, or editing the parameters.
64
+
Based on your summary, you might want to make changes to your prompt. You can edit your prompt setup by using the prompt controls mentioned previously. You can update the system message, change the model, edit the parameters, and more.
69
65
70
-
After making your edits, you can choose to rerun all to update the entire table or focus on rerunning specific rows that didn't meet your expectations the first time.
66
+
After you make your edits, you can rerun all rows to update the entire table or rerun only the specific rows that didn't meet your expectations the first time.
71
67
72
-
## Save and compare results
68
+
## Save and compare results
73
69
74
-
After populating your results, you can **Save results** to share progress with your team or to continue your manual evaluation from where you left off later.
70
+
After you populate your results, you can select **Save results**. By saving your results, you can share the progress with your team or continue your manual evaluation later.
75
71
76
-
:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.png" alt-text="Screenshot of the save results." lightbox= "../media/evaluations/prompts/save-and-compare-results.png":::
72
+
:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.png" alt-text="Screenshot of the Save results selection." lightbox= "../media/evaluations/prompts/save-and-compare-results.png":::
77
73
78
-
You can also compare the thumbs up and down ratings across your different manual evaluations by saving them and viewing them in the Evaluation tab under Manual evaluation.
74
+
You can also compare the thumbs up and down ratings across your manual evaluations. Save them, and then view them on the **Evaluation** tab under **Manual evaluation**.
79
75
80
-
## Next steps
76
+
## Related content
81
77
82
78
Learn more about how to evaluate your generative AI applications:
83
-
- [Evaluate your generative AI apps with the Azure AI Foundry portal or SDK](./evaluate-generative-ai-app.md)
84
-
- [View the evaluation results](./evaluate-results.md)
79
+
80
+
* [Evaluate your generative AI apps with the Azure AI Foundry portal or SDK](./evaluate-generative-ai-app.md)
81
+
* [View the evaluation results](./evaluate-results.md)
85
82
86
83
Learn more about [harm mitigation techniques](../concepts/evaluation-approach-gen-ai.md).