articles/ai-studio/how-to/evaluate-prompts-playground.md
8 additions & 6 deletions
@@ -34,9 +34,12 @@ In this article you learn to:
To generate manual evaluation results, you need to have the following ready:
-* A test dataset in one of these formats: csv or jsonl. If you don't have a dataset available, we also allow you to input data manually from the UI.
+* A test dataset in CSV or JSONL format (a minimal JSONL sketch follows the note below). If you don't have a dataset available, you can also enter data manually from the UI.
-* A deployment of one of these models: GPT 3.5 models, GPT 4 models, or Davinci models. Learn more about how to create a deployment [here](./deploy-models-openai.md).
+* A deployment of one of these models: GPT-3.5 models, GPT-4 models, or Davinci models. To learn more about how to create a deployment, see [Deploy models](./deploy-models-openai.md).
+
+> [!NOTE]
+> At this time, manual evaluation is supported only for Azure OpenAI models with chat and completion task types.
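The prerequisites above call for a test dataset in CSV or JSONL format. As a rough illustration, a JSONL dataset holds one JSON object per line; the field names below (`question`, `expected_answer`) are placeholders chosen for this sketch, not names the playground requires, and you map them to your own columns during import:

```jsonl
{"question": "What is the return window for online orders?", "expected_answer": "Items can be returned within 30 days of delivery."}
{"question": "Do you ship internationally?", "expected_answer": "Yes, we ship to most countries outside the US."}
```

An equivalent CSV dataset would carry the same field names as a header row, with one test case per row.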
## Generate your manual evaluation results
@@ -46,8 +49,7 @@ This can be done manually using the text boxes in the **Input** column.
You can also select **Import Data** to choose one of the existing datasets in your project, or to upload a dataset in CSV or JSONL format. After loading your data, you're prompted to map the columns. Once you finish and select **Import**, the data is populated in the columns below.
> You can add as many as 50 input rows to your manual evaluation. If your test data has more than 50 input rows, only the first 50 are uploaded to the input column.
@@ -58,7 +60,7 @@ Now that your data is added, you can **Run** to populate the output column with
You can give each response a thumbs-up or thumbs-down rating to assess the prompt output. Based on your ratings, you can view the response scores in the at-a-glance summaries.
-:::image type="content" source="../media/evaluations/prompts/rate-results.gif" alt-text="GIF of response scores in the at-a-glance summaries." lightbox= "../media/evaluations/prompts/rate-results.gif":::
+:::image type="content" source="../media/evaluations/prompts/rate-results.png" alt-text="Screenshot of response scores in the at-a-glance summaries." lightbox="../media/evaluations/prompts/rate-results.png":::
## Iterate on your prompt and reevaluate
@@ -70,7 +72,7 @@ After making your edits, you can choose to rerun all to update the entire table
After populating your results, you can select **Save results** to share progress with your team or to continue your manual evaluation later from where you left off.
-:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.gif" alt-text="GIF of the save results workflow." lightbox= "../media/evaluations/prompts/save-and-compare-results.gif":::
+:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.png" alt-text="Screenshot of the Save results button." lightbox="../media/evaluations/prompts/save-and-compare-results.png":::
You can also compare the thumbs-up and thumbs-down ratings across your different manual evaluations by saving them and viewing them on the **Evaluation** tab, under **Manual evaluation**.