## Prerequisites
* A test dataset in one of these formats: CSV or JSON Lines (JSONL). If you don't have a dataset available, you can also manually enter data from the UI.
* A deployment of one of these models: GPT-3.5, GPT-4, or Davinci. To learn more about how to create a deployment, see [Deploy models](./deploy-models-openai.md).
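For reference, a JSONL test dataset is simply one JSON object per line. A minimal sketch in Python (the `question` and `expected_answer` field names are illustrative, not required by the product — you map your columns to the evaluation inputs during import):

```python
import json

# Illustrative test cases; the field names are arbitrary — you map them
# to the evaluation columns when you import the file.
cases = [
    {"question": "What is the capital of France?", "expected_answer": "Paris"},
    {"question": "What is 2 + 2?", "expected_answer": "4"},
]

path = "manual_eval_test_data.jsonl"
with open(path, "w", encoding="utf-8") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")

# Read it back: each line parses independently as JSON.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```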
> [!NOTE]
> At this time, manual evaluation is only supported for Azure OpenAI models for chat and completion task types.
## Generate your manual evaluation results
From **Playground**, select the **Manual evaluation** option to begin the process of manually reviewing the model responses based on your test data and prompt. Your prompt is automatically transitioned to your **Manual evaluation** file. You need to add test data to evaluate the prompt against. You can do this step manually by using the text boxes in the **Input** column.
You can also use the **Import Data** feature to select one of the existing datasets in your project, or upload a dataset in CSV or JSONL format. After you load your data, you're prompted to map the columns appropriately. After you finish and select **Import**, the data is populated in the appropriate columns.
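The same test data works as a CSV file, where the header row supplies the column names you map during import. A sketch with hypothetical column names:

```python
import csv

# Hypothetical column names; the header row is what you map after
# selecting Import Data in the UI.
rows = [
    {"input": "Translate to French: Hello", "expected_response": "Bonjour"},
    {"input": "What is 2 + 2?", "expected_response": "4"},
]

path = "manual_eval_test_data.csv"
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "expected_response"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back; DictReader keys the values by the header row.
with open(path, newline="", encoding="utf-8") as f:
    loaded_rows = list(csv.DictReader(f))
```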
:::image type="content" source="../media/evaluations/prompts/generate-manual-evaluation-results.png" alt-text="Screenshot that shows how to generate manual evaluation results." lightbox="../media/evaluations/prompts/generate-manual-evaluation-results.png":::
Now that your data is added, you can select **Run** to populate the output column.
## Rate your model's responses
You can rate the prompt's output by selecting a thumbs up or down for each response. Based on the ratings that you provide, you can view these response scores in the at-a-glance summaries.
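The at-a-glance summary is essentially a tally of the thumbs up and thumbs down ratings. A rough offline sketch of the same arithmetic (the rating values are illustrative):

```python
# One entry per response; None means the row hasn't been rated yet.
ratings = ["up", "up", "down", "up", None]

rated = [r for r in ratings if r is not None]
summary = {
    "thumbs_up": rated.count("up"),
    "thumbs_down": rated.count("down"),
    "unrated": ratings.count(None),
}
print(summary)  # {'thumbs_up': 3, 'thumbs_down': 1, 'unrated': 1}
```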
:::image type="content" source="../media/evaluations/prompts/rate-results.png" alt-text="Screenshot that shows response scores in the at-a-glance summaries." lightbox="../media/evaluations/prompts/rate-results.png":::
## Iterate on your prompt and reevaluate
Based on your summary, you might want to make changes to your prompt. You can edit your prompt setup by using the prompt controls mentioned previously. You can update the system message, change the model, edit the parameters, and more.
After you make your edits, you can run them all again to update the entire table or run only specific rows again that didn't meet your expectations the first time.
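Conceptually, rerunning specific rows just filters the table to the rows you flagged and regenerates those outputs. A minimal sketch with a stand-in model function (not the Azure OpenAI API — the real call happens inside the UI):

```python
def stand_in_model(prompt: str) -> str:
    # Placeholder for a model call; returns a canned response.
    return f"response to: {prompt}"

# A toy evaluation table: input, generated output, and your rating.
table = [
    {"input": "Q1", "output": "old answer", "rating": "down"},
    {"input": "Q2", "output": "good answer", "rating": "up"},
]

# Regenerate outputs only for rows rated thumbs down; leave the rest alone.
for row in table:
    if row["rating"] == "down":
        row["output"] = stand_in_model(row["input"])
        row["rating"] = None  # the new output needs re-rating
```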
## Save and compare results
After you populate your results, you can select **Save results**. By saving your results, you can share the progress with your team or continue your manual evaluation later.
:::image type="content" source="../media/evaluations/prompts/save-and-compare-results.png" alt-text="Screenshot of the Save results selection." lightbox="../media/evaluations/prompts/save-and-compare-results.png":::
You can also compare the thumbs up and down ratings across your manual evaluations. Save them, and then view them on the **Evaluation** tab under **Manual evaluation**.