articles/ai-services/openai/how-to/evaluations.md (7 additions & 6 deletions)
@@ -5,6 +5,7 @@ description: Learn how to use evaluations with Azure OpenAI
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
+ms.custom: references_regions
ms.date: 11/10/2024
author: mrbullwinkle
ms.author: mbullwin
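
For readers less used to diff notation, the front-matter block covered by this hunk reads as follows once the added `ms.custom` line is in place (only the lines shown in the hunk; surrounding front matter is omitted):

```yaml
# Resulting front matter for the lines shown in the hunk above
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
ms.custom: references_regions
ms.date: 11/10/2024
author: mrbullwinkle
ms.author: mbullwin
```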
@@ -124,13 +125,13 @@ Testing criteria is used to assess the effectiveness of each output generated by
You'll see the first three lines of the file as a preview:

-:::image type="content" source="../media/how-to/evaluations/preview.png" alt-text="Screenshot that shows a preview of an uploaded evaluation file" lightbox="../media/how-to/evaluations/preview.png":::
+:::image type="content" source="../media/how-to/evaluations/preview.png" alt-text="Screenshot that shows a preview of an uploaded evaluation file." lightbox="../media/how-to/evaluations/preview.png":::

5. Select the toggle for **Generate responses**. Select `{{item.input}}` from the dropdown. This will inject the input fields from our evaluation file into individual prompts for a new model run that we want to be able to compare against our evaluation dataset (see the sample `.jsonl` sketch below). The model will take that input and generate its own unique outputs, which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that sample output text later as part of our testing criteria. Alternatively, you could provide your own custom system message and individual message examples manually.
6. Select which model you want to use to generate responses based on your evaluation. If you don't have a model, you can create one. For the purposes of this example we're using a standard deployment of `gpt-4o-mini`.

-:::image type="content" source="../media/how-to/evaluations/item-input.png" alt-text="Screenshot of the generate responses UX with a model selected." lightbox="../media/how-to/evaluations/item-input.png":::
+:::image type="content" source="../media/how-to/evaluations/item-input.png" alt-text="Screenshot of the UX for generating model responses with a model selected." lightbox="../media/how-to/evaluations/item-input.png":::

The settings/sprocket symbol controls the basic parameters that are passed to the model. Only the following parameters are supported at this time:
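
As background for steps 5 and 6: the `{{item.input}}` and `{{item.output}}` template variables refer to fields in the uploaded evaluation `.jsonl` file, which contains one JSON object per line. A minimal sketch of such a file is shown below; the values are hypothetical, and only the `input`/`output` key names are implied by the template variables used in this article.

```jsonl
{"input": "What is the capital of France?", "output": "Paris"}
{"input": "Translate 'good morning' into Spanish.", "output": "Buenos días"}
```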
@@ -144,7 +145,7 @@ Testing criteria is used to assess the effectiveness of each output generated by
8. Select **Semantic Similarity** > under **Compare** add `{{item.output}}`, and under **With** add `{{sample.output_text}}` (see the example below). This will take the original reference output from your evaluation `.jsonl` file and compare it against the output the model generates from prompts based on your `{{item.input}}`.

-:::image type="content" source="../media/how-to/evaluations/semantic-similarity-config.png" alt-text="Screenshot of the semantic similarity UX config" lightbox="../media/how-to/evaluations/semantic-similarity-config.png":::
+:::image type="content" source="../media/how-to/evaluations/semantic-similarity-config.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/semantic-similarity-config.png":::

9. Select **Add**. At this point you can either add additional testing criteria or select **Create** to initiate the evaluation job run.
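
To make step 8 concrete, here is an illustrative, hypothetical view of the two values the semantic similarity grader compares for a single row: `{{item.output}}` is the reference answer from the evaluation file, and `{{sample.output_text}}` is the answer freshly generated by the selected model.

```json
{
  "item": { "input": "What is the capital of France?", "output": "Paris" },
  "sample": { "output_text": "The capital of France is Paris." }
}
```

A pair like this would typically grade as a pass, since the generated text and the reference answer mean the same thing even though the wording differs.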
@@ -156,7 +157,7 @@ Testing criteria is used to assess the effectiveness of each output generated by
:::image type="content" source="../media/how-to/evaluations/test-complete.png" alt-text="Screenshot of a completed semantic similarity test with a mix of passes and failures." lightbox="../media/how-to/evaluations/test-complete.png":::

-12. For semantic similarity **View output details** contains a JSON representation that you can copy/paste of the your passing tests.
+12. For semantic similarity, **View output details** contains a JSON representation of your passing tests that you can copy/paste.

:::image type="content" source="../media/how-to/evaluations/output-details.png" alt-text="Screenshot of the evaluation status UX with output details." lightbox="../media/how-to/evaluations/output-details.png":::
@@ -253,13 +254,13 @@ Verifies if the output is valid JSON or XML.
Ensures the output follows the specified structure.

-:::image type="content" source="../media/how-to/evaluations/matches-schema.png" alt-text="Screenshot of the matches schema testing criteria" lightbox="../media/how-to/evaluations/matches-schema.png":::
+:::image type="content" source="../media/how-to/evaluations/matches-schema.png" alt-text="Screenshot of the matches schema testing criteria." lightbox="../media/how-to/evaluations/matches-schema.png":::

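As a sketch of what "the specified structure" could look like when the expected output is JSON, a schema for this check might resemble the following; the field names are hypothetical, and the exact schema format you supply may differ.

```json
{
  "$comment": "Hypothetical schema: requires an object with a string 'city' and an integer 'population'.",
  "type": "object",
  "properties": {
    "city": { "type": "string" },
    "population": { "type": "integer" }
  },
  "required": ["city", "population"]
}
```
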
### Criteria match
Assesses if the model's response matches your criteria. Grade: Pass or Fail.

-:::image type="content" source="../media/how-to/evaluations/criteria-match.png" alt-text="Screenshot of the matches criteria test" lightbox="../media/how-to/evaluations/criteria-match.png":::
+:::image type="content" source="../media/how-to/evaluations/criteria-match.png" alt-text="Screenshot of the matches criteria test." lightbox="../media/how-to/evaluations/criteria-match.png":::

You can view the prompt text that is used as part of this testing criteria by selecting the dropdown next to the prompt. The current prompt text is: