articles/ai-foundry/openai/how-to/evaluations.md
Lines changed: 9 additions & 9 deletions
@@ -50,7 +50,7 @@ Azure OpenAI evaluation enables developers to create evaluation runs to test aga
- West US 2
- West US 3

-If your preferred region is missing, refer to [Azure OpenAI regions](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
+If your preferred region is missing, refer to [Azure OpenAI regions](/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.

### Supported deployment types

@@ -63,7 +63,7 @@ If your preferred region is missing, refer to [Azure OpenAI regions](https://lea
## Evaluation API (preview)

-Evaluation API lets you test model outputs directly through API calls, and programmatically assess model quality and performance. To use Evaluation API, check out the [REST API documentation](https://learn.microsoft.com/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
+The Evaluation API lets you test model outputs directly through API calls and programmatically assess model quality and performance. To use the Evaluation API, check out the [REST API documentation](/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).

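For context on that paragraph, here is a minimal, non-authoritative sketch of listing evaluations from Python. The `/openai/evals` route, the `2025-04-01-preview` api-version, and the `api-key` header usage are assumptions to verify against the linked REST reference.

```python
# Hedged sketch: list evaluations over REST. The route and api-version below are
# assumptions; confirm both in the authoring REST API reference linked above.
import os

import requests

endpoint = "https://<your-resource>.openai.azure.com"  # placeholder resource name
headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}
params = {"api-version": "2025-04-01-preview"}

response = requests.get(f"{endpoint}/openai/evals", headers=headers, params=params, timeout=30)
response.raise_for_status()

for evaluation in response.json().get("data", []):
    print(evaluation.get("id"), evaluation.get("name"))
```
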
## Evaluation pipeline
@@ -127,7 +127,7 @@ The deployments available in your list depend on those you created within your A
Testing criteria are used to assess the effectiveness of each output generated by the target model. These tests compare the input data with the output data to ensure consistency. You have the flexibility to configure different criteria to test and measure the quality and relevance of the output at different levels.

-:::image type="content" source="../media/how-to/evaluations/eval-testing-criteria.png" alt-text="Screenshot that shows the evaluations testing criteria options." lightbox="../media/how-to/evaluations/eval-testing-criteria.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-testing-criteria.png" alt-text="Screenshot that shows the different testing criteria selections." lightbox="../media/how-to/evaluations/eval-testing-criteria.png":::

When you click into each testing criterion, you'll see different types of graders as well as preset schemas that you can modify for your own evaluation dataset and criteria.
@@ -146,11 +146,11 @@ When you click into each testing criteria, you will see different types of grade
4. Select your evaluation data, which will be in `.jsonl` format. If you already have existing data, you can select it, or you can upload new data.

-:::image type="content" source="../media/how-to/evaluations/upload-data-1.png" alt-text="Screenshot of data upload." lightbox="../media/how-to/evaluations/upload-data-1.png":::
+:::image type="content" source="../media/how-to/evaluations/upload-data-1.png" alt-text="Screenshot of data upload options." lightbox="../media/how-to/evaluations/upload-data-1.png":::

When you upload new data, you'll see the first three lines of the file as a preview on the right side:

-:::image type="content" source="../media/how-to/evaluations/upload-data-2.png" alt-text="Screenshot of data upload." lightbox="../media/how-to/evaluations/upload-data-2.png":::
+:::image type="content" source="../media/how-to/evaluations/upload-data-2.png" alt-text="Screenshot of data upload with example selection." lightbox="../media/how-to/evaluations/upload-data-2.png":::

If you need a sample test file, you can use this sample `.jsonl` text. This sample contains sentences of various technical content, and we'll assess semantic similarity across these sentences.

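The article's sample `.jsonl` follows this paragraph in the source file. As a rough sketch of the expected shape (the `input` and `output` field names here are illustrative assumptions; the testing criteria later reference row fields as `{{item.<field>}}`, for example `{{item.output}}`), this is how you might write such a file from Python:

```python
# Illustrative sketch: write a small evaluation .jsonl file, one JSON object per line.
# The "input"/"output" field names are assumptions; use whatever fields your
# testing criteria will reference as {{item.<field>}}.
import json

rows = [
    {"input": "Explain what a vector database is.",
     "output": "A vector database stores embeddings and retrieves them by similarity."},
    {"input": "What does HTTP status code 429 mean?",
     "output": "The client has sent too many requests in a given amount of time."},
]

with open("eval-data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```
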
@@ -169,19 +169,19 @@ When you click into each testing criteria, you will see different types of grade
5. If you would like to create new responses using inputs from your test data, you can select **Generate new responses**. This will inject the input fields from our evaluation file into individual prompts for a model of your choice to generate output.

-:::image type="content" source="../media/how-to/evaluations/eval-generate-1.png" alt-text="Screenshot of the UX for generating model responses." lightbox="../media/how-to/evaluations/eval-generate-1.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-generate-1.png" alt-text="Screenshot of the UX showing selected import test data." lightbox="../media/how-to/evaluations/eval-generate-1.png":::

You will select the model of your choice. If you do not have a model, you can create a new model deployment. The selected model will take the input data and generate its own unique outputs, which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that output later as part of our testing criteria. Alternatively, you could provide your own custom system message and individual message examples manually.

:::image type="content" source="../media/how-to/evaluations/eval-generate-2.png" alt-text="Screenshot of the UX for generating model responses." lightbox="../media/how-to/evaluations/eval-generate-2.png":::

6. To create a testing criterion, select **Add**. For the example file we provided, we are going to assess semantic similarity. Select **Model Scorer**, which contains test criteria presets for Semantic Similarity.

-:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-1.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-1.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-1.png" alt-text="Screenshot of the semantic similarity UX config highlighting Model scorer." lightbox="../media/how-to/evaluations/eval-semantic-similarity-1.png":::

Select **Semantic Similarity** at the top. Scroll to the bottom, and in the `User` section, specify `{{item.output}}` as `Ground truth` and `{{sample.output_text}}` as `Output`. This will take the original reference output from your evaluation `.jsonl` file (the sample file provided) and compare it against the output that is generated by the model you chose in the previous step.

-:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-2.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-2.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-2.png" alt-text="Screenshot of the semantic similarity UX config with generated output." lightbox="../media/how-to/evaluations/eval-semantic-similarity-2.png":::

:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-3.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-3.png":::

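To make steps 5 and 6 concrete, here is a conceptual sketch of what the portal does with these template variables: each row's fields are injected into a prompt, the generated reply is bound to `{{sample.output_text}}`, and the Semantic Similarity grader compares it against the row's ground truth `{{item.output}}`. The `generate_reply` function is a hypothetical stand-in for your selected model deployment, and the grading itself happens in the service, not in this snippet.

```python
# Conceptual sketch of "Generate new responses" followed by the grading input pairing.
# generate_reply() is a hypothetical placeholder for a call to your model deployment.
import json

PROMPT_TEMPLATE = "Answer the following question: {{item.input}}"


def generate_reply(prompt: str) -> str:
    # Placeholder: call your deployed model here and return its text output.
    return "<model output>"


pairs = []
with open("eval-data.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        prompt = PROMPT_TEMPLATE.replace("{{item.input}}", item["input"])
        sample_output_text = generate_reply(prompt)  # plays the role of {{sample.output_text}}
        pairs.append({"ground_truth": item["output"], "output": sample_output_text})

print(f"{len(pairs)} ground-truth/output pairs ready for the Semantic Similarity grader")
```
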
@@ -190,7 +190,7 @@ You will select the model of your choice. If you do not have a model, you can cr
8. You are ready to create your evaluation. Provide a name for your evaluation, review that everything looks correct, and select **Submit** to create the evaluation job. You'll be taken to a status page for your evaluation job, which will show the status as "Waiting".

:::image type="content" source="../media/how-to/evaluations/eval-submit-job.png" alt-text="Screenshot of the evaluation job submit UX." lightbox="../media/how-to/evaluations/eval-submit-job.png":::

-:::image type="content" source="../media/how-to/evaluations/eval-submit-job-2.png" alt-text="Screenshot of the evaluation job submit UX." lightbox="../media/how-to/evaluations/eval-submit-job-2.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-submit-job-2.png" alt-text="Screenshot of the evaluation job submit UX, with a status of waiting." lightbox="../media/how-to/evaluations/eval-submit-job-2.png":::

9. Once your evaluation job has been created, you can select the job to view its full details:
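Rather than watching the status page, you could also poll the job programmatically. This is a loosely hedged sketch only: the `/openai/evals/{eval_id}/runs/{run_id}` route, the api-version, and the exact status strings are assumptions to check against the REST reference mentioned earlier.

```python
# Hedged sketch: poll an evaluation run until it finishes. Route, api-version,
# and status strings are assumptions; verify them in the REST API reference.
import os
import time

import requests

endpoint = "https://<your-resource>.openai.azure.com"  # placeholder
headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}
params = {"api-version": "2025-04-01-preview"}
eval_id, run_id = "<eval-id>", "<run-id>"  # returned when the job is created

while True:
    run = requests.get(
        f"{endpoint}/openai/evals/{eval_id}/runs/{run_id}",
        headers=headers,
        params=params,
        timeout=30,
    ).json()
    print("status:", run.get("status"))
    if run.get("status") not in ("queued", "waiting", "in_progress"):
        break
    time.sleep(30)
```
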