
Commit 252fc7a

Merge pull request #6025 from JustPies/jpvalidation-7-14
Bulk - fix validation issues
2 parents: c60e122 + 8d888da

File tree

1 file changed: +9 −9 lines changed


articles/ai-foundry/openai/how-to/evaluations.md

Lines changed: 9 additions & 9 deletions
```diff
@@ -50,7 +50,7 @@ Azure OpenAI evaluation enables developers to create evaluation runs to test aga
 - West US 2
 - West US 3
 
-If your preferred region is missing, refer to [Azure OpenAI regions](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
+If your preferred region is missing, refer to [Azure OpenAI regions](/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
 
 ### Supported deployment types
 
```
```diff
@@ -63,7 +63,7 @@ If your preferred region is missing, refer to [Azure OpenAI regions](https://lea
 
 ## Evaluation API (preview)
 
-Evaluation API lets you test model outputs directly through API calls, and programmatically assess model quality and performance. To use Evaluation API, check out the [REST API documentation](https://learn.microsoft.com/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
+Evaluation API lets you test model outputs directly through API calls, and programmatically assess model quality and performance. To use Evaluation API, check out the [REST API documentation](/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
 
 ## Evaluation pipeline
 
```
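For reviewers who want to sanity-check the Evaluation API link target in this hunk, here is a minimal sketch of the list call it documents. This is a hedged illustration only: the resource name, `api-version` value, `api-key` header, and response shape are assumptions, so confirm them against the linked REST reference before use.

```python
# Minimal sketch of listing evaluations over REST (preview).
# ASSUMPTIONS: the /openai/evals path, the api-version value, the api-key
# header, and the response shape are placeholders; verify all of them
# against the REST API documentation linked in the article.
import requests

RESOURCE = "my-resource"            # hypothetical Azure OpenAI resource name
API_VERSION = "2025-04-01-preview"  # assumed preview api-version

resp = requests.get(
    f"https://{RESOURCE}.openai.azure.com/openai/evals",
    params={"api-version": API_VERSION},
    headers={"api-key": "<your-api-key>"},
    timeout=30,
)
resp.raise_for_status()

# Assumes a paginated list response with a "data" array, as in other
# preview authoring endpoints; adjust to the documented schema.
for evaluation in resp.json().get("data", []):
    print(evaluation.get("id"), evaluation.get("name"))
```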
```diff
@@ -127,7 +127,7 @@ The deployments available in your list depend on those you created within your A
 
 Testing criteria is used to assess the effectiveness of each output generated by the target model. These tests compare the input data with the output data to ensure consistency. You have the flexibility to configure different criteria to test and measure the quality and relevance of the output at different levels.
 
-:::image type="content" source="../media/how-to/evaluations/eval-testing-criteria.png" alt-text="Screenshot that shows the evaluations testing criteria options." lightbox="../media/how-to/evaluations/eval-testing-criteria.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-testing-criteria.png" alt-text="Screenshot that shows the different testing criteria selections." lightbox="../media/how-to/evaluations/eval-testing-criteria.png":::
 
 When you click into each testing criteria, you will see different types of graders as well as preset schemas that you can modify per your own evaluation dataset and criteria.
 
```
```diff
@@ -146,11 +146,11 @@ When you click into each testing criteria, you will see different types of grade
 
 4. Select your evaluation data which will be in `.jsonl` format. If you already have an existing data, you can select one, or upload a new data.
 
-:::image type="content" source="../media/how-to/evaluations/upload-data-1.png" alt-text="Screenshot of data upload." lightbox="../media/how-to/evaluations/upload-data-1.png":::
+:::image type="content" source="../media/how-to/evaluations/upload-data-1.png" alt-text="Screenshot of data upload options." lightbox="../media/how-to/evaluations/upload-data-1.png":::
 
 When you upload new data, you'll see the first three lines of the file as a preview on the right side:
 
-:::image type="content" source="../media/how-to/evaluations/upload-data-2.png" alt-text="Screenshot of data upload." lightbox="../media/how-to/evaluations/upload-data-2.png":::
+:::image type="content" source="../media/how-to/evaluations/upload-data-2.png" alt-text="Screenshot of data upload with example selection." lightbox="../media/how-to/evaluations/upload-data-2.png":::
 
 If you need a sample test file, you can use this sample `.jsonl` text. This sample contains sentences of various technical content, and we are going to be assessing semantic similarity across these sentences.
 
```
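The sample `.jsonl` file mentioned in this hunk isn't part of the diff. As a hedged sketch of the expected shape (the field names `input` and `output` are assumptions; the article's own sample file is authoritative), each line is one standalone JSON object whose fields the graders later address through the `item` namespace, for example `{{item.output}}`:

```python
# Hedged illustration of the evaluation data shape. The field names
# ("input", "output") are assumptions; the article's sample file is
# authoritative. Each line of the .jsonl file is one JSON object, and
# graders reference its fields as {{item.input}} / {{item.output}}.
import json

rows = [
    {"input": "What does CPU stand for?",
     "output": "CPU stands for central processing unit."},
    {"input": "Define RAM.",
     "output": "RAM is random access memory, a computer's short-term storage."},
]

with open("sample.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```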
```diff
@@ -169,19 +169,19 @@ When you click into each testing criteria, you will see different types of grade
 
 5. If you would like to create new responses using inputs from your test data, you can select 'Generate new responses'. This will inject the input fields from our evaluation file into individual prompts for a model of your choice to generate output.
 
-:::image type="content" source="../media/how-to/evaluations/eval-generate-1.png" alt-text="Screenshot of the UX for generating model responses." lightbox="../media/how-to/evaluations/eval-generate-1.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-generate-1.png" alt-text="Screenshot of the UX showing selected import test data." lightbox="../media/how-to/evaluations/eval-generate-1.png":::
 
 You will select the model of your choice. If you do not have a model, you can create a new model deployment. The selected model will take the input data and generate its own unique outputs, which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that output later as part of our testing criteria. Alternatively, you could provide your own custom system message and individual message examples manually.
 
 :::image type="content" source="../media/how-to/evaluations/eval-generate-2.png" alt-text="Screenshot of the UX for generating model responses." lightbox="../media/how-to/evaluations/eval-generate-2.png":::
 
 6. For creating a test criteria, select **Add**. For the example file we provided, we are going to be assessing semantic similarity. Select **Model Scorer**, which contains test criteria presets for Semantic Similarity.
 
-:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-1.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-1.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-1.png" alt-text="Screenshot of the semantic similarity UX config highlighting Model scorer." lightbox="../media/how-to/evaluations/eval-semantic-similarity-1.png":::
 
 Select **Semantic Similarity** at the top. Scroll to the bottom, and in `User` section, specify `{{item.output}}` as `Ground truth`, and specify `{{sample.output_text}}` as `Output`. This will take the original reference output from your evaluation `.jsonl` file (the sample file provided) and compare it against the output that is generated by the model you chose in the previous step.
 
-:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-2.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-2.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-2.png" alt-text="Screenshot of the semantic similarity UX config with generated output." lightbox="../media/how-to/evaluations/eval-semantic-similarity-2.png":::
 
 :::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-3.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-3.png":::
 
```
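To make the placeholder wiring in this hunk concrete, here is an illustrative sketch of how `{{item.output}}` (ground truth from a dataset row) and `{{sample.output_text}}` (the freshly generated response) resolve before grading. The `render` helper is hypothetical and only models the substitution; it is not the service's actual implementation.

```python
# Illustrative only: how the two template namespaces conceptually resolve.
# {{item.*}}   -> fields of the current .jsonl row (ground-truth side)
# {{sample.*}} -> fields produced by the generation step (model-output side)
import re

def render(template: str, context: dict) -> str:
    """Replace {{dotted.path}} placeholders with values from context."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

context = {
    "item": {"output": "CPU stands for central processing unit."},    # ground truth
    "sample": {"output_text": "CPU means central processing unit."},  # generated
}

print(render("Ground truth: {{item.output}}", context))
print(render("Output: {{sample.output_text}}", context))
```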
```diff
@@ -190,7 +190,7 @@ You will select the model of your choice. If you do not have a model, you can cr
 8. You are ready to create your Evaluation. Provide your Evaluation name, review everything looks correct, and **Submit** to create the Evaluation job. You'll be taken to a status page for your evaluation job, which will show the status as "Waiting".
 
 :::image type="content" source="../media/how-to/evaluations/eval-submit-job.png" alt-text="Screenshot of the evaluation job submit UX." lightbox="../media/how-to/evaluations/eval-submit-job.png":::
-:::image type="content" source="../media/how-to/evaluations/eval-submit-job-2.png" alt-text="Screenshot of the evaluation job submit UX." lightbox="../media/how-to/evaluations/eval-submit-job-2.png":::
+:::image type="content" source="../media/how-to/evaluations/eval-submit-job-2.png" alt-text="Screenshot of the evaluation job submit UX, with a status of waiting." lightbox="../media/how-to/evaluations/eval-submit-job-2.png":::
 
 9. Once your evaluation job has created, you can select the job to view the full details of the job:
 
```
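Step 8 in this hunk notes that a submitted job starts in a "Waiting" state, so a common follow-up is to poll the run until it finishes. A rough sketch follows, under the same assumed endpoint shape as the earlier list call; the run path, `api-version`, and status values are guesses, so check the linked REST reference before relying on them.

```python
# Rough polling sketch. The /openai/evals/{eval_id}/runs/{run_id} path,
# the api-version, and the status values are assumptions; verify them
# against the REST API documentation linked in the article.
import time
import requests

RESOURCE = "my-resource"            # hypothetical resource name
API_VERSION = "2025-04-01-preview"  # assumed preview api-version
EVAL_ID, RUN_ID = "<eval-id>", "<run-id>"

url = f"https://{RESOURCE}.openai.azure.com/openai/evals/{EVAL_ID}/runs/{RUN_ID}"
while True:
    run = requests.get(
        url,
        params={"api-version": API_VERSION},
        headers={"api-key": "<your-api-key>"},
        timeout=30,
    ).json()
    status = run.get("status")
    print("status:", status)
    if status not in ("queued", "in_progress"):  # assumed non-terminal states
        break
    time.sleep(15)
```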