articles/ai-services/openai/how-to/evaluations.md
Lines changed: 7 additions & 9 deletions
@@ -50,7 +50,7 @@ Azure OpenAI evaluation enables developers to create evaluation runs to test aga
- West US 2
- West US 3
- If your preferred region is missing, please refer to [Azure OpenAI regions](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
+ If your preferred region is missing, refer to [Azure OpenAI regions](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
### Supported deployment types
@@ -63,7 +63,7 @@ If your preferred region is missing, please refer to [Azure OpenAI regions](http
## Evaluation API (preview)
- Evaluation API lets users test and improve model outputs directly through API calls, making the experience simple and customizable for developers to programmatically assess model quality and performance in their development workflows. To use Evaluation API, check out the [REST API documentation](https://learn.microsoft.com/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
+ Evaluation API lets you test model outputs directly through API calls, and programmatically assess model quality and performance. To use Evaluation API, check out the [REST API documentation](https://learn.microsoft.com/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
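
As a rough sketch of what such a call can look like, the snippet below lists evaluation runs over REST. The `/openai/evals` path, the `api-version` value, and the environment variable names are assumptions rather than details taken from the linked reference; verify them against the REST API documentation above.

```python
# A rough sketch, not taken from the linked reference: list evaluation runs
# over REST. The /openai/evals path, the api-version value, and the response
# shape (a "data" list) are assumptions to verify against the REST API docs.
import os
import requests

resource = os.environ["AZURE_OPENAI_RESOURCE"]  # hypothetical env var, e.g. "my-resource"
api_key = os.environ["AZURE_OPENAI_API_KEY"]    # hypothetical env var

response = requests.get(
    f"https://{resource}.openai.azure.com/openai/evals",
    headers={"api-key": api_key},                  # standard Azure OpenAI key header
    params={"api-version": "2025-02-01-preview"},  # assumed preview version
    timeout=30,
)
response.raise_for_status()

# Print an identifier and name for each evaluation run returned.
for evaluation in response.json().get("data", []):
    print(evaluation.get("id"), evaluation.get("name"))
```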
## Evaluation pipeline
@@ -119,11 +119,9 @@ Outputs generated during the evaluation will be referenced in subsequent steps u
### Model deployment
- As part of creating evaluations you'll pick which models to use when generating responses (optional) as well as which models to use when grading models with specific testing criteria.
+ In Azure OpenAI, you need to create a model deployment to use for your evaluation. You can pick and deploy a single model, or multiple models, depending on your needs. These model deployments will be used when grading your base model or your fine-tuned model with the test criteria of your choice. You can also use the deployed models to auto-generate responses for your provided prompt.
- In Azure OpenAI you'll be assigning specific model deployments to use as part of your evaluations. You can compare multiple model deployments in single evaluation run.
-
- You can evaluate either base or fine-tuned model deployments. The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.
+ The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.
### Testing criteria
@@ -177,11 +175,11 @@ You will select the model of your choice. If you do not have a model, you can cr
:::image type="content" source="../media/how-to/evaluations/eval-generate-2.png" alt-text="Screenshot of the UX for generating model responses" lightbox="../media/how-to/evaluations/eval-generate-2.png":::
- 6. For creating a test criteria, select **Add**. For the example file we provided above, we are going to be assessing semantic similarity. Select **Model Scorer**, which contains test criteria presets for Semantic Similarity.
+ 6. For creating a test criteria, select **Add**. For the example file we provided, we are going to be assessing semantic similarity. Select **Model Scorer**, which contains test criteria presets for Semantic Similarity.
:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-1.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-1.png":::
- Select **Semantic Similarity** at the top. Scroll to the bottom, and in `User` section, specify `{{item.output}}` as `Ground truth`, and specify `{{sample.output_text}}` as `Output`. This will take the original reference output from your evaluation `.jsonl` file (the sample file above) and compare it against the output that is generated by the model you chose in the previous step.
+ Select **Semantic Similarity** at the top. Scroll to the bottom, and in `User` section, specify `{{item.output}}` as `Ground truth`, and specify `{{sample.output_text}}` as `Output`. This will take the original reference output from your evaluation `.jsonl` file (the sample file provided) and compare it against the output that is generated by the model you chose in the previous step.
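
To make that mapping concrete, a single line in such an evaluation `.jsonl` file might look like the sketch below. The question and answer values are invented for illustration; only the `item.output` / `sample.output_text` roles described above are taken from the text.

```json
{"item": {"input": "What is the capital of France?", "output": "Paris"}}
```

With this line, `{{item.output}}` resolves to `Paris` as the ground truth, while `{{sample.output_text}}` is filled in at run time with the response generated by the model deployment under evaluation.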
:::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-2.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-2.png":::
@@ -206,7 +204,7 @@ You will select the model of your choice. If you do not have a model, you can cr
## Types of Testing Criteria
- Azure OpenAI Evaluation offers various evaluation testing criteria on top of Semantic Similarity we saw in the example above. This section provides information about each testing criteria at much more detail.
+ Azure OpenAI Evaluation offers various evaluation testing criteria on top of Semantic Similarity we saw in the provided example. This section provides information about each testing criteria at much more detail.