
Commit a4e600f

Acrolinx suggestions
1 parent a34a818 commit a4e600f

File tree

1 file changed (+7, -9 lines)


articles/ai-services/openai/how-to/evaluations.md

Lines changed: 7 additions & 9 deletions
@@ -50,7 +50,7 @@ Azure OpenAI evaluation enables developers to create evaluation runs to test aga
 - West US 2
 - West US 3
 
-If your preferred region is missing, please refer to [Azure OpenAI regions](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
+If your preferred region is missing, refer to [Azure OpenAI regions](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#global-standard-model-availability) and check if it is one of the Azure OpenAI regional availability zones.
 
 ### Supported deployment types
 

@@ -63,7 +63,7 @@ If your preferred region is missing, please refer to [Azure OpenAI regions](http
 
 ## Evaluation API (preview)
 
-Evaluation API lets users test and improve model outputs directly through API calls, making the experience simple and customizable for developers to programmatically assess model quality and performance in their development workflows. To use Evaluation API, check out the [REST API documentation](https://learn.microsoft.com/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
+Evaluation API lets you test model outputs directly through API calls, and programmatically assess model quality and performance. To use Evaluation API, check out the [REST API documentation](https://learn.microsoft.com/azure/ai-services/openai/authoring-reference-preview#evaluation---get-list).
 
 ## Evaluation pipeline
 
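To make the programmatic flow the new paragraph describes concrete, here is a minimal sketch of listing evaluation runs over REST. The `/openai/evals` path, the `api-version` value, and the response shape are assumptions for illustration; confirm the exact values in the linked REST reference.

```python
# Minimal sketch: list evaluation runs via the Evaluation API (preview).
# The /openai/evals path, api-version value, and response shape below are
# assumptions for illustration -- confirm them in the REST API reference.
import os
import requests

resource = os.environ["AZURE_OPENAI_RESOURCE"]  # for example, "my-aoai-resource"
api_key = os.environ["AZURE_OPENAI_API_KEY"]

response = requests.get(
    f"https://{resource}.openai.azure.com/openai/evals",
    params={"api-version": "2025-01-01-preview"},  # assumed preview version
    headers={"api-key": api_key},
)
response.raise_for_status()

# Print the ID and status of each evaluation run returned by the list call.
for run in response.json().get("data", []):
    print(run.get("id"), run.get("status"))
```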

@@ -119,11 +119,9 @@ Outputs generated during the evaluation will be referenced in subsequent steps u
 
 ### Model deployment
 
-As part of creating evaluations you'll pick which models to use when generating responses (optional) as well as which models to use when grading models with specific testing criteria.
+In Azure OpenAI, you need to create a model deployment to use for your evaluation. You can pick and deploy a single model, or multiple models, depending on your needs. These model deployments will be used when grading your base model or your fine-tuned model with the test criteria of your choice. You can also use the deployed models to auto-generate responses for your provided prompt.
 
-In Azure OpenAI you'll be assigning specific model deployments to use as part of your evaluations. You can compare multiple model deployments in single evaluation run.
-
-You can evaluate either base or fine-tuned model deployments. The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.
+The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.
 
 ### Testing criteria
 
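As a rough illustration of how deployments feed into an evaluation, the sketch below pairs one deployment that generates responses with another that grades them. Every field name here is hypothetical, not the documented schema; it only captures the roles the rewritten paragraph describes.

```python
# Hypothetical illustration only: these field names are not the documented
# evaluation schema. The point is that one deployment generates responses
# while a second deployment grades them against your test criteria.
evaluation_run = {
    "name": "base-vs-finetuned",             # hypothetical run name
    "generation_deployment": "gpt-4o-mini",  # deployment that produces responses
    "grading_deployment": "gpt-4o",          # deployment that scores the outputs
    "data_file": "eval-data.jsonl",          # the uploaded evaluation data
}
```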

@@ -177,11 +175,11 @@ You will select the model of your choice. If you do not have a model, you can cr
 
 :::image type="content" source="../media/how-to/evaluations/eval-generate-2.png" alt-text="Screenshot of the UX for generating model responses" lightbox="../media/how-to/evaluations/eval-generate-2.png":::
 
-6. For creating a test criteria, select **Add**. For the example file we provided above, we are going to be assessing semantic similarity. Select **Model Scorer**, which contains test criteria presets for Semantic Similarity.
+6. For creating a test criteria, select **Add**. For the example file we provided, we are going to be assessing semantic similarity. Select **Model Scorer**, which contains test criteria presets for Semantic Similarity.
 
 :::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-1.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-1.png":::
 
-Select **Semantic Similarity** at the top. Scroll to the bottom, and in `User` section, specify `{{item.output}}` as `Ground truth`, and specify `{{sample.output_text}}` as `Output`. This will take the original reference output from your evaluation `.jsonl` file (the sample file above) and compare it against the output that is generated by the model you chose in the previous step.
+Select **Semantic Similarity** at the top. Scroll to the bottom, and in `User` section, specify `{{item.output}}` as `Ground truth`, and specify `{{sample.output_text}}` as `Output`. This will take the original reference output from your evaluation `.jsonl` file (the sample file provided) and compare it against the output that is generated by the model you chose in the previous step.
 
 :::image type="content" source="../media/how-to/evaluations/eval-semantic-similarity-2.png" alt-text="Screenshot of the semantic similarity UX config." lightbox="../media/how-to/evaluations/eval-semantic-similarity-2.png":::
 
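To make the placeholder mapping in this hunk concrete, assume each row of the sample `.jsonl` file carries an `input` and a reference `output` (an assumption about the sample file's shape, since the file itself isn't shown here):

```jsonl
{"input": "What is the capital of France?", "output": "Paris"}
```

With a row like this, `{{item.output}}` resolves to `Paris` as the ground truth, while `{{sample.output_text}}` is filled with the text the chosen model generated for `input`; the Model Scorer then rates how semantically similar the two are.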

@@ -206,7 +204,7 @@ You will select the model of your choice. If you do not have a model, you can cr
 
 ## Types of Testing Criteria
 
-Azure OpenAI Evaluation offers various evaluation testing criteria on top of Semantic Similarity we saw in the example above. This section provides information about each testing criteria at much more detail.
+Azure OpenAI Evaluation offers various evaluation testing criteria on top of Semantic Similarity we saw in the provided example. This section provides information about each testing criteria at much more detail.
 
 ### Factuality
 
