
Commit 7fcf400

Merge pull request #2630 from mrbullwinkle/mrb_01_28_2025_eval_text
[Azure OpenAI] eval updates
2 parents 2fb87d6 + c352e1b

File tree

5 files changed: +6 -6 lines changed

articles/ai-services/openai/how-to/evaluations.md

Lines changed: 6 additions & 6 deletions
@@ -77,13 +77,13 @@ When you upload and select your evaluation file a preview of the first three line

You can choose any existing previously uploaded datasets, or upload a new dataset.

-### Generate responses (optional)
+### Create responses (optional)

The prompt you use in your evaluation should match the prompt you plan to use in production. These prompts provide the instructions for the model to follow. Similar to the playground experiences, you can create multiple inputs to include few-shot examples in your prompt. See [prompt engineering techniques](../concepts/advanced-prompt-engineering.md) for details on advanced techniques in prompt design and prompt engineering.

You can reference your input data within the prompts by using the `{{input.column_name}}` format, where column_name corresponds to the names of the columns in your input file.

-Outputs generated during the evaluation will be referenced in subsequent steps using the `{{sample.output_text}}` format.
+Outputs generated during the evaluation will be referenced in subsequent steps using the `{{sample.output_text}}` format.

> [!NOTE]
> You need to use double curly braces to make sure you reference your data correctly.
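To make the double-curly-brace templating concrete, here is a minimal sketch of what an evaluation data file and the placeholder strings might look like. The file name and the column names (`input`, `ground_truth`) are hypothetical, and the placeholders are the literal text you would type into the evaluation prompt, not Python string formatting:

```python
import json

# Hypothetical evaluation dataset: one JSON object per line (JSONL).
# The column names "input" and "ground_truth" are made up for this sketch.
rows = [
    {"input": "What is the capital of France?", "ground_truth": "Paris"},
    {"input": "How many sides does a hexagon have?", "ground_truth": "6"},
]

with open("eval_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Placeholder strings as they would appear in an evaluation prompt:
# {{item.input}} pulls the "input" column of each row, and the model's
# generated answer is later available as {{sample.output_text}}.
user_prompt_template = "Answer the question: {{item.input}}"
grader_prompt_template = (
    "Expected: {{item.ground_truth}}\nModel answer: {{sample.output_text}}"
)
print(user_prompt_template, grader_prompt_template, sep="\n")
```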
@@ -92,9 +92,9 @@ Outputs generated during the evaluation will be referenced in subsequent steps u

As part of creating evaluations you'll pick which models to use when generating responses (optional) as well as which models to use when grading models with specific testing criteria.

-In Azure OpenAI you'll be assigning specific model deployments to use as part of your evaluations. You can compare multiple deployments by creating a separate evaluation configuration for each model. This enables you to define specific prompts for each evaluation, providing better control over the variations required by different models.
+In Azure OpenAI you'll be assigning specific model deployments to use as part of your evaluations. You can compare multiple model deployments in a single evaluation run.

-You can evaluate either a base or a fine-tuned model deployment. The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.
+You can evaluate either base or fine-tuned model deployments. The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.

### Testing criteria

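The generator/grader split described above can be sketched outside the built-in evaluation runner, just to show what "a deployment that generates responses" versus "a deployment that grades with a testing criterion" means. This is only an illustration under stated assumptions, not the evaluation feature itself: the endpoint, API version, and deployment names are placeholders, and it uses the standard `openai` Python package's Azure client.

```python
import os
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # example API version; use one your resource supports
)

GENERATOR_DEPLOYMENT = "gpt-4o-mini"  # placeholder: deployment that produces the responses
GRADER_DEPLOYMENT = "gpt-4o"          # placeholder: deployment that scores them

question = "What is the capital of France?"

# 1) Generate a response with the deployment under evaluation.
generated = client.chat.completions.create(
    model=GENERATOR_DEPLOYMENT,
    messages=[{"role": "user", "content": question}],
)
answer = generated.choices[0].message.content

# 2) Grade the response with a second deployment, mimicking a model-graded testing criterion.
grade = client.chat.completions.create(
    model=GRADER_DEPLOYMENT,
    messages=[
        {"role": "system", "content": "Reply with PASS or FAIL only."},
        {"role": "user", "content": f"Question: {question}\nAnswer: {answer}\nIs the answer correct?"},
    ],
)
print(answer)
print(grade.choices[0].message.content)
```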
@@ -109,7 +109,7 @@ Testing criteria is used to assess the effectiveness of each output generated by

:::image type="content" source="../media/how-to/evaluations/new-evaluation.png" alt-text="Screenshot of the Azure OpenAI evaluation UX with new evaluation selected." lightbox="../media/how-to/evaluations/new-evaluation.png":::

-3. Enter a name of your evaluation. By default a random name is automatically generated unless you edit and replace it. > select **Upload new dataset**.
+3. Enter a name for your evaluation. By default a random name is automatically generated unless you edit and replace it. Select **Upload new dataset**.

:::image type="content" source="../media/how-to/evaluations/upload.png" alt-text="Screenshot of the Azure OpenAI upload UX." lightbox="../media/how-to/evaluations/upload.png":::

@@ -132,7 +132,7 @@ Testing criteria is used to assess the effectiveness of each output generated by

:::image type="content" source="../media/how-to/evaluations/preview.png" alt-text="Screenshot that shows a preview of an uploaded evaluation file." lightbox="../media/how-to/evaluations/preview.png":::

-5. Select the toggle for **Generate responses**. Select `{{item.input}}` from the dropdown. This will inject the input fields from our evaluation file into individual prompts for a new model run that we want to able to compare against our evaluation dataset. The model will take that input and generate its own unique outputs which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that sample output text later as part of our testing criteria. Alternatively you could provide your own custom system message and individual message examples manually.
+5. Under **Responses**, select the **Create** button. Select `{{item.input}}` from the **Create with a template** dropdown. This will inject the input fields from our evaluation file into individual prompts for a new model run that we want to be able to compare against our evaluation dataset. The model will take that input and generate its own unique outputs, which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that sample output text later as part of our testing criteria. Alternatively, you could provide your own custom system message and individual message examples manually.

6. Select which model you want to use to generate responses for your evaluation. If you don't have a model you can create one. For the purpose of this example we're using a standard deployment of `gpt-4o-mini`.

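As a rough picture of what a simple string-based testing criterion does with the generated `{{sample.output_text}}`, here is a small stand-in comparison in Python. It is not the built-in grader, and the `ground_truth` column name is the same hypothetical one used in the earlier sketch:

```python
def string_check(sample_output_text: str, ground_truth: str) -> bool:
    """Rough stand-in for a 'contains'-style string check: pass if the
    expected answer appears in the generated output, ignoring case."""
    return ground_truth.strip().lower() in sample_output_text.strip().lower()

# The generated output plays the role of {{sample.output_text}};
# the dataset column plays the role of {{item.ground_truth}} (hypothetical name).
print(string_check("The capital of France is Paris.", "Paris"))  # True
```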
4 image files changed (37.6 KB, 232 KB, 107 KB, and 29.1 KB); previews not shown.
