
Commit 7fcf400

Merge pull request #2630 from mrbullwinkle/mrb_01_28_2025_eval_text
[Azure OpenAI] eval updates
2 parents 2fb87d6 + c352e1b

File tree

5 files changed: +6 -6 lines changed

articles/ai-services/openai/how-to/evaluations.md

Lines changed: 6 additions & 6 deletions
@@ -77,13 +77,13 @@ When you upload and select your evaluation file a preview of the first three line

You can choose any existing previously uploaded datasets, or upload a new dataset.

-### Generate responses (optional)
+### Create responses (optional)

The prompt you use in your evaluation should match the prompt you plan to use in production. These prompts provide the instructions for the model to follow. Similar to the playground experiences, you can create multiple inputs to include few-shot examples in your prompt. See [prompt engineering techniques](../concepts/advanced-prompt-engineering.md) for details on advanced techniques in prompt design and prompt engineering.

You can reference your input data within the prompts by using the `{{input.column_name}}` format, where column_name corresponds to the names of the columns in your input file.

-Outputs generated during the evaluation will be referenced in subsequent steps using the `{{sample.output_text}}` format.
+Outputs generated during the evaluation will be referenced in subsequent steps using the `{{sample.output_text}}` format.

> [!NOTE]
> You need to use double curly braces to make sure you reference your data correctly.
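To make the double-curly-brace templating concrete, here is a minimal sketch of what an evaluation data file and the placeholder strings might look like. The file name and the column names (`input`, `ground_truth`) are hypothetical, and the placeholders are the literal text you would type into the evaluation prompt, not Python string formatting:

```python
import json

# Hypothetical evaluation dataset: one JSON object per line (JSONL).
# The column names "input" and "ground_truth" are made up for this sketch.
rows = [
    {"input": "What is the capital of France?", "ground_truth": "Paris"},
    {"input": "How many sides does a hexagon have?", "ground_truth": "6"},
]

with open("eval_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Placeholder strings as they would appear in an evaluation prompt:
# {{item.input}} pulls the "input" column of each row, and the model's
# generated answer is later available as {{sample.output_text}}.
user_prompt_template = "Answer the question: {{item.input}}"
grader_prompt_template = (
    "Expected: {{item.ground_truth}}\nModel answer: {{sample.output_text}}"
)
print(user_prompt_template, grader_prompt_template, sep="\n")
```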
@@ -92,9 +92,9 @@ Outputs generated during the evaluation will be referenced in subsequent steps u

As part of creating evaluations you'll pick which models to use when generating responses (optional) as well as which models to use when grading models with specific testing criteria.

-In Azure OpenAI you'll be assigning specific model deployments to use as part of your evaluations. You can compare multiple deployments by creating a separate evaluation configuration for each model. This enables you to define specific prompts for each evaluation, providing better control over the variations required by different models.
+In Azure OpenAI you'll be assigning specific model deployments to use as part of your evaluations. You can compare multiple model deployments in a single evaluation run.

-You can evaluate either a base or a fine-tuned model deployment. The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.
+You can evaluate either base or fine-tuned model deployments. The deployments available in your list depend on those you created within your Azure OpenAI resource. If you can't find the desired deployment, you can create a new one from the Azure OpenAI Evaluation page.

### Testing criteria

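The generator/grader split described above can be sketched outside the built-in evaluation runner, just to show what "a deployment that generates responses" versus "a deployment that grades with a testing criterion" means. This is only an illustration under stated assumptions, not the evaluation feature itself: the endpoint, API version, and deployment names are placeholders, and it uses the standard `openai` Python package's Azure client.

```python
import os
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # example API version; use one your resource supports
)

GENERATOR_DEPLOYMENT = "gpt-4o-mini"  # placeholder: deployment that produces the responses
GRADER_DEPLOYMENT = "gpt-4o"          # placeholder: deployment that scores them

question = "What is the capital of France?"

# 1) Generate a response with the deployment under evaluation.
generated = client.chat.completions.create(
    model=GENERATOR_DEPLOYMENT,
    messages=[{"role": "user", "content": question}],
)
answer = generated.choices[0].message.content

# 2) Grade the response with a second deployment, mimicking a model-graded testing criterion.
grade = client.chat.completions.create(
    model=GRADER_DEPLOYMENT,
    messages=[
        {"role": "system", "content": "Reply with PASS or FAIL only."},
        {"role": "user", "content": f"Question: {question}\nAnswer: {answer}\nIs the answer correct?"},
    ],
)
print(answer)
print(grade.choices[0].message.content)
```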
@@ -109,7 +109,7 @@ Testing criteria is used to assess the effectiveness of each output generated by

:::image type="content" source="../media/how-to/evaluations/new-evaluation.png" alt-text="Screenshot of the Azure OpenAI evaluation UX with new evaluation selected." lightbox="../media/how-to/evaluations/new-evaluation.png":::

-3. Enter a name of your evaluation. By default a random name is automatically generated unless you edit and replace it. > select **Upload new dataset**.
+3. Enter a name for your evaluation. By default a random name is automatically generated unless you edit and replace it. Select **Upload new dataset**.

:::image type="content" source="../media/how-to/evaluations/upload.png" alt-text="Screenshot of the Azure OpenAI upload UX." lightbox="../media/how-to/evaluations/upload.png":::

@@ -132,7 +132,7 @@ Testing criteria is used to assess the effectiveness of each output generated by

:::image type="content" source="../media/how-to/evaluations/preview.png" alt-text="Screenshot that shows a preview of an uploaded evaluation file." lightbox="../media/how-to/evaluations/preview.png":::

-5. Select the toggle for **Generate responses**. Select `{{item.input}}` from the dropdown. This will inject the input fields from our evaluation file into individual prompts for a new model run that we want to able to compare against our evaluation dataset. The model will take that input and generate its own unique outputs which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that sample output text later as part of our testing criteria. Alternatively you could provide your own custom system message and individual message examples manually.
+5. Under **Responses**, select the **Create** button. Select `{{item.input}}` from the **Create with a template** dropdown. This will inject the input fields from our evaluation file into individual prompts for a new model run that we want to be able to compare against our evaluation dataset. The model will take that input and generate its own unique outputs, which in this case will be stored in a variable called `{{sample.output_text}}`. We'll then use that sample output text later as part of our testing criteria. Alternatively, you could provide your own custom system message and individual message examples manually.

6. Select which model you want to use to generate responses for your evaluation. If you don't have a model you can create one. For the purpose of this example we're using a standard deployment of `gpt-4o-mini`.

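As a rough picture of what a simple string-based testing criterion does with the generated `{{sample.output_text}}`, here is a small stand-in comparison in Python. It is not the built-in grader, and the `ground_truth` column name is the same hypothetical one used in the earlier sketch:

```python
def string_check(sample_output_text: str, ground_truth: str) -> bool:
    """Rough stand-in for a 'contains'-style string check: pass if the
    expected answer appears in the generated output, ignoring case."""
    return ground_truth.strip().lower() in sample_output_text.strip().lower()

# The generated output plays the role of {{sample.output_text}};
# the dataset column plays the role of {{item.ground_truth}} (hypothetical name).
print(string_check("The capital of France is Paris.", "Paris"))  # True
```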
4 image files changed (37.6 KB, 232 KB, 107 KB, and 29.1 KB); previews not shown.
