articles/ai-foundry/how-to/benchmark-model-in-catalog.md (1 addition, 1 deletion)
@@ -107,7 +107,7 @@ To access benchmark results for a specific metric and dataset:
 The previous sections showed the benchmark results calculated by Microsoft, using public datasets. However, you can try to regenerate the same set of metrics with your data.

 1. Return to the **Benchmarks** tab in the model card.
-1. Select **Try with your own data** to [evaluate the model with your data](evaluate-generative-ai-app.md#fine-tuned-model-evaluation). Evaluation on your data helps you see how the model performs in your particular scenarios.
+1. Select **Try with your own data** to [evaluate the model with your data](evaluate-generative-ai-app.md#model-evaluation). Evaluation on your data helps you see how the model performs in your particular scenarios.

 :::image type="content" source="../media/how-to/model-benchmarks/try-with-your-own-data.png" alt-text="Screenshot showing the button to select for evaluating with your own data." lightbox="../media/how-to/model-benchmarks/try-with-your-own-data.png":::
articles/ai-foundry/how-to/evaluate-generative-ai-app.md

@@ -32,8 +32,6 @@ An evaluation run allows you to generate metric outputs for each data row in you

 From the collapsible left menu, select **Evaluation** > **Create a new evaluation**.

-:::image type="content" source="../media/evaluations/evaluate/create-new-evaluation.png" alt-text="Screenshot of the button to create a new evaluation." lightbox="../media/evaluations/evaluate/create-new-evaluation.png":::
-
 ### From the model catalog page

 1. From the collapsible left menu, select **Model catalog**.
@@ -47,11 +45,9 @@ From the collapsible left menu, select **Evaluation** > **Create a new evaluatio

 When you start an evaluation from the **Evaluate** page, you first need to choose the evaluation target. By specifying the appropriate evaluation target, we can tailor the evaluation to the specific nature of your application, ensuring accurate and relevant metrics. We support two types of evaluation targets:

-- **Fine-tuned model**: This choice evaluates the output generated by your selected model and user-defined prompt.
+- **Model**: This choice evaluates the output generated by your selected model and user-defined prompt.
 - **Dataset**: Your model-generated outputs are already in a test dataset.

-:::image type="content" source="../media/evaluations/evaluate/select-evaluation-target.png" alt-text="Screenshot of the evaluation target selection." lightbox="../media/evaluations/evaluate/select-evaluation-target.png":::
-
 #### Configure test data

 When you enter the evaluation creation wizard, you can select from preexisting datasets or upload a new dataset to evaluate. The test dataset needs to have the model-generated outputs to be used for evaluation. A preview of your test data is shown on the right pane.
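For context on this requirement, here is a minimal sketch of a test dataset that already contains model-generated outputs, written as JSON Lines. The column names (`query`, `context`, `response`, `ground_truth`) are illustrative assumptions, not names required by the article; map them to your chosen metrics in the data mapping step.

```python
import json

# Illustrative rows for a "Dataset" evaluation target: each record already
# contains the model-generated output alongside the input and a reference answer.
# The field names are assumptions for this sketch; map them to your metrics
# in the data mapping step of the wizard.
rows = [
    {
        "query": "What is the capital of France?",
        "context": "France is a country in Western Europe. Its capital is Paris.",
        "response": "The capital of France is Paris.",
        "ground_truth": "Paris",
    },
    {
        "query": "Who wrote 'Pride and Prejudice'?",
        "context": "Pride and Prejudice is an 1813 novel by Jane Austen.",
        "response": "Jane Austen wrote 'Pride and Prejudice'.",
        "ground_truth": "Jane Austen",
    },
]

# Write the records as JSON Lines (.jsonl), one JSON object per line,
# so the file can be uploaded as a test dataset.
with open("test_data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```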
@@ -72,8 +68,6 @@ We support three types of metrics curated by Microsoft to facilitate a comprehen
 - **AI quality (NLP)**: These natural language processing (NLP) metrics are mathematical-based, and they also evaluate the overall quality of the generated content. They often require ground truth data, but they don't require a model deployment as judge.
 - **Risk and safety metrics**: These metrics focus on identifying potential content risks and ensuring the safety of the generated content.

-:::image type="content" source="../media/evaluations/evaluate/testing-criteria.png" alt-text="Screenshot that shows how to add testing criteria." lightbox="../media/evaluations/evaluate/testing-criteria.png":::
-
 As you add your testing criteria, different metrics are going to be used as part of the evaluation. You can refer to the table for the complete list of metrics we offer support for in each scenario. For more in-depth information on metric definitions and how they're calculated, see [What are evaluators?](../concepts/observability.md#what-are-evaluators).

 | AI quality (AI assisted) | AI quality (NLP) | Risk and safety metrics |
@@ -143,15 +137,11 @@ For guidance on the specific data mapping requirements for each metric, refer to

 After you complete all the necessary configurations, you can provide an optional name for your evaluation. Then you can review and select **Submit** to submit the evaluation run.

-:::image type="content" source="../media/evaluations/evaluate/review-and-finish.png" alt-text="Screenshot that shows the review page to create a new evaluation." lightbox="../media/evaluations/evaluate/review-and-finish.png":::
-
-### Fine-tuned model evaluation
+### Model evaluation

 To create a new evaluation for your selected model deployment, you can use a GPT model to generate sample questions, or you can select from your established dataset collection.

-:::image type="content" source="../media/evaluations/evaluate/select-data-source.png" alt-text="Screenshot that shows how to select a data source in Create a new evaluation." lightbox="../media/evaluations/evaluate/select-data-source.png":::
-
-#### Configure test data for a fine-tuned model
+#### Configure test data for a model

 Set up the test dataset that's used for evaluation. This dataset is sent to the model to generate responses for assessment. You have two options for configuring your test data:
@@ -176,8 +166,6 @@ To configure your test criteria, select **Next**. As you select your criteria, m

 After you select the test criteria you want, you can review the evaluation, optionally change the name of the evaluation, and then select **Submit**. Go to the evaluation page to see the results.

-:::image type="content" source="../media/evaluations/evaluate/review-model-evaluation.png" alt-text="Screenshot that shows the Review evaluation option." lightbox="../media/evaluations/evaluate/review-model-evaluation.png":::
-
 > [!NOTE]
 > The generated dataset is saved to the project’s blob storage after the evaluation run is created.
@@ -189,8 +177,6 @@ The evaluator library also enables version management. You can compare different

 To use the evaluator library in Azure AI Foundry portal, go to your project's **Evaluation** page and select the **Evaluator library** tab.

-:::image type="content" source="../media/evaluations/evaluate/evaluator-library-list.png" alt-text="Screenshot that shows the page where you select evaluators from the evaluator library." lightbox="../media/evaluations/evaluate/evaluator-library-list.png":::
-
 You can select the evaluator name to see more details. You can see the name, description, and parameters, and check any files associated with the evaluator. Here are some examples of Microsoft-curated evaluators:

 - For performance and quality evaluators curated by Microsoft, you can view the annotation prompt on the details page. You can adapt these prompts to your own use case. Change the parameters or criteria according to your data and objectives in the Azure AI Evaluation SDK. For example, you can select **Groundedness-Evaluator** and check the Prompty file that shows how we calculate the metric.
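To illustrate the adaptation path this paragraph describes, here is a minimal sketch that calls the Microsoft-curated groundedness evaluator from the `azure-ai-evaluation` package. The endpoint, key, and deployment names are placeholders, and the exact keys in the returned dictionary can vary by SDK version.

```python
from azure.ai.evaluation import GroundednessEvaluator

# Placeholder Azure OpenAI connection details for the judge model; replace
# these with your own resource values (or load them from environment variables).
model_config = {
    "azure_endpoint": "https://<your-aoai-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-gpt-deployment>",
}

# The AI-assisted groundedness evaluator uses the deployed model as a judge.
groundedness = GroundednessEvaluator(model_config)

result = groundedness(
    query="What can I do in the Azure AI Foundry portal?",
    context="The Azure AI Foundry portal lets you evaluate, deploy, and monitor generative AI models.",
    response="You can evaluate, deploy, and monitor generative AI models.",
)

# The result is a dictionary that includes a groundedness score and reasoning.
print(result)
```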
articles/ai-foundry/how-to/evaluate-results.md (2 additions, 3 deletions)
@@ -8,7 +8,7 @@ ms.custom:
 - build-2024
 - ignite-2024
 ms.topic: how-to
-ms.date: 09/15/2025
+ms.date: 09/22/2025
 ms.reviewer: mithigpe
 ms.author: lagayhar
 author: lgayhardt
@@ -33,8 +33,6 @@ In this article, you learn how to:

 After you submit an evaluation, locate the run on the **Evaluation** page. Filter or adjust columns to focus on runs of interest. Review high‑level metrics at a glance before drilling in.

-:::image type="content" source="../media/evaluations/view-results/evaluation-run-list.png" alt-text="Screenshot that shows the evaluation run list." lightbox="../media/evaluations/view-results/evaluation-run-list.png":::
-
 > [!TIP]
 > You can view an evaluation run with any version of the `promptflow-evals` SDK or `azure-ai-evaluation` versions 1.0.0b1, 1.0.0b2, 1.0.0b3. Enable the **Show all runs** toggle to locate the run.
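As background for this tip, evaluation runs submitted from code appear in this list when they're logged to the project. The sketch below uses the `azure-ai-evaluation` package under stated assumptions: the project dictionary values and the `data.jsonl` column names are placeholders, and newer SDK versions may accept a project endpoint URL instead of the dictionary shown here.

```python
from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Judge-model configuration for the AI-assisted evaluator (placeholder values).
model_config = {
    "azure_endpoint": "https://<your-aoai-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-gpt-deployment>",
}

# Project reference (placeholder values) so the run is logged to the project
# and shows up in the evaluation run list in the portal.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

result = evaluate(
    data="data.jsonl",  # JSON Lines file with query/context/response columns (assumed names)
    evaluators={"relevance": RelevanceEvaluator(model_config)},
    azure_ai_project=azure_ai_project,
    output_path="./evaluation_results.json",  # local copy of the per-row results
)

print(result["studio_url"])  # link to the run in the portal (if logging succeeded)
```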
@@ -64,6 +62,7 @@ In the **Metric dashboard** section, aggregate views are broken down by metrics
 Use the table under the dashboard to inspect each data sample. Sort by a metric to surface worst‑performing samples and identify systematic gaps (incorrect results, safety failures, latency). Use search to cluster related failure topics. Apply column customization to focus on key metrics.

 Typical actions:
+
 - Filter for low scores to detect recurring patterns.
 - Adjust prompts or fine-tune when systemic gaps appear.
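To complement the in-portal table review described above, the per-row results saved by an SDK run (for example, the `evaluation_results.json` written by `output_path` in the earlier sketch) can be filtered offline. This is a hedged sketch that assumes the file has a top-level `rows` list with `inputs.*` and `outputs.<evaluator>.<metric>` columns; adjust the names to match your actual run.

```python
import json

import pandas as pd

# Load the per-row results saved by an SDK evaluation run (assumed layout:
# a top-level "rows" list where each row holds "inputs.*" and "outputs.*" keys).
with open("evaluation_results.json", encoding="utf-8") as f:
    results = json.load(f)

df = pd.DataFrame(results["rows"])

# Column name is an assumption for this sketch; adjust it to the evaluator
# and metric you actually ran (for example "outputs.relevance.relevance").
score_column = "outputs.relevance.relevance"

# Surface the worst-performing samples to look for recurring failure patterns.
low_scores = df[df[score_column] <= 2].sort_values(score_column)
print(low_scores[["inputs.query", score_column]].head(10))
```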