
Commit b55f9cb

Merge pull request #7218 from lgayhardt/evalprotal0825
Eval portal doc freshness
2 parents (61b3139 + b17eeb6), commit b55f9cb

9 files changed: +8 −23 lines

articles/ai-foundry/how-to/benchmark-model-in-catalog.md

Lines changed: 1 addition & 1 deletion
@@ -107,7 +107,7 @@ To access benchmark results for a specific metric and dataset:
 The previous sections showed the benchmark results calculated by Microsoft, using public datasets. However, you can try to regenerate the same set of metrics with your data.

 1. Return to the **Benchmarks** tab in the model card.
-1. Select **Try with your own data** to [evaluate the model with your data](evaluate-generative-ai-app.md#fine-tuned-model-evaluation). Evaluation on your data helps you see how the model performs in your particular scenarios.
+1. Select **Try with your own data** to [evaluate the model with your data](evaluate-generative-ai-app.md#model-evaluation). Evaluation on your data helps you see how the model performs in your particular scenarios.

 :::image type="content" source="../media/how-to/model-benchmarks/try-with-your-own-data.png" alt-text="Screenshot showing the button to select for evaluating with your own data." lightbox="../media/how-to/model-benchmarks/try-with-your-own-data.png":::

articles/ai-foundry/how-to/evaluate-generative-ai-app.md

Lines changed: 4 additions & 18 deletions
@@ -5,7 +5,7 @@ description: Evaluate your generative AI models and applications by using Azure
 ms.service: azure-ai-foundry
 ms.custom: ignite-2023, references_regions, build-2024, ignite-2024
 ms.topic: how-to
-ms.date: 05/19/2025
+ms.date: 09/22/2025
 ms.reviewer: mithigpe
 ms.author: lagayhar
 author: lgayhardt
@@ -32,8 +32,6 @@ An evaluation run allows you to generate metric outputs for each data row in you

 From the collapsible left menu, select **Evaluation** > **Create a new evaluation**.

-:::image type="content" source="../media/evaluations/evaluate/create-new-evaluation.png" alt-text="Screenshot of the button to create a new evaluation." lightbox="../media/evaluations/evaluate/create-new-evaluation.png":::
-
 ### From the model catalog page

 1. From the collapsible left menu, select **Model catalog**.
@@ -47,11 +45,9 @@ From the collapsible left menu, select **Evaluation** > **Create a new evaluatio

 When you start an evaluation from the **Evaluate** page, you first need to choose the evaluation target. By specifying the appropriate evaluation target, we can tailor the evaluation to the specific nature of your application, ensuring accurate and relevant metrics. We support two types of evaluation targets:

-- **Fine-tuned model**: This choice evaluates the output generated by your selected model and user-defined prompt.
+- **Model**: This choice evaluates the output generated by your selected model and user-defined prompt.
 - **Dataset**: Your model-generated outputs are already in a test dataset.

-:::image type="content" source="../media/evaluations/evaluate/select-evaluation-target.png" alt-text="Screenshot of the evaluation target selection." lightbox="../media/evaluations/evaluate/select-evaluation-target.png":::
-
 #### Configure test data

 When you enter the evaluation creation wizard, you can select from preexisting datasets or upload a new dataset to evaluate. The test dataset needs to have the model-generated outputs to be used for evaluation. A preview of your test data is shown on the right pane.
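
For a concrete sense of the test data this step expects, here is a minimal sketch that writes a small JSONL file whose rows already contain model-generated outputs. The file name and field names (`query`, `response`, `ground_truth`) are illustrative assumptions; map whichever columns you use to each metric's required inputs later in the wizard.

```python
import json

# Illustrative test rows: each record carries the input ("query"), the
# model-generated output ("response"), and optional ground truth.
# These field names are examples only; align them with the column mapping
# you configure for your chosen metrics.
rows = [
    {
        "query": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "ground_truth": "Paris",
    },
    {
        "query": "Summarize the return policy.",
        "response": "Items can be returned within 30 days with a receipt.",
        "ground_truth": "Returns are accepted within 30 days of purchase.",
    },
]

# Write one JSON object per line (JSONL), a common format for evaluation datasets.
with open("test_data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```
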
@@ -72,8 +68,6 @@ We support three types of metrics curated by Microsoft to facilitate a comprehen
 - **AI quality (NLP)**: These natural language processing (NLP) metrics are mathematical-based, and they also evaluate the overall quality of the generated content. They often require ground truth data, but they don't require a model deployment as judge.
 - **Risk and safety metrics**: These metrics focus on identifying potential content risks and ensuring the safety of the generated content.

-:::image type="content" source="../media/evaluations/evaluate/testing-criteria.png" alt-text="Screenshot that shows how to add testing criteria." lightbox="../media/evaluations/evaluate/testing-criteria.png":::
-
 As you add your testing criteria, different metrics are going to be used as part of the evaluation. You can refer to the table for the complete list of metrics we offer support for in each scenario. For more in-depth information on metric definitions and how they're calculated, see [What are evaluators?](../concepts/observability.md#what-are-evaluators).

 | AI quality (AI assisted) | AI quality (NLP) | Risk and safety metrics |
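
To make the distinction between these metric families concrete, here is a small sketch using the `azure-ai-evaluation` Python SDK: an NLP metric needs ground truth but no judge model, while an AI-assisted metric needs a judge deployment. The connection values are placeholders, and evaluator signatures can vary by SDK version.

```python
import os

from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator

# AI quality (NLP): purely mathematical, needs ground truth but no judge model.
f1 = F1ScoreEvaluator()
print(f1(
    response="The capital of France is Paris.",
    ground_truth="Paris is the capital of France.",
))

# AI quality (AI assisted): scored by a judge model, so a deployment config is required.
# These connection values are placeholders for your own Azure OpenAI resource.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",
}
relevance = RelevanceEvaluator(model_config)
print(relevance(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
))
```
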
@@ -143,15 +137,11 @@ For guidance on the specific data mapping requirements for each metric, refer to

 After you complete all the necessary configurations, you can provide an optional name for your evaluation. Then you can review and select **Submit** to submit the evaluation run.

-:::image type="content" source="../media/evaluations/evaluate/review-and-finish.png" alt-text="Screenshot that shows the review page to create a new evaluation." lightbox="../media/evaluations/evaluate/review-and-finish.png":::
-
-### Fine-tuned model evaluation
+### Model evaluation

 To create a new evaluation for your selected model deployment, you can use a GPT model to generate sample questions, or you can select from your established dataset collection.

-:::image type="content" source="../media/evaluations/evaluate/select-data-source.png" alt-text="Screenshot that shows how to select a data source in Create a new evaluation." lightbox="../media/evaluations/evaluate/select-data-source.png":::
-
-#### Configure test data for a fine-tuned model
+#### Configure test data for a model

 Set up the test dataset that's used for evaluation. This dataset is sent to the model to generate responses for assessment. You have two options for configuring your test data:

@@ -176,8 +166,6 @@ To configure your test criteria, select **Next**. As you select your criteria, m

 After you select the test criteria you want, you can review the evaluation, optionally change the name of the evaluation, and then select **Submit**. Go to the evaluation page to see the results.

-:::image type="content" source="../media/evaluations/evaluate/review-model-evaluation.png" alt-text="Screenshot that shows the Review evaluation option." lightbox="../media/evaluations/evaluate/review-model-evaluation.png":::
-
 > [!NOTE]
 > The generated dataset is saved to the project’s blob storage after the evaluation run is created.
@@ -189,8 +177,6 @@ The evaluator library also enables version management. You can compare different

 To use the evaluator library in Azure AI Foundry portal, go to your project's **Evaluation** page and select the **Evaluator library** tab.

-:::image type="content" source="../media/evaluations/evaluate/evaluator-library-list.png" alt-text="Screenshot that shows the page where you select evaluators from the evaluator library." lightbox="../media/evaluations/evaluate/evaluator-library-list.png":::
-
 You can select the evaluator name to see more details. You can see the name, description, and parameters, and check any files associated with the evaluator. Here are some examples of Microsoft-curated evaluators:

 - For performance and quality evaluators curated by Microsoft, you can view the annotation prompt on the details page. You can adapt these prompts to your own use case. Change the parameters or criteria according to your data and objectives in the Azure AI Evaluation SDK. For example, you can select **Groundedness-Evaluator** and check the Prompty file that shows how we calculate the metric.
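
As a companion to the Groundedness-Evaluator example above, here is a minimal sketch of invoking the corresponding evaluator from the `azure-ai-evaluation` SDK. The model configuration values are placeholders, and the exact result keys can vary by SDK version.

```python
import os

from azure.ai.evaluation import GroundednessEvaluator

# Judge model configuration; the values are placeholders for your own
# Azure OpenAI resource and deployment.
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",
}

groundedness = GroundednessEvaluator(model_config)

# Score how well the response is supported by the supplied context.
result = groundedness(
    context="Contoso offers refunds within 30 days of purchase with a receipt.",
    response="You can get a refund within 30 days if you have a receipt.",
)
print(result)  # for example, a groundedness score plus the judge's reasoning
```
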

articles/ai-foundry/how-to/evaluate-results.md

Lines changed: 2 additions & 3 deletions
@@ -8,7 +8,7 @@ ms.custom:
 - build-2024
 - ignite-2024
 ms.topic: how-to
-ms.date: 09/15/2025
+ms.date: 09/22/2025
 ms.reviewer: mithigpe
 ms.author: lagayhar
 author: lgayhardt
@@ -33,8 +33,6 @@ In this article, you learn how to:

 After you submit an evaluation, locate the run on the **Evaluation** page. Filter or adjust columns to focus on runs of interest. Review high‑level metrics at a glance before drilling in.

-:::image type="content" source="../media/evaluations/view-results/evaluation-run-list.png" alt-text="Screenshot that shows the evaluation run list." lightbox="../media/evaluations/view-results/evaluation-run-list.png":::
-
 > [!TIP]
 > You can view an evaluation run with any version of the `promptflow-evals` SDK or `azure-ai-evaluation` versions 1.0.0b1, 1.0.0b2, 1.0.0b3. Enable the **Show all runs** toggle to locate the run.
@@ -64,6 +62,7 @@ In the **Metric dashboard** section, aggregate views are broken down by metrics
 Use the table under the dashboard to inspect each data sample. Sort by a metric to surface worst‑performing samples and identify systematic gaps (incorrect results, safety failures, latency). Use search to cluster related failure topics. Apply column customization to focus on key metrics.

 Typical actions:
+
 - Filter for low scores to detect recurring patterns.
 - Adjust prompts or fine-tune when systemic gaps appear.
 - Export for offline analysis.
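
For the "Export for offline analysis" step, here is a minimal pandas sketch of slicing an exported result file. The file name and column names are assumptions for illustration, not a documented export schema; adjust them to match the file you actually export.

```python
import pandas as pd

# Load an exported evaluation result file (name and columns are illustrative).
df = pd.read_json("evaluation_results.jsonl", lines=True)

# Surface the worst-performing rows for one metric to look for recurring patterns.
low_scores = df[df["coherence"] <= 2].sort_values("coherence")
print(low_scores[["query", "response", "coherence"]].head(10))

# Quick aggregate view across metrics for the whole run.
print(df[["coherence", "relevance"]].describe())
```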

articles/ai-foundry/how-to/evaluations-storage-account.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ titleSuffix: Azure AI Foundry
 description: Learn how to create and configure your storage account for Azure AI Foundry evaluations.
 ms.service: azure-ai-foundry
 ms.topic: how-to
-ms.date: 08/31/2025
+ms.date: 09/22/2025
 ms.reviewer: gregharen
 ms.author: lagayhar
 author: lgayhardt
5 binary image files changed (+28.4 KB, −63.3 KB, −65.4 KB, −579 KB, +109 KB); image previews not shown.
