Commit 919c874

Merge pull request #3888 from changliu2/leaderboard-march-release
Leaderboard march release
2 parents fa42a8b + 0bb88c1 commit 919c874

9 files changed: +53 -23 lines changed

articles/ai-foundry/concepts/model-benchmarks.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
  ---
  title: Explore model leaderboards in Azure AI Foundry portal
  titleSuffix: Azure AI Foundry
- description: This article introduces benchmarking capabilities and the model benchmarks experience in Azure AI Foundry portal.
+ description: This article introduces benchmarking capabilities and model leaderboards (preview) in Azure AI Foundry portal.
  manager: scottpolly
  ms.service: azure-ai-foundry
  ms.custom:
@@ -155,5 +155,5 @@ Prompt construction follows best practices for each dataset, as specified by the

  ## Related content

- - [How to benchmark models in Azure AI Foundry portal](../how-to/benchmark-model-in-catalog.md)
+ - [Compare and select models using the model leaderboard in Azure AI Foundry portal](../how-to/benchmark-model-in-catalog.md)
  - [Model catalog and collections in Azure AI Foundry portal](../how-to/model-catalog-overview.md)

articles/ai-foundry/how-to/benchmark-model-in-catalog.md

Lines changed: 50 additions & 20 deletions
@@ -1,53 +1,83 @@
  ---
- title: How to use model benchmarking in Azure AI Foundry portal
+ title: Benchmark models in the model leaderboard of Azure AI Foundry portal
  titleSuffix: Azure AI Foundry
- description: In this article, you learn to compare benchmarks across models and datasets, using the model benchmarks tool in Azure AI Foundry portal.
+ description: In this article, you learn to compare benchmarks across models and datasets, using the model leaderboards (preview) and the benchmarks feature in Azure AI Foundry portal.
  manager: scottpolly
  ms.service: azure-ai-foundry
  ms.custom:
  - ai-learning-hub
- - ignite-2024
  ms.topic: how-to
- ms.date: 11/06/2024
- ms.reviewer: jcioffi
+ ms.date: 04/07/2025
+ ms.reviewer: changliu2
+ reviewer: changliu2
  ms.author: mopeakande
  author: msakande
  ---

- # How to benchmark models in Azure AI Foundry portal
+ # Compare and select models using the model leaderboard in Azure AI Foundry portal (preview)

  [!INCLUDE [feature-preview](../includes/feature-preview.md)]

- In this article, you learn to compare benchmarks across models and datasets, using the model benchmarks tool in Azure AI Foundry portal. You also learn to analyze benchmarking results and to perform benchmarking with your data. Benchmarking can help you make informed decisions about which models meet the requirements for your particular use case or application.
+ In this article, you learn to streamline your model selection process in the Azure AI Foundry [model catalog](../how-to/model-catalog-overview.md) by comparing models in the model leaderboards (preview) available in Azure AI Foundry portal. This comparison can help you make informed decisions about which models meet the requirements for your particular use case or application. You can compare models by viewing the following leaderboards:
+
+ - [Quality, cost, and performance leaderboards](#access-model-leaderboards) to quickly identify the model leaders along a single metric (quality, cost, or throughput);
+ - [Trade-off charts](#compare-models-in-the-trade-off-charts) to see how models perform on one metric versus another, such as quality versus cost;
+ - [Leaderboards by scenario](#view-leaderboards-by-scenario) to find the best leaderboards that suit your scenario.
+

  ## Prerequisites

  - An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.

  - An [Azure AI Foundry project](create-projects.md).

- ## Access model benchmarks through the model catalog
-
- Azure AI supports model benchmarking for select models that are popular and most frequently used. Follow these steps to use detailed benchmarking results to compare and select models directly from the Azure AI Foundry model catalog:
+ ## Access model leaderboards

  [!INCLUDE [open-catalog](../includes/open-catalog.md)]

- 4. Select the model you're interested in. For example, select **gpt-4o**. This action opens the model's overview page.
+ 4. Go to the **Model leaderboards** section of the model catalog. This section displays the top three model leaders ranked along [quality](../concepts/model-benchmarks.md#quality), [cost](../concepts/model-benchmarks.md#cost), and [performance](../concepts/model-benchmarks.md#performance). You can select any of these models to check out more details.
+
+ :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-entry-select-model.png" alt-text="Screenshot showing the selected model from entry point of leaderboards on the model catalog homepage." lightbox="../media/how-to/model-benchmarks/leaderboard-entry-select-model.png":::
+
+ 1. From the **Model leaderboards** section of the model catalog, select **Browse leaderboards** to go to the [model leaderboards landing page](https://aka.ms/model-leaderboards) to see the full suite of leaderboards that are available.
+
+ :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-entry.png" alt-text="Screenshot showing the entry point from model catalog into model leaderboards." lightbox="../media/how-to/model-benchmarks/leaderboard-entry.png":::
+
+ The homepage displays leaderboard highlights for model selection criteria. Quality is the most common criterion for model selection, followed by cost and performance.
+
+ :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-highlights.png" alt-text="Screenshot showing the highlighted leaderboards in quality, cost, and performance." lightbox="../media/how-to/model-benchmarks/leaderboard-highlights.png":::
+
+
+ ### Compare models in the trade-off charts

- > [!TIP]
- > From the model catalog, you can show the models that have benchmarking available by using the **Collections** filter and selecting **Benchmark results**. These models have a _benchmarks_ icon that looks like a histogram.
+ Trade-off charts allow you to compare models based on the criteria that you care more about. Suppose you care more about cost than quality, and you discover that the highest-quality model isn't the cheapest model. In that case, you might need to make trade-offs among quality, cost, and performance criteria. In the trade-off charts, you can compare how models perform along two metrics at a glance.
+
+ 1. Select the **Models selected** dropdown menu to add or remove models from the trade-off chart.
+ 1. Select the **Quality vs. Throughput** tab and the **Throughput vs Cost** tab to view those charts for your selected models.
+ 1. Select **Compare between metrics** to access more detailed results for each model.
+
+ :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-trade-off.png" alt-text="Screenshot showing the trade-off charts in quality, cost, and performance." lightbox="../media/how-to/model-benchmarks/leaderboard-trade-off.png":::
+
+ ### View leaderboards by scenario
+
+ Suppose you have a scenario that requires certain model capabilities. For example, say you're building a question-answering chatbot that requires good question-answering and reasoning capabilities. You might find it useful to compare models in these leaderboards that are backed by capability-specific benchmarks.
+
+ :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-by-scenario.png" alt-text="Screenshot showing the quality leaderboards by scenarios." lightbox="../media/how-to/model-benchmarks/leaderboard-by-scenario.png":::
+
+
+ Once you've explored the leaderboards, you can decide on a model to use.
+
+ ## View benchmarks from the model card
+
+ 1. Select a model to your liking and select **Model details**. You can select the model from one of the displayed leaderboards, such as the quality leaderboard at the top of the model leaderboards homepage. For this example, select **gpt-4o**. This action opens the model's overview page.

  1. Go to the **Benchmarks** tab to check the benchmark results for the model.

  :::image type="content" source="../media/how-to/model-benchmarks/gpt4o-benchmark-tab.png" alt-text="Screenshot showing the benchmarks tab for gpt-4o." lightbox="../media/how-to/model-benchmarks/gpt4o-benchmark-tab.png":::

- 1. Return to the homepage of the model catalog.
- 1. Select **Compare models** on the model catalog's homepage to explore models with benchmark support, view their metrics, and analyze the trade-offs among different models. This analysis can inform your selection of the model that best fits your requirements.
-
- :::image type="content" source="../media/how-to/model-benchmarks/compare-models-model-catalog.png" alt-text="Screenshot showing the model comparison button on the model catalog main page." lightbox="../media/how-to/model-benchmarks/compare-models-model-catalog.png":::
+ 1. Select **Compare with more models**.

- 1. Select your desired tasks and specify the dimensions of interest, such as _AI Quality_ versus _Cost_, to evaluate the trade-offs among different models.
- 1. You can switch to the **List view** to access more detailed results for each model.
+ 1. Switch to the **List view** to access more detailed results for each model.

  :::image type="content" source="../media/how-to/model-benchmarks/compare-view.png" alt-text="Screenshot showing an example of benchmark comparison view." lightbox="../media/how-to/model-benchmarks/compare-view.png":::

@@ -85,6 +115,6 @@ The previous sections showed the benchmark results calculated by Microsoft, usin

  ## Related content

- - [Model benchmarks in Azure AI Foundry portal](../concepts/model-benchmarks.md)
+ - [Model leaderboards in Azure AI Foundry portal](../concepts/model-benchmarks.md)
  - [How to evaluate generative AI apps with Azure AI Foundry](evaluate-generative-ai-app.md)
  - [How to view evaluation results in Azure AI Foundry portal](evaluate-results.md)
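
The "Compare models in the trade-off charts" section added in this file describes weighing one metric against another, such as quality versus cost. As a rough illustration of that idea only, the sketch below filters a list of models down to those that no other model beats on both metrics at once (a Pareto front). The `ModelScore` type, the `pareto_front` helper, the model names, and all scores are hypothetical placeholders, not portal data or any Azure AI Foundry API.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    """Hypothetical record for one model; not an Azure AI Foundry type."""
    name: str
    quality: float  # higher is better
    cost: float     # lower is better


def pareto_front(models: List[ModelScore]) -> List[ModelScore]:
    """Return the models that no other model dominates on (quality, cost)."""
    front = []
    for m in models:
        dominated = any(
            # "other" is at least as good on both metrics...
            other.quality >= m.quality and other.cost <= m.cost
            # ...and strictly better on at least one of them.
            and (other.quality > m.quality or other.cost < m.cost)
            for other in models
        )
        if not dominated:
            front.append(m)
    return front


# Made-up placeholder scores, NOT real benchmark results.
models = [
    ModelScore("model-a", quality=0.90, cost=10.0),
    ModelScore("model-b", quality=0.85, cost=2.0),
    ModelScore("model-c", quality=0.80, cost=5.0),
    ModelScore("model-d", quality=0.70, cost=1.0),
]

print([m.name for m in pareto_front(models)])
# prints ['model-a', 'model-b', 'model-d']
```

Here `model-c` drops out because `model-b` is both higher quality and cheaper. The remaining models each represent a different quality/cost trade-off, which is the kind of choice the trade-off charts are meant to surface.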
Binary files (6), not rendered: 85.5 KB, 227 KB, 208 KB, 68.8 KB, 145 KB, 131 KB

articles/ai-foundry/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ items:
  items:
  - name: Model leaderboards
  href: concepts/model-benchmarks.md
- - name: How to use model benchmarking
+ - name: Compare models in leaderboards
  href: how-to/benchmark-model-in-catalog.md
  - name: Model deployment in Azure AI Foundry
  items:
