Commit 67d833b

committed: initial review of benchmark articles
1 parent b2ad56a, commit 67d833b

File tree

2 files changed: +51 additions, -61 deletions


articles/ai-foundry/concepts/model-benchmarks.md

Lines changed: 8 additions & 42 deletions
@@ -6,9 +6,8 @@ manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom:
 - ai-learning-hub
-- ignite-2024
 ms.topic: concept-article
-ms.date: 11/11/2024
+ms.date: 04/03/2025
 ms.reviewer: changliu2
 ms.author: mopeakande
 author: msakande
@@ -18,50 +17,17 @@ author: msakande
 
 [!INCLUDE [feature-preview](../includes/feature-preview.md)]
 
-In Azure AI Foundry portal, we offer a wide range of models in [model catalog](../how-to/model-catalog-overview.md) for your generative AI applications. To streamline your model selection experience, now you can leverage our model leaderboards backed by industry-standard benchmarks to find the best model for your custom AI solution. Within [model leaderboards](https://aka.ms/model-leaderboards) page, you can compare models available on Foundry using:
-- [Quality, cost, and performance leaderboards](#quality-cost-and-performance-leaderboards) to quickly identify the model leaders in a single criterion;
-- [Trade-off charts](#trade-off-charts) to see how models perform in quality versus cost;
-- [Leaderboards by scenario](#leaderboards-by-scenario) to find the best leaderboards that suites your scenario.
-
-Whenever you find a model to your liking, you can simply select a model and zoom into [detailed benchmarking results](../how-to/benchmark-model-in-catalog.md) of individual models within the model catalog. Once you find a model to your liking, you can go and deploy your model, try it in playgorund, or evaluate it on your own data. Whether you already have models in mind or you're exploring models, model leaderboards in Azure AI Foundry empowers you to make data-driven decisions with a streamlined, intuitive experience for model selection.
-
-## Quality, cost, and performance leaderboards
-
-From model catalog landing page, you will see the top 3 model leaders in [quality](#quality), [cost](#cost), and [performance](#cost) criteria.
-:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-entry.png" alt-text="Screenshot showing the entry point from model catalog into model leaderboards." lightbox="../media/how-to/model-benchmarks/leaderboard-entry.png":::
-
-Wherever you are in your selection journey, you can select a model to you liking to check out more details:
-
-:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-entry-select-model.png" alt-text="Screenshot showing the selected model from entry point of leaderboards on model catalog." lightbox="../media/how-to/model-benchmarks/leaderboard-entry-select-model.png":::
-
-You can select "Browse leaderboards" to see the full suite of leaderboards we offer. [Quality](#quality) is the most common criterion for model selection:
-
-:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-quality.png" alt-text="Screenshot showing the quality leaderboards." lightbox="../media/how-to/model-benchmarks/leaderboard-quality.png":::
-
-Then comes [cost](#cost) and [performance](#cost) leaderboards:
-:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-highlights.png" alt-text="Screenshot showing the highlighted bar charts for quality, cost, and performance leaders." lightbox="../media/how-to/model-benchmarks/leaderboard-highlights.png":::
-
-
-## Quality, cost, and performance trade-off charts
-
-You may find that the most high-quality model may not be the cheapest model, and you need to make trade-offs in among quality, cost, and performance criteria, for example, you may care more about cost than quality. In the trade-off charts, you can see how models perform in these criteria among others. You can also select or deselect models, toggle between charts, and even more metrics in "Compare between metrics"
-
-:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-trade-off.png" alt-text="Screenshot showing the trade-off charts in quality, cost, and performance." lightbox="../media/how-to/model-benchmarks/leaderboard-trade-off.png":::
-
-
-## Quality leaderboards by scenario
-
-You may have a specific scenario that require certain model capabilities. For example, you are building a question-and-answering chatbot that require good question-and-answering and reasoning capabilities. You can find it useful to compare models in these leaderboards backed by capability-specific benchmarks.
-:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-by-scenario.png" alt-text="Screenshot showing the quality leaderboards by scenarios." lightbox="../media/how-to/model-benchmarks/leaderboard-by-scenario.png":::
-
-We support both text language models and embedding models.
-
-- Benchmarks across large language models (LLMs) and small language models (SLMs)
-- Benchmarks across embedding models
-
-## Benchmarking of LLMs and SLMs
+Model leaderboards in Azure AI Foundry portal allow you to streamline the model selection process in the Azure AI Foundry [model catalog](../how-to/model-catalog-overview.md). The model leaderboards, backed by industry-standard benchmarks, can help you find the best model for your custom AI solution. From the model leaderboards section of the model catalog, you can [browse leaderboards](https://aka.ms/model-leaderboards) to compare available models as follows:
+- [Quality, cost, and performance leaderboards](#quality-cost-and-performance-leaderboards) to quickly identify the model leaders along a single metric (quality, cost, or throughput);
+- [Trade-off charts](#trade-off-charts) to see how models perform on one metric versus another, such as quality versus cost;
+- [Leaderboards by scenario](#leaderboards-by-scenario) to find the leaderboards that best suit your scenario.
+
+Whenever you find a model to your liking, you can select it and zoom into the [detailed benchmarking results](../how-to/benchmark-model-in-catalog.md) of the model within the model catalog. If satisfied with the model, you can deploy it, try it in the playground, or evaluate it on your data. The leaderboards support benchmarking across text language models (large language models (LLMs) and small language models (SLMs)) and embedding models.
+
+## Benchmarking of large and small language models
 
 Model benchmarks assess LLMs and SLMs across the following categories: quality, performance, and cost. The benchmarks are updated regularly as new datasets and associated metrics are added to existing models, and as new models are added to the model catalog.

articles/ai-foundry/how-to/benchmark-model-in-catalog.md

Lines changed: 43 additions & 19 deletions
@@ -1,53 +1,77 @@
 ---
-title: How to use model benchmarking in Azure AI Foundry portal
+title: Benchmark models in the model leaderboard of Azure AI Foundry portal
 titleSuffix: Azure AI Foundry
-description: In this article, you learn to compare benchmarks across models and datasets, using the model benchmarks tool in Azure AI Foundry portal.
+description: In this article, you learn to compare benchmarks across models and datasets, using the model leaderboard and the benchmarks feature in Azure AI Foundry portal.
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom:
 - ai-learning-hub
-- ignite-2024
 ms.topic: how-to
-ms.date: 11/06/2024
-ms.reviewer: jcioffi
+ms.date: 04/03/2025
+ms.reviewer: changliu2
 ms.author: mopeakande
 author: msakande
 ---
 
-# How to benchmark models in Azure AI Foundry portal
+# Select models using the model leaderboard in Azure AI Foundry portal
 
 [!INCLUDE [feature-preview](../includes/feature-preview.md)]
 
-In this article, you learn to compare benchmarks across models and datasets, using the model benchmarks tool in Azure AI Foundry portal. You also learn to analyze benchmarking results and to perform benchmarking with your data. Benchmarking can help you make informed decisions about which models meet the requirements for your particular use case or application.
+In this article, you learn to streamline your model selection process in the Azure AI Foundry [model catalog](../how-to/model-catalog-overview.md) by comparing models in the model leaderboards available in Azure AI Foundry portal. This comparison can help you make informed decisions about which models meet the requirements for your particular use case or application. You can compare models by viewing the following leaderboards:
+
+- [Quality, cost, and performance leaderboards](#quality-cost-and-performance-leaderboards) to quickly identify the model leaders along a single metric (quality, cost, or throughput);
+- [Trade-off charts](#trade-off-charts) to see how models perform on one metric versus another, such as quality versus cost;
+- [Leaderboards by scenario](#leaderboards-by-scenario) to find the leaderboards that best suit your scenario.
 
 ## Prerequisites
 
 - An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
 
 - An [Azure AI Foundry project](create-projects.md).
 
-## Access model benchmarks through the model catalog
-
-Azure AI supports model benchmarking for select models that are popular and most frequently used. Follow these steps to use detailed benchmarking results to compare and select models directly from the Azure AI Foundry model catalog:
+## Access model leaderboards
 
 [!INCLUDE [open-catalog](../includes/open-catalog.md)]
 
-4. Select the model you're interested in. For example, select **gpt-4o**. This action opens the model's overview page.
-
-> [!TIP]
-> From the model catalog, you can show the models that have benchmarking available by using the **Collections** filter and selecting **Benchmark results**. These models have a _benchmarks_ icon that looks like a histogram.
+4. Go to the **Model leaderboards** section of the model catalog. This section displays the top three model leaders ranked along [quality](#quality), [cost](#cost), and [performance](#performance). You can select any of these models to check out more details.
+
+    :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-entry-select-model.png" alt-text="Screenshot showing the selected model from entry point of leaderboards on model catalog." lightbox="../media/how-to/model-benchmarks/leaderboard-entry-select-model.png":::
+
+1. From the **Model leaderboards** section of the model catalog, select **Browse leaderboards** to go to the [model leaderboards landing page](https://aka.ms/model-leaderboards) and see the full suite of leaderboards that are available. [Quality](#quality) is the most common criterion for model selection, followed by cost and performance.
+
+    :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-entry.png" alt-text="Screenshot showing the entry point from model catalog into model leaderboards." lightbox="../media/how-to/model-benchmarks/leaderboard-entry.png":::
+
+### Compare models in the trade-off charts
+
+Trade-off charts allow you to compare models based on the criteria that you care most about. For example, if you care more about cost than quality and discover that the highest quality model isn't the cheapest model, you might need to make trade-offs among the quality, cost, and performance criteria. In the trade-off charts, you can compare how models perform along two metrics at a glance.
+
+1. Select the **Models selected** dropdown menu to add or remove models from the trade-off chart.
+1. Select the **Quality vs. Throughput** tab and the **Throughput vs Cost** tab to view those charts for your selected models.
+1. Select **Compare between metrics** to access more detailed results for each model.
+
+    :::image type="content" source="../media/how-to/model-benchmarks/leaderboard-trade-off.png" alt-text="Screenshot showing the trade-off charts in quality, cost, and performance." lightbox="../media/how-to/model-benchmarks/leaderboard-trade-off.png":::
+
+### View leaderboards by scenario
+
+Suppose you have a scenario that requires certain model capabilities. For example, say you're building a question-and-answering chatbot that requires good question-and-answering and reasoning capabilities. You might find it useful to compare models in these leaderboards that are backed by capability-specific benchmarks.
+
+:::image type="content" source="../media/how-to/model-benchmarks/leaderboard-by-scenario.png" alt-text="Screenshot showing the quality leaderboards by scenarios." lightbox="../media/how-to/model-benchmarks/leaderboard-by-scenario.png":::
+
+Once you've explored the leaderboards, you can decide on a model to use.
+
+---
+
+1. Return to the model catalog homepage and select a model to use. For example, select **gpt-4o**. This action opens the model's overview page.
 
 1. Go to the **Benchmarks** tab to check the benchmark results for the model.
 
    :::image type="content" source="../media/how-to/model-benchmarks/gpt4o-benchmark-tab.png" alt-text="Screenshot showing the benchmarks tab for gpt-4o." lightbox="../media/how-to/model-benchmarks/gpt4o-benchmark-tab.png":::
 
-1. Return to the homepage of the model catalog.
-1. Select **Compare models** on the model catalog's homepage to explore models with benchmark support, view their metrics, and analyze the trade-offs among different models. This analysis can inform your selection of the model that best fits your requirements.
-
-:::image type="content" source="../media/how-to/model-benchmarks/compare-models-model-catalog.png" alt-text="Screenshot showing the model comparison button on the model catalog main page." lightbox="../media/how-to/model-benchmarks/compare-models-model-catalog.png":::
+1. Select **Compare with more models**.
 
-1. Select your desired tasks and specify the dimensions of interest, such as _AI Quality_ versus _Cost_, to evaluate the trade-offs among different models.
-1. You can switch to the **List view** to access more detailed results for each model.
+1. Switch to the **List view** to access more detailed results for each model.
 
 :::image type="content" source="../media/how-to/model-benchmarks/compare-view.png" alt-text="Screenshot showing an example of benchmark comparison view." lightbox="../media/how-to/model-benchmarks/compare-view.png":::

0 commit comments
