---
title: How to use model benchmarking in Azure AI Studio
titleSuffix: Azure AI Studio
description: In this article, you learn to compare benchmarks across models and datasets, using the model benchmarks tool in Azure AI Studio.
manager: scottpolly
ms.service: azure-ai-studio
ms.custom:
  - ai-learning-hub
ms.topic: how-to
ms.date: 11/06/2024
ms.reviewer: jcioffi
ms.author: mopeakande
author: msakande
---

# How to benchmark models in Azure AI Studio

[!INCLUDE [feature-preview](../includes/feature-preview.md)]

In this article, you learn how to compare benchmarks across models and datasets by using the model benchmarks tool in Azure AI Studio. You also learn how to analyze benchmark results and how to run benchmarking with your own data. Benchmarking can help you make informed decisions about which models meet the requirements for your particular use case or application.

## Prerequisites

- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.

- An [Azure AI Studio project](create-projects.md).

## Access model benchmarks through the model catalog

Azure AI Studio supports model benchmarking for a select set of popular and frequently used models. Follow these steps to use detailed benchmarking results to compare and select models directly from the AI Studio model catalog:

[!INCLUDE [open-catalog](../includes/open-catalog.md)]

4. Select the model you're interested in. For example, select **gpt-4o**. This action opens the model's overview page.

    > [!TIP]
    > From the model catalog, you can show the models that have benchmarking available by using the **Collections** filter and selecting **Benchmark results**. These models have a _benchmarks_ icon that looks like a histogram.

1. Go to the **Benchmarks** tab to check the benchmark results for the model.

    :::image type="content" source="../media/how-to/model-benchmarks/gpt4o-benchmark-tab.png" alt-text="Screenshot showing the benchmarks tab for gpt-4o." lightbox="../media/how-to/model-benchmarks/gpt4o-benchmark-tab.png":::

1. Return to the homepage of the model catalog.
1. Select **Compare models** on the model catalog's homepage to explore models with benchmark support, view their metrics, and analyze the trade-offs among different models. This analysis can inform your selection of the model that best fits your requirements.

    :::image type="content" source="../media/how-to/model-benchmarks/compare-models-model-catalog.png" alt-text="Screenshot showing the model comparison button on the model catalog main page." lightbox="../media/how-to/model-benchmarks/compare-models-model-catalog.png":::

1. Select your desired tasks and specify the dimensions of interest, such as _AI Quality_ versus _Cost_, to evaluate the trade-offs among different models. A sketch of this kind of trade-off analysis follows these steps.
1. Switch to the **List view** to access more detailed results for each model.

    :::image type="content" source="../media/how-to/model-benchmarks/compare-view.png" alt-text="Screenshot showing an example of benchmark comparison view." lightbox="../media/how-to/model-benchmarks/compare-view.png":::
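
The trade-off analysis in the comparison view amounts to finding models that no other model beats on every dimension you care about. As a minimal, self-contained sketch of that reasoning (the model names, quality indexes, and prices below are hypothetical, not real benchmark results):

```python
# Hypothetical figures for three models; not real benchmark data.
models = {
    "model-a": {"quality": 0.82, "cost_per_1k_tokens": 0.0050},
    "model-b": {"quality": 0.78, "cost_per_1k_tokens": 0.0012},
    "model-c": {"quality": 0.74, "cost_per_1k_tokens": 0.0020},
}

def pareto_frontier(candidates):
    """Keep models where no other model is both cheaper and higher quality."""
    frontier = []
    for name, m in candidates.items():
        dominated = any(
            other["quality"] >= m["quality"]
            and other["cost_per_1k_tokens"] <= m["cost_per_1k_tokens"]
            and other != m
            for other in candidates.values()
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(models))  # ['model-a', 'model-b']; model-c is dominated
```

Only the models that survive this filter are worth weighing against each other on those two axes; the rest are strictly worse choices.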

## Analyze benchmark results

When you're on the **Benchmarks** tab for a specific model, you can gather extensive information to better understand and interpret the benchmark results, including:

- **High-level aggregate scores**: These scores for AI quality, cost, latency, and throughput provide a quick overview of the model's performance.
- **Comparative charts**: These charts display the model's relative position compared to related models.
- **Metric comparison table**: This table presents detailed results for each metric.

    :::image type="content" source="../media/how-to/model-benchmarks/gpt4o-benchmark-tab-expand.png" alt-text="Screenshot showing the benchmarks tab for gpt-4o." lightbox="../media/how-to/model-benchmarks/gpt4o-benchmark-tab-expand.png":::

By default, AI Studio displays an average index across various metrics and datasets to provide a high-level overview of model performance.
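
As a rough illustration of how an aggregate index of this kind can be formed (this is not the formula AI Studio uses, and the metric names and scores are made up), you can min-max normalize each metric across models, invert metrics where lower is better, and average the results:

```python
# Hypothetical scores; "latency_s" is a lower-is-better metric.
scores = {
    "model-a": {"accuracy": 0.85, "coherence": 4.2, "latency_s": 1.9},
    "model-b": {"accuracy": 0.80, "coherence": 4.6, "latency_s": 0.8},
}
lower_is_better = {"latency_s"}

def average_index(all_scores, lower_is_better):
    metrics = next(iter(all_scores.values())).keys()
    index = {name: 0.0 for name in all_scores}
    for metric in metrics:
        values = [m[metric] for m in all_scores.values()]
        lo, hi = min(values), max(values)
        for name, m in all_scores.items():
            # Min-max normalize to [0, 1]; guard against a zero range.
            norm = 0.5 if hi == lo else (m[metric] - lo) / (hi - lo)
            if metric in lower_is_better:
                norm = 1.0 - norm  # invert so higher is always better
            index[name] += norm / len(metrics)
    return index

print(average_index(scores, lower_is_better))
```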

To access benchmark results for a specific metric and dataset:

1. Select the expand button on the chart. The pop-up comparison chart reveals detailed information and offers greater flexibility for comparison.

    :::image type="content" source="../media/how-to/model-benchmarks/expand-to-detailed-metric.png" alt-text="Screenshot showing the expand button to select for a detailed comparison chart." lightbox="../media/how-to/model-benchmarks/expand-to-detailed-metric.png":::

1. Select the metric of interest and choose different datasets, based on your specific scenario. For more detailed definitions of the metrics and descriptions of the public datasets used to calculate results, select **Read more**. The sketch after these steps shows how you might slice results the same way offline.

    :::image type="content" source="../media/how-to/model-benchmarks/comparison-chart-per-metric-data.png" alt-text="Screenshot showing the comparison chart with a specific metric and dataset." lightbox="../media/how-to/model-benchmarks/comparison-chart-per-metric-data.png":::
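
If you record benchmark numbers yourself, the same per-metric, per-dataset slicing is easy to reproduce offline. Here's a minimal pandas sketch, assuming a hypothetical `results.csv` with `model`, `dataset`, `metric`, and `score` columns (not an actual AI Studio export format):

```python
import pandas as pd

# Hypothetical schema: one row per (model, dataset, metric) with a score.
df = pd.read_csv("results.csv")

# Focus on one metric, then pivot so each dataset becomes a column.
accuracy = df[df["metric"] == "accuracy"]
table = accuracy.pivot_table(index="model", columns="dataset", values="score")
print(table.round(3))
```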

## Evaluate benchmark results with your data

The previous sections showed the benchmark results that Microsoft calculated by using public datasets. However, you can regenerate the same set of metrics with your own data:

1. Return to the **Benchmarks** tab on the model card.
1. Select **Try with your own data** to evaluate the model with your data. Evaluation on your own data helps you see how the model performs in your particular scenarios.

    :::image type="content" source="../media/how-to/model-benchmarks/try-with-your-own-data.png" alt-text="Screenshot showing the button to select for evaluating with your own data." lightbox="../media/how-to/model-benchmarks/try-with-your-own-data.png":::
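
If you prefer to script a comparable evaluation, one option is the `azure-ai-evaluation` Python package. The sketch below is an assumption-laden outline, not the exact job that the **Try with your own data** button launches: the endpoint, key, deployment name, and the `mydata.jsonl` file (with the fields your evaluators expect, such as `query` and `response`) are all placeholders.

```python
from azure.ai.evaluation import RelevanceEvaluator, evaluate

# Placeholder model configuration; supply your own endpoint, key,
# and deployment name.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "gpt-4o",
}

# An AI-assisted quality evaluator; the grading model comes from model_config.
relevance = RelevanceEvaluator(model_config=model_config)

# mydata.jsonl is a placeholder: one JSON object per line with the
# fields the evaluator expects (for example, "query" and "response").
result = evaluate(
    data="mydata.jsonl",
    evaluators={"relevance": relevance},
)
print(result["metrics"])
```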

## Related content

- [Model benchmarks in Azure AI Studio](../concepts/model-benchmarks.md)
- [How to evaluate generative AI apps with Azure AI Studio](evaluate-generative-ai-app.md)
- [How to view evaluation results in Azure AI Studio](evaluate-results.md)