Commit 7026eb4

Merge pull request #274602 from MicrosoftDocs/main
5/7/2024 PM Publish
2 parents 0cf8f48 + 91273ac commit 7026eb4

95 files changed: +1867 -716 lines changed

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
---
title: Explore model benchmarks in Azure AI Studio
titleSuffix: Azure AI Studio
description: This article introduces benchmarking capabilities and the model benchmarks experience in Azure AI Studio.
manager: scottpolly
ms.service: azure-ai-studio
ms.custom:
ms.topic: how-to
ms.date: 5/6/2024
ms.reviewer: jcioffi
ms.author: jcioffi
author: jesscioffi
---

# Model benchmarks

[!INCLUDE [Azure AI Studio preview](../includes/preview-ai-studio.md)]

In Azure AI Studio, you can compare benchmarks across models and datasets available in the industry to assess which one meets your business scenario. You can find Model benchmarks under **Get started** on the left menu in Azure AI Studio.

:::image type="content" source="../media/explore/model-benchmarks-dashboard-view.png" alt-text="Screenshot of dashboard view graph of model accuracy." lightbox="../media/explore/model-benchmarks-dashboard-view.png":::

Model benchmarks help you make informed decisions about the suitability of models and datasets before you initiate any job. The benchmarks are a curated list of the best-performing models for a given task, based on a comprehensive comparison of benchmarking metrics. Currently, Azure AI Studio provides benchmarks based on quality, using the following metrics.

| Metric | Description |
|--------|-------------|
| Accuracy | Accuracy scores are available at the dataset and the model levels. At the dataset level, the score is the average value of an accuracy metric computed over all examples in the dataset. The accuracy metric used is exact match in all cases, except for the *HumanEval* dataset, which uses a `pass@1` metric. Exact match compares model-generated text with the correct answer according to the dataset, reporting one if the generated text matches the answer exactly and zero otherwise. `Pass@1` measures the proportion of model solutions that pass a set of unit tests in a code generation task. At the model level, the accuracy score is the average of the dataset-level accuracies for each model. |
| Coherence | Coherence evaluates how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. |
| Fluency | Fluency evaluates the language proficiency of a generative AI's predicted answer. It assesses how well the generated text adheres to grammatical rules, syntactic structures, and appropriate usage of vocabulary, resulting in linguistically correct and natural-sounding responses. |
| GPTSimilarity | GPTSimilarity quantifies the similarity between a ground truth sentence (or document) and the prediction sentence generated by an AI model. It's calculated by first computing sentence-level embeddings, using the embeddings API, for both the ground truth and the model's prediction. These embeddings are high-dimensional vector representations of the sentences that capture their semantic meaning and context (see the sketch after this table). |
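
To make the GPTSimilarity description concrete, here's a minimal sketch of turning a ground truth sentence and a model prediction into a similarity score from their embeddings. It assumes the Azure OpenAI embeddings API through the `openai` Python package, a hypothetical `text-embedding-ada-002` deployment, and cosine similarity as the final comparison step; the exact scoring used by the benchmark pipeline isn't documented here.

```python
import numpy as np
from openai import AzureOpenAI

# Assumed Azure OpenAI resource and embeddings deployment -- placeholders, not real values.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

def embed(text: str) -> np.ndarray:
    """Return a sentence-level embedding vector for the given text."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

def gpt_similarity(ground_truth: str, prediction: str) -> float:
    """Cosine similarity between the two embeddings (assumed comparison step)."""
    a, b = embed(ground_truth), embed(prediction)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(gpt_similarity("Paris is the capital of France.", "France's capital city is Paris."))
```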

The benchmarks are updated regularly as new metrics and datasets are added to existing models, and as new models are added to the model catalog.

### How the scores are calculated

The benchmark results originate from public datasets that are commonly used for language model evaluation. In most cases, the data is hosted in GitHub repositories maintained by the creators or curators of the data. Azure AI evaluation pipelines download data from their original sources, extract prompts from each example row, generate model responses, and then compute relevant accuracy metrics.

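As a simplified illustration of that last step, the sketch below computes the exact-match accuracy described in the metrics table, first per dataset and then averaged per model. It's an assumed, minimal version; any normalization the real evaluation pipelines apply isn't documented here.

```python
def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the generated text matches the reference answer exactly, else 0."""
    return int(prediction.strip() == reference.strip())

def dataset_accuracy(predictions: list[str], references: list[str]) -> float:
    """Dataset-level accuracy: the average exact-match score over all examples."""
    scores = [exact_match(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)

def model_accuracy(dataset_scores: list[float]) -> float:
    """Model-level accuracy: the average of the dataset-level accuracies."""
    return sum(dataset_scores) / len(dataset_scores)

# Example: two toy datasets scored for one model.
qa_score = dataset_accuracy(["Paris", "blue"], ["Paris", "green"])  # 0.5
math_score = dataset_accuracy(["42", "7"], ["42", "7"])             # 1.0
print(model_accuracy([qa_score, math_score]))                       # 0.75
```
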
Prompt construction follows best practices for each dataset, as set forth by the paper that introduces the dataset and by industry standards. In most cases, each prompt contains several examples of complete questions and answers, or "shots," to prime the model for the task. The evaluation pipelines create shots by sampling questions and answers from a portion of the data that's held out from evaluation.

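As an illustration of this kind of few-shot prompting, the sketch below prepends sampled question-and-answer shots from a held-out split before the question being evaluated. The prompt template and the number of shots are assumptions for illustration; each benchmark dataset defines its own format.

```python
import random

def build_few_shot_prompt(
    eval_question: str,
    heldout_examples: list[tuple[str, str]],
    num_shots: int = 3,
    seed: int = 0,
) -> str:
    """Prepend sampled question/answer shots from held-out data, then the question under evaluation."""
    rng = random.Random(seed)
    shots = rng.sample(heldout_examples, k=min(num_shots, len(heldout_examples)))
    lines = []
    for question, answer in shots:
        lines.append(f"Question: {question}\nAnswer: {answer}\n")
    lines.append(f"Question: {eval_question}\nAnswer:")
    return "\n".join(lines)

heldout = [
    ("What is 2 + 2?", "4"),
    ("What color is the sky on a clear day?", "blue"),
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "7"),
]
print(build_few_shot_prompt("What is 3 + 5?", heldout))
```
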
### View options in the model benchmarks

These benchmarks provide both a dashboard view and a list view of the data for ease of comparison, along with helpful information that explains what the calculated metrics mean.

Dashboard view allows you to compare the scores of multiple models across datasets and tasks. You can view models side by side (horizontally along the x-axis) and compare their scores (vertically along the y-axis) for each metric.

You can filter the dashboard view by task, model collection, model name, dataset, and metric.

You can switch from dashboard view to list view by following these quick steps:

1. Select the models you want to compare.
2. Select **List** on the right side of the page.

:::image type="content" source="../media/explore/model-benchmarks-dashboard-filtered.png" alt-text="Screenshot of dashboard view graph with question answering filter applied and 'List' button identified." lightbox="../media/explore/model-benchmarks-dashboard-filtered.png":::

In the list view, you can find the following information:

- Model name, description, version, and aggregate scores.
- Benchmark datasets (such as AGIEval) and tasks (such as question answering) that were used to evaluate the model.
- Model scores per dataset.

You can also filter the list view by task, model collection, model name, dataset, and metric.

:::image type="content" source="../media/explore/model-benchmarks-list-view.png" alt-text="Screenshot of list view table displaying accuracy metrics in an ordered list." lightbox="../media/explore/model-benchmarks-list-view.png":::

## Next steps

- [Explore Azure AI foundation models in Azure AI Studio](models-foundation-azure-ai.md)
- [View and compare benchmarks in AI Studio](https://ai.azure.com/explore/benchmarks)
Binary image files added: 221 KB, 215 KB, and 259 KB.

articles/ai-studio/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ items:
items:
- name: Model catalog
  href: how-to/model-catalog.md
- name: Model benchmarks
  href: how-to/model-benchmarks.md
- name: Cohere models
  items:
  - name: Deploy Cohere Command models

articles/aks/enable-fips-nodes.md

Lines changed: 17 additions & 0 deletions
@@ -33,6 +33,23 @@ The Federal Information Processing Standard (FIPS) 140-2 is a US government stan
>
> FIPS-enabled node images may have different version numbers, such as kernel version, than images that aren't FIPS-enabled. The update cycle for FIPS-enabled node pools and node images may differ from node pools and images that aren't FIPS-enabled.

## Supported OS versions

You can create FIPS-enabled node pools on all supported OS types, Linux and Windows. However, not all OS versions support FIPS-enabled node pools. After a new OS version is released, there's typically a waiting period before it's FIPS compliant.

The following table lists the supported OS versions:

| OS type | OS SKU | FIPS compliance |
|--|--|--|
| Linux | Ubuntu | Supported |
| Linux | Azure Linux | Supported |
| Windows | Windows Server 2019 | Supported |
| Windows | Windows Server 2022 | Supported |

When you request FIPS-enabled Ubuntu, if the default Ubuntu version doesn't support FIPS, AKS defaults to the most recent FIPS-supported version of Ubuntu. For example, Ubuntu 22.04 is the default for Linux node pools. Because 22.04 doesn't currently support FIPS, AKS defaults to Ubuntu 20.04 for Linux FIPS-enabled node pools.

> [!NOTE]
> Previously, you could use the GetOSOptions API to determine whether a given OS supported FIPS. The GetOSOptions API is now deprecated, and it will no longer be included in new AKS API versions starting with 2024-05-01.

## Create a FIPS-enabled Linux node pool

1. Create a FIPS-enabled Linux node pool using the [`az aks nodepool add`][az-aks-nodepool-add] command with the `--enable-fips-image` parameter.
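
   For example, a minimal form of that command might look like the following; the resource group, cluster, and node pool names are placeholders to replace with your own values.

   ```azurecli
   az aks nodepool add \
       --resource-group myResourceGroup \
       --cluster-name myAKSCluster \
       --name fipsnp \
       --enable-fips-image
   ```
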
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: 'Deploy an application that uses OpenAI on Azure App Service'
description: Get started with OpenAI on Azure App Service
author: jefmarti
ms.author: jefmarti
ms.date: 04/10/2024
ms.topic: article
zone_pivot_groups: app-service-openai
---

# Deploy an application that uses OpenAI on Azure App Service

::: zone pivot="openai-dotnet"
[!INCLUDE [deploy-intelligent-apps-linux-dotnet-pivot.md](includes/deploy-intelligent-apps/deploy-intelligent-apps-linux-dotnet-pivot.md)]
::: zone-end

::: zone pivot="openai-python"
[!INCLUDE [deploy-intelligent-apps-linux-python-pivot.md](includes/deploy-intelligent-apps/deploy-intelligent-apps-linux-python-pivot.md)]
::: zone-end

::: zone pivot="openai-java"
[!INCLUDE [deploy-intelligent-apps-linux-java-pivot.md](includes/deploy-intelligent-apps/deploy-intelligent-apps-linux-java-pivot.md)]
::: zone-end
