
Commit cef9e69

Merge branch 'main' into release-build-azure-ai-speech
2 parents 57e6604 + 0eb53d8

File tree

119 files changed (+2101, -822 lines)


articles/ai-services/language-service/conversational-language-understanding/concepts/best-practices.md

Lines changed: 35 additions & 0 deletions
@@ -216,3 +216,38 @@ curl --request POST \
      "targetResourceRegion": "<target-region>"
}'
```

## Addressing out-of-domain utterances

If your model has poor AI quality (AIQ) on out-of-domain utterances, you can use the new recipe version `2024-06-01-preview`. Consider the following example with the default recipe: the model has three intents, Sports, QueryWeather, and Alarm. The test utterances are out of domain, yet the model classifies them as in domain with relatively high confidence scores.

| Text | Predicted intent | Confidence score |
|----|----|----|
| "*Who built the Eiffel Tower?*" | `Sports` | 0.90 |
| "*Do I look good to you today?*" | `QueryWeather` | 1.00 |
| "*I hope you have a good evening.*" | `Alarm` | 0.80 |

To address this, use the `2024-06-01-preview` configuration version, which is built specifically to address this issue while maintaining reasonably good quality on in-domain utterances.

```console
curl --location 'https://<your-resource>.cognitiveservices.azure.com/language/authoring/analyze-conversations/projects/<your-project>/:train?api-version=2022-10-01-preview' \
--header 'Ocp-Apim-Subscription-Key: <your subscription key>' \
--header 'Content-Type: application/json' \
--data '{
      "modelLabel": "<modelLabel>",
      "trainingMode": "advanced",
      "trainingConfigVersion": "2024-06-01-preview",
      "evaluationOptions": {
            "kind": "percentage",
            "testingSplitPercentage": 0,
            "trainingSplitPercentage": 100
      }
}'
```

After the request is sent, you can track the progress of the training job in Language Studio as usual.

Caveats:
- When you use this recipe, set the None score threshold for the app (the confidence threshold below which the `topIntent` is marked as `None`) to 0. The new recipe attributes a portion of the in-domain probabilities to out of domain so that the model isn't incorrectly overconfident about in-domain utterances. As a result, users may see slightly reduced confidence scores for in-domain utterances compared to the production recipe.
- This recipe isn't recommended for apps with only two intents, such as `IntentA` and `None`.
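
To illustrate why the None score threshold should be 0 with this recipe, here's a minimal client-side sketch of how such a threshold is typically applied to a prediction. The helper function and the response shape are illustrative assumptions, not part of the Azure CLU SDK:

```python
# Minimal sketch of client-side None-threshold logic. The helper and the
# response shape are illustrative assumptions, not part of the Azure CLU SDK.

def apply_none_threshold(prediction: dict, threshold: float = 0.0) -> str:
    """Return the top intent, remapped to 'None' if its confidence is below threshold."""
    top = max(prediction["intents"], key=lambda i: i["confidenceScore"])
    return "None" if top["confidenceScore"] < threshold else top["category"]

# With the 2024-06-01-preview recipe, some probability mass is reserved for
# out of domain, so an in-domain utterance might score 0.62 instead of 0.80.
# A nonzero threshold such as 0.7 would wrongly remap it to None; 0 won't.
prediction = {"intents": [{"category": "Alarm", "confidenceScore": 0.62},
                          {"category": "Sports", "confidenceScore": 0.21}]}
print(apply_none_threshold(prediction, threshold=0.0))  # Alarm
print(apply_none_threshold(prediction, threshold=0.7))  # None
```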

articles/ai-studio/how-to/model-benchmarks.md

Lines changed: 66 additions & 0 deletions

@@ -0,0 +1,66 @@
---
title: Explore model benchmarks in Azure AI Studio
titleSuffix: Azure AI Studio
description: This article introduces benchmarking capabilities and the model benchmarks experience in Azure AI Studio.
manager: scottpolly
ms.service: azure-ai-studio
ms.custom:
ms.topic: how-to
ms.date: 5/6/2024
ms.reviewer: jcioffi
ms.author: jcioffi
author: jesscioffi
---

# Model benchmarks

[!INCLUDE [Azure AI Studio preview](../includes/preview-ai-studio.md)]

In Azure AI Studio, you can compare benchmarks across models and datasets available in the industry to assess which one meets your business scenario. You can find model benchmarks under **Get started** in the left menu in Azure AI Studio.

:::image type="content" source="../media/explore/model-benchmarks-dashboard-view.png" alt-text="Screenshot of dashboard view graph of model accuracy." lightbox="../media/explore/model-benchmarks-dashboard-view.png":::

Model benchmarks help you make informed decisions about the suitability of models and datasets before you initiate any job. The benchmarks are a curated list of the best-performing models for a given task, based on a comprehensive comparison of benchmarking metrics. Currently, Azure AI Studio provides benchmarks based on quality, through the following metrics:

| Metric | Description |
|--------------|-------|
| Accuracy | Accuracy scores are available at the dataset and the model levels. At the dataset level, the score is the average value of an accuracy metric computed over all examples in the dataset. The accuracy metric used is exact match in all cases, except for the *HumanEval* dataset, which uses a `pass@1` metric. Exact match compares model-generated text with the correct answer according to the dataset, reporting one if the generated text matches the answer exactly and zero otherwise. `pass@1` measures the proportion of model solutions that pass a set of unit tests in a code generation task. At the model level, the accuracy score is the average of the dataset-level accuracies for each model. |
| Coherence | Coherence evaluates how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. |
| Fluency | Fluency evaluates the language proficiency of a generative AI's predicted answer. It assesses how well the generated text adheres to grammatical rules, syntactic structures, and appropriate usage of vocabulary, resulting in linguistically correct and natural-sounding responses. |
| GPTSimilarity | GPTSimilarity quantifies the similarity between a ground truth sentence (or document) and the prediction sentence generated by an AI model. It's calculated by first computing sentence-level embeddings, using the embeddings API, for both the ground truth and the model's prediction. These embeddings are high-dimensional vector representations of the sentences that capture their semantic meaning and context. |
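
As a rough illustration of the scores described above, here's a short Python sketch of exact match, `pass@1`, and the embedding-similarity idea behind GPTSimilarity. It's illustrative only, not the Azure AI evaluation pipeline:

```python
import math

# Illustrative implementations of the metric ideas described above;
# not the Azure AI evaluation pipeline.

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Dataset-level accuracy: one if the generated text matches the
    reference exactly, zero otherwise, averaged over all examples."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

def pass_at_1(unit_test_results: list[bool]) -> float:
    """pass@1: the proportion of generated solutions that pass their
    unit tests, with one sample per problem."""
    return sum(unit_test_results) / len(unit_test_results)

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Core of an embedding-based similarity score such as GPTSimilarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(exact_match_accuracy(["Paris", "blue"], ["Paris", "red"]))  # 0.5
print(pass_at_1([True, False, True, True]))                       # 0.75
print(round(cosine_similarity([0.1, 0.9], [0.2, 0.8]), 3))        # ~0.99
```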

The benchmarks are updated regularly as new metrics and datasets are added to existing models, and as new models are added to the model catalog.

## How the scores are calculated

The benchmark results originate from public datasets that are commonly used for language model evaluation. In most cases, the data is hosted in GitHub repositories maintained by the creators or curators of the data. Azure AI evaluation pipelines download data from their original sources, extract prompts from each example row, generate model responses, and then compute relevant accuracy metrics.

Prompt construction follows best practice for each dataset, as set forth by the paper that introduced the dataset and by industry standards. In most cases, each prompt contains several examples of complete questions and answers, or "shots," to prime the model for the task. The evaluation pipelines create shots by sampling questions and answers from a portion of the data that's held out from evaluation.
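
The following Python sketch illustrates the general few-shot idea. The prompt format and helper are hypothetical; the actual pipelines follow each dataset's published conventions:

```python
import random

# Hypothetical illustration of few-shot prompt construction: sample "shots"
# from a held-out pool of question-answer pairs and prepend them to the
# evaluation question. Real pipelines follow each dataset's own prompt format.

def build_few_shot_prompt(question: str,
                          held_out: list[tuple[str, str]],
                          n_shots: int = 3) -> str:
    shots = random.sample(held_out, n_shots)
    blocks = [f"Q: {q}\nA: {a}" for q, a in shots]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

pool = [("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Paris"),
        ("What is 5 * 3?", "15"),
        ("Which ocean is the largest?", "Pacific")]
print(build_few_shot_prompt("What is 7 - 4?", pool))
```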

## View options in the model benchmarks

These benchmarks encompass both a dashboard view and a list view of the data for ease of comparison, along with helpful information that explains what the calculated metrics mean.

The dashboard view allows you to compare the scores of multiple models across datasets and tasks. You can view models side by side (horizontally along the x-axis) and compare their scores (vertically along the y-axis) for each metric.

You can filter the dashboard view by task, model collection, model name, dataset, and metric.

You can switch from dashboard view to list view by following these quick steps:

1. Select the models you want to compare.
2. Select **List** on the right side of the page.

:::image type="content" source="../media/explore/model-benchmarks-dashboard-filtered.png" alt-text="Screenshot of dashboard view graph with question answering filter applied and 'List' button identified." lightbox="../media/explore/model-benchmarks-dashboard-filtered.png":::

In the list view, you can find the following information:

- Model name, description, version, and aggregate scores.
- Benchmark datasets (such as AGIEval) and tasks (such as question answering) that were used to evaluate the model.
- Model scores per dataset.

You can also filter the list view by task, model collection, model name, dataset, and metric.

:::image type="content" source="../media/explore/model-benchmarks-list-view.png" alt-text="Screenshot of list view table displaying accuracy metrics in an ordered list." lightbox="../media/explore/model-benchmarks-list-view.png":::

## Next steps

- [Explore Azure AI foundation models in Azure AI Studio](models-foundation-azure-ai.md)
- [View and compare benchmarks in AI Studio](https://ai.azure.com/explore/benchmarks)

(3 image files added: 221 KB, 215 KB, and 259 KB; binary content not shown)

articles/ai-studio/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ items:
    items:
    - name: Model catalog
      href: how-to/model-catalog.md
    - name: Model benchmarks
      href: how-to/model-benchmarks.md
    - name: Cohere models
      items:
      - name: Deploy Cohere Command models

articles/aks/enable-fips-nodes.md

Lines changed: 17 additions & 0 deletions
@@ -33,6 +33,23 @@ The Federal Information Processing Standard (FIPS) 140-2 is a US government stan
>
> FIPS-enabled node images may have different version numbers, such as kernel version, than images that aren't FIPS-enabled. The update cycle for FIPS-enabled node pools and node images may differ from node pools and images that aren't FIPS-enabled.

## Supported OS versions

You can create FIPS-enabled node pools on all supported OS types, Linux and Windows. However, not all OS versions support FIPS-enabled node pools. After a new OS version is released, there's typically a waiting period before it's FIPS compliant.

The following table includes the supported OS versions:

|OS type|OS SKU|FIPS compliance|
|--|--|--|
|Linux|Ubuntu|Supported|
|Linux|Azure Linux|Supported|
|Windows|Windows Server 2019|Supported|
|Windows|Windows Server 2022|Supported|

When you request FIPS-enabled Ubuntu, if the default Ubuntu version doesn't support FIPS, AKS defaults to the most recent FIPS-supported version of Ubuntu. For example, Ubuntu 22.04 is the default for Linux node pools. Because 22.04 doesn't currently support FIPS, AKS defaults to Ubuntu 20.04 for FIPS-enabled Linux node pools.

> [!NOTE]
> Previously, you could use the GetOSOptions API to determine whether a given OS supported FIPS. The GetOSOptions API is now deprecated and won't be included in new AKS API versions starting with 2024-05-01.

## Create a FIPS-enabled Linux node pool

1. Create a FIPS-enabled Linux node pool using the [`az aks nodepool add`][az-aks-nodepool-add] command with the `--enable-fips-image` parameter.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: 'Deploy an application that uses OpenAI on Azure App Service'
description: Get started with OpenAI on Azure App Service
author: jefmarti
ms.author: jefmarti
ms.date: 04/10/2024
ms.topic: article
zone_pivot_groups: app-service-openai
---

# Deploy an application that uses OpenAI on Azure App Service

::: zone pivot="openai-dotnet"
[!INCLUDE [deploy-intelligent-apps-linux-dotnet-pivot.md](includes/deploy-intelligent-apps/deploy-intelligent-apps-linux-dotnet-pivot.md)]
::: zone-end

::: zone pivot="openai-python"
[!INCLUDE [deploy-intelligent-apps-linux-python-pivot.md](includes/deploy-intelligent-apps/deploy-intelligent-apps-linux-python-pivot.md)]
::: zone-end

::: zone pivot="openai-java"
[!INCLUDE [deploy-intelligent-apps-linux-java-pivot.md](includes/deploy-intelligent-apps/deploy-intelligent-apps-linux-java-pivot.md)]
::: zone-end
