
Commit 5585e76

Merge pull request #2425 from eric-urban/eur/model-inference-PR-2
new articles to the release branch
2 parents 7daf825 + 19fdeec commit 5585e76

32 files changed: +1263 -618 lines

articles/ai-foundry/model-inference/breadcrumb/toc.yml

Lines changed: 3 additions & 3 deletions
@@ -6,6 +6,6 @@
   tocHref: /azure/ai-services/
   topicHref: /azure/ai-services/index
   items:
-  - name: Azure AI models in Azure AI Services
-    tocHref: /azure/ai-services/
-    topicHref: /azure/ai-services/model-inference/index
+  - name: Azure AI Model Inference
+    tocHref: /azure/ai-foundry/
+    topicHref: /azure/ai-foundry/model-inference/index

articles/ai-foundry/model-inference/concepts/content-filter.md

Lines changed: 309 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
---
title: Default content safety policies for Azure AI Model Inference
titleSuffix: Azure AI Foundry
description: Learn about the default content safety policies that Azure AI Model Inference uses to flag content.
author: PatrickFarley
ms.author: fasantia
ms.service: azure-ai-model-inference
ms.topic: conceptual
ms.date: 1/21/2025
manager: nitinme
---

# Default content safety policies for Azure AI Model Inference

Azure AI model inference includes default safety policies applied to all models, excluding Azure OpenAI Whisper. These configurations provide you with a responsible experience by default.

The default policies aim to mitigate risks such as hate and fairness, sexual content, violence, self-harm, protected material content, and user prompt injection attacks. To learn more about content filtering, read [our documentation describing categories and severity levels](content-filter.md).

This article describes the default configuration.

> [!TIP]
> By default, all model deployments use the default configuration. However, you can configure content filtering per model deployment as explained in [Configuring content filtering](../how-to/configure-content-filters.md).

## Text models

Text models in Azure AI model inference can take in and generate both text and code. These models apply Azure's text content filtering models to detect and prevent harmful content. This system works on both prompts and completions. A sketch of handling a filtered request follows the table below.

| Risk Category | Applies to | Severity Threshold |
|------------------------------------------|--------------------------|--------------------|
| Hate and Fairness | Prompts and Completions | Medium |
| Violence | Prompts and Completions | Medium |
| Sexual | Prompts and Completions | Medium |
| Self-Harm | Prompts and Completions | Medium |
| User prompt injection attack (Jailbreak) | Prompts | N/A |
| Protected Material – Text | Completions | N/A |
| Protected Material – Code | Completions | N/A |
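
When a prompt or completion exceeds one of these thresholds, the request is rejected instead of returning flagged content. The following is a minimal sketch of handling that rejection with the Azure AI Inference SDK for Python; the endpoint, key, and deployment name are placeholders, and the exact error payload can vary by model and API version:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

try:
    response = client.complete(
        model="<deployment-name>",  # placeholder deployment name
        messages=[{"role": "user", "content": "<user input>"}],
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    # Content that exceeds a severity threshold is rejected with HTTP 400;
    # the error body typically carries a content-filter error code.
    if ex.status_code == 400:
        print("Request was rejected, possibly by the content filter:", ex.message)
    else:
        raise
```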

## Vision and chat with vision models

Vision models can take both text and images at the same time as part of the input. Default content filtering capabilities vary per model and provider.

### Azure OpenAI: GPT-4o and GPT-4 Turbo

| Risk Category | Applies to | Severity Threshold |
|----------------------------------------------------------------------|--------------------------|--------------------|
| Hate and Fairness | Prompts and Completions | Medium |
| Violence | Prompts and Completions | Medium |
| Sexual | Prompts and Completions | Medium |
| Self-Harm | Prompts and Completions | Medium |
| Identification of Individuals and Inference of Sensitive Attributes | Prompts | N/A |
| User prompt injection attack (Jailbreak) | Prompts | N/A |

### Azure OpenAI: DALL-E 3 and DALL-E 2

| Risk Category | Applies to | Severity Threshold |
|-------------------------------------------------|--------------------------|--------------------|
| Hate and Fairness | Prompts and Completions | Low |
| Violence | Prompts and Completions | Low |
| Sexual | Prompts and Completions | Low |
| Self-Harm | Prompts and Completions | Low |
| Content Credentials | Completions | N/A |
| Deceptive Generation of Political Candidates | Prompts | N/A |
| Depictions of Public Figures | Prompts | N/A |
| User prompt injection attack (Jailbreak) | Prompts | N/A |
| Protected Material – Art and Studio Characters | Prompts | N/A |
| Profanity | Prompts | N/A |

In addition to the previous safety configurations, Azure OpenAI DALL-E also comes with [prompt transformation](../../../ai-services/openai/concepts/prompt-transformation.md) by default. This transformation occurs on all prompts to enhance the safety of your original prompt, specifically in the risk categories of diversity, deceptive generation of political candidates, depictions of public figures, protected material, and others.

### Meta: Llama-3.2-11B-Vision-Instruct and Llama-3.2-90B-Vision-Instruct

Content filters apply only to text prompts and completions. Images aren't subject to content moderation.

### Microsoft: Phi-3.5-vision-instruct

Content filters apply only to text prompts and completions. Images aren't subject to content moderation.

## Next steps

* [Configure content filters in Azure AI Model Inference](../how-to/configure-content-filters.md)
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
---
title: Understanding deployment types in Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn how to use deployment types in Azure AI model deployments
author: mrbullwinkle
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: fasantia
ms.custom: ignite-2024, github-universe-2024
---

# Deployment types in Azure AI model inference

Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployments: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure's global infrastructure.

All deployments can perform the exact same inference operations; however, the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:

- **Data residency needs**: global versus regional resources
- **Call volume**: standard versus provisioned

Support for deployment types varies by model and model provider. You can see which deployment type (SKU) each model supports in the [Models section](models.md).

## Global versus regional deployment types

For standard and provisioned deployments, you have an option of two types of configurations within your resource: **global** or **regional**. Global standard is the recommended starting point.

Global deployments use Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for each inference request. This means you get the highest initial throughput limits and best model availability while still providing our uptime SLA and low latency. For high-volume workloads above the specified usage tiers on standard and global standard, you might experience increased latency variation. For customers that require lower latency variance at large workload usage, we recommend purchasing provisioned throughput.

Our global deployments are the first location for all new models and features. Customers with large throughput requirements should consider our provisioned deployment offering.

## Standard

Standard deployments provide a pay-per-call billing model on the chosen model, which gives you the fastest way to get started because you only pay for what you consume. Model availability in each region, as well as throughput, might be limited.

Standard deployments are optimized for low-to-medium volume workloads with high burstiness. Customers with high, consistent volume might experience greater latency variability.

Only Azure OpenAI models support this deployment type.

## Global standard

Global standard deployments are available in the same Azure AI services resources as non-global deployment types, but they let you use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.

Customers with high, consistent volume might experience greater latency variability. The threshold is set per model. For applications that require lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.

## Global provisioned

Global provisioned deployments are available in the same Azure AI services resources as non-global deployment types, but they let you use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure's global infrastructure.

Only Azure OpenAI models support this deployment type.

## Next steps

- [Quotas & limits](../quotas-limits.md)
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
---
title: Model inference endpoint in Azure AI services
titleSuffix: Azure AI Foundry
description: Learn about the model inference endpoint in Azure AI services
author: mrbullwinkle
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: fasantia
ms.custom: ignite-2024, github-universe-2024
---

# Model inference endpoint in Azure AI services

Azure AI model inference in Azure AI services allows customers to consume the most powerful models from flagship model providers using a single endpoint and set of credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

This article explains how models are organized inside the service and how to use the inference endpoint to invoke them.

## Deployments

Azure AI model inference makes models available using the concept of a **deployment**. **Deployments** are a way to give a model a name under a certain configuration. Then, you can invoke that model configuration by indicating its name in your requests.

Deployments capture:

> [!div class="checklist"]
> * A model name
> * A model version
> * A provisioning/capacity type<sup>1</sup>
> * A content filtering configuration<sup>1</sup>
> * A rate limiting configuration<sup>1</sup>

<sup>1</sup> Configurations may vary depending on the selected model.

An Azure AI services resource can have as many model deployments as needed, and they don't incur cost unless inference is performed for those models. Deployments are Azure resources, and hence they're subject to Azure policies.

To learn more about how to create deployments, see [Add and configure model deployments](../how-to/create-model-deployments.md).
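
For illustration only, here's a minimal sketch of creating a deployment programmatically with the Azure Cognitive Services management SDK for Python (`azure-mgmt-cognitiveservices`); all identifiers, plus the model name, version, format, and SKU, are placeholder assumptions that vary by model and provider:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# All identifiers below are placeholders.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<ai-services-resource>",
    deployment_name="gpt-4o-deployment",
    deployment=Deployment(
        properties=DeploymentProperties(
            # Placeholder model; format, name, and version depend on the provider.
            model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-05-13"),
        ),
        sku=Sku(name="GlobalStandard", capacity=1),  # SKU and capacity vary by model
    ),
)
print(poller.result().name)
```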

## Azure AI inference endpoint

The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md), which all the models in Azure AI model inference support.

You can see the endpoint URL and credentials in the **Overview** section:

:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::

### Routing

The inference endpoint routes requests to a given deployment by matching the parameter `model` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service, but under different configurations if needed.

:::image type="content" source="../media/endpoint/endpoint-routing.png" alt-text="An illustration showing how routing works for a Meta-llama-3.2-8b-instruct model by indicating such name in the parameter 'model' inside of the payload request." lightbox="../media/endpoint/endpoint-routing.png":::

For example, if you create a deployment named `Mistral-large`, then that deployment can be invoked as:

[!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

> [!TIP]
> Deployment routing isn't case sensitive.
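
As a rough illustration of the routing behavior those examples demonstrate, here's a minimal sketch using the Azure AI Inference SDK for Python; the endpoint URL and key are placeholders:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

# The `model` parameter carries the deployment name; the endpoint uses it
# to route the request to the `Mistral-large` deployment.
response = client.complete(
    model="Mistral-large",
    messages=[{"role": "user", "content": "How many languages are in the world?"}],
)
print(response.choices[0].message.content)
```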

### SDKs

The Azure AI model inference endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**, which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.

## Azure OpenAI inference endpoint

Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.

Azure OpenAI inference endpoints work at the deployment level, and each deployment has its own associated URL. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the [Azure OpenAI API](../../../ai-services/openai/reference.md).

:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::

Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.

> [!IMPORTANT]
> There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive to a single model deployment.

### SDKs

The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and the **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.
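
For illustration, here's a minimal sketch using the OpenAI SDK for Python; the endpoint, key, and deployment name are placeholders, and the `api_version` shown is only an example:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource-name>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-10-21",  # example API version
)

# With the Azure OpenAI API, the deployment name passed as `model` becomes
# part of the request URL: <base>/deployments/<model-deployment-name>/...
response = client.chat.completions.create(
    model="<model-deployment-name>",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```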

## Next steps

- [Models](models.md)
- [Deployment types](deployment-types.md)
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
---
title: Model versions in Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn about model versions in Azure AI model inference.
ms.service: azure-ai-model-inference
ms.topic: conceptual
ms.custom: ignite-2024, github-universe-2024
ms.date: 1/21/2025
manager: nitinme
author: santiagxf
ms.author: fasantia
recommendations: false
---

# Model versions in Azure AI model inference

Azure AI services are committed to providing the best generative AI models for customers. As part of this commitment, Azure AI services regularly releases new model versions to incorporate the latest features and improvements from key model providers in the industry.

## How model versions work

We want to make it easy for customers to stay up to date as models improve. Customers can choose to start with a particular version and stay on it, or to automatically update as new versions are released.

We distinguish between two different versions when working with models:

* The version of the model itself.
* The version of the API used to consume a model deployment.

The version of a model is decided when you deploy it. You can choose an update policy, which can include the following options (a configuration sketch follows the note below):

* Deployments set to a specific version, or for models without an upgrade policy, require a manual upgrade when a new version is released. When the model version is retired, those deployments stop working.

* Deployments set to **Auto-update to default** automatically update to use the new default version.

* Deployments set to **Upgrade when expired** automatically update when their current version is retired.

> [!NOTE]
> Update policies are configured per deployment and **vary** by model and provider.
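
For illustration only, here's a minimal sketch of how these policies might map to the `version_upgrade_option` property when creating a deployment with the Azure Cognitive Services management SDK for Python; the model, version, SKU, and option strings are assumptions that can vary by model, provider, and SDK version:

```python
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# Placeholders throughout; pass this object to deployments.begin_create_or_update.
deployment = Deployment(
    properties=DeploymentProperties(
        model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-05-13"),
        # "OnceNewDefaultVersionAvailable" -> Auto-update to default
        # "OnceCurrentVersionExpired"      -> Upgrade when expired
        # "NoAutoUpgrade"                  -> stay pinned to a specific version
        version_upgrade_option="OnceNewDefaultVersionAvailable",
    ),
    sku=Sku(name="GlobalStandard", capacity=1),
)
```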

The API version indicates the contract that you use to interface with the model in code. When using REST APIs, you indicate the API version using the query parameter `api-version`. Azure SDK versions are usually paired with specific API versions, but you can indicate the API version you want to use. A given model deployment might support multiple API versions. The release of a new model version might not require you to upgrade to a new API version, as is the case when there's an update to the model's weights.
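
For example, a raw REST call against the Azure AI inference endpoint passes `api-version` as a query parameter, as in this minimal sketch; the endpoint, key, deployment name, and API version shown are placeholders:

```python
import requests

# Placeholders: substitute your resource endpoint, key, and deployment name.
url = "https://<resource-name>.services.ai.azure.com/models/chat/completions"
response = requests.post(
    url,
    params={"api-version": "2024-05-01-preview"},  # example API version
    headers={"api-key": "<your-api-key>"},
    json={
        "model": "<deployment-name>",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json())
```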

## Azure OpenAI model updates

Azure works closely with OpenAI to release new model versions. When a new version of a model is released, you can immediately test it in new deployments. Azure publishes when new versions of models are released, and notifies customers at least two weeks before a new version becomes the default version of the model. Azure also maintains the previous major version of the model until its retirement date, so you can switch back to it if desired.

### What you need to know about Azure OpenAI model version upgrades

As a customer of Azure OpenAI models, you might notice some changes in the model behavior and compatibility after a version upgrade. These changes might affect your applications and workflows that rely on the models. Here are some tips to help you prepare for version upgrades and minimize the impact:

* Read [what's new](../../../ai-services/openai/whats-new.md) and [models](../../../ai-services/openai/concepts/models.md) to understand the changes and new features.
* Read the documentation on [model deployments](../../../ai-services/openai/how-to/create-resource.md) and [version upgrades](../../../ai-services/openai/how-to/working-with-models.md) to understand how to work with model versions.
* Test your applications and workflows with the new model version after release.
* Update your code and configuration to use the new features and capabilities of the new model version.
## Non-Microsoft model updates

Azure works closely with model providers to release new model versions. When a new version of a model is released, you can immediately test it in new deployments. Azure also maintains the previous major version of the model until its retirement date, so you can switch back to it if desired.

New model versions might result in a new model ID being published, for example, `Llama-3.3-70B-Instruct`, `Meta-Llama-3.1-70B-Instruct`, and `Meta-Llama-3-70B-Instruct`. In some cases, all the model versions might be available under the same API version. In other cases, you might also need to adjust the API version used to consume the model if the API contract changed from one model version to the next.

## Related content

- [Learn more about working with Azure OpenAI models](../../../ai-services/openai/how-to/working-with-models.md)
