
Commit bfc7606

Merge branch 'main' of github.com:MicrosoftDocs/azure-ai-docs-pr into 364910-foundry-policy
2 parents: 4c92a80 + f3ad5d1

70 files changed: +756 / -278 lines changed


articles/ai-foundry/model-inference/concepts/deployment-types.md

Lines changed: 18 additions & 25 deletions
```diff
@@ -13,43 +13,36 @@ ms.custom: ignite-2024, github-universe-2024
 
 # Deployment types in Azure AI model inference
 
-Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.
+Azure AI model inference makes models available using the *model deployment* concept in Azure AI Services resources. *Model deployments* are also Azure resources and, when created, they give access to a given model under certain configurations. Such configuration includes the infrastructure required to process the requests.
 
-All deployments can perform the exact same inference operations, however the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:
+Azure AI model inference provides customers with choices on the hosting structure that fits their business and usage patterns. Those options are translated to different deployment types (or SKUs) that are available at model deployment time in the Azure AI Services resource.
 
-- **Data residency needs**: global vs. regional resources
-- **Call volume**: standard vs. provisioned
+:::image type="content" source="../media/add-model-deployments/models-deploy-deployment-type.png" alt-text="Screenshot showing how to customize the deployment type for a given model deployment." lightbox="../media/add-model-deployments/models-deploy-deployment-type.png":::
 
-Deployment types support varies by model and model provider. You can see which deployment type (SKU) each model supports in the [Models section](models.md).
+Different model providers offer different deployment SKUs that you can select from. When selecting a deployment type, consider your **data residency needs** and **call volume/capacity** requirements.
 
-## Global versus regional deployment types
+## Deployment types for Azure OpenAI models
 
-For standard and provisioned deployments, you have an option of two types of configurations within your resource – **global** or **regional**. Global standard is the recommended starting point.
+The service offers two main types of deployments: **standard** and **provisioned**. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (`Standard` or `Provisioned-Managed`), Microsoft-specified data zone (`DataZone-Standard` or `DataZone Provisioned-Managed`), or Global (`Global-Standard` or `Global Provisioned-Managed`) processing options.
 
-Global deployments leverage Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer's inference requests. This means you get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
+To learn more about deployment options for Azure OpenAI models, see the [Azure OpenAI documentation](../../../ai-services/openai/how-to/deployment-types.md).
 
-Our global deployments are the first location for all new models and features. Customers with large throughput requirements should consider our provisioned deployment offering.
+## Deployment types for Models-as-a-Service models
 
-## Standard
+Models from third-party model providers with pay-as-you-go billing (collectively called Models-as-a-Service) are made available in Azure AI model inference under **standard** deployments with a Global processing option (`Global-Standard`).
 
-Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region and throughput may be limited.
+### Global-Standard
 
-Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
+Global deployments leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources. Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure location. Learn more about [data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
 
-Only Azure OpenAI models support this deployment type.
+> [!NOTE]
+> Models-as-a-Service offers regional deployment options under [Serverless API endpoints](../../../ai-studio/how-to/deploy-models-serverless.md) in Azure AI Foundry. Prompts and outputs are processed within the geography specified during deployment. However, those deployments can't be accessed using the Azure AI model inference endpoint in Azure AI Services.
 
-## Global standard
+## Control deployment options
 
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
+Administrators can control which model deployment types are available to their users by using Azure Policies. Learn more in [How to control AI model deployment with custom policies](../../../ai-studio/how-to/custom-policy-model-deployment.md).
 
-Customers with high consistent volume may experience greater latency variability. The threshold is set per model. For applications that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.
+## Related content
 
-## Global provisioned
-
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
-
-Only Azure OpenAI models support this deployment type.
-
-## Next steps
-
-- [Quotas & limits](../quotas-limits.md)
+- [Quotas & limits](../quotas-limits.md)
+- [Data privacy, and security for Models-as-a-Service models](../../../ai-studio/how-to/concept-data-privacy.md)
```
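As the revised article notes, the deployment type (SKU) is picked when the model deployment is created on the Azure AI Services resource. The following is a minimal sketch of that idea, assuming the azure-mgmt-cognitiveservices and azure-identity Python packages; the subscription, resource group, resource, model name, version, format, and capacity values are placeholders, not values taken from this commit.

```python
# Sketch: create a model deployment with an explicit deployment type (SKU).
# Assumes azure-mgmt-cognitiveservices and azure-identity; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",        # placeholder
    account_name="<azure-ai-services-resource>",   # placeholder
    deployment_name="my-model-deployment",         # placeholder
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=1),  # the deployment type (SKU) choice
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",       # placeholder model provider format
                name="gpt-4o-mini",    # placeholder model name
                version="2024-07-18",  # placeholder model version
            ),
        ),
    ),
)
print(poller.result().properties.provisioning_state)
```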

articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 6 additions & 2 deletions
```diff
@@ -38,7 +38,11 @@ To learn more about how to create deployments see [Add and configure model deplo
 
 ## Azure AI inference endpoint
 
-The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md) which all the models in Azure AI model inference support.
+The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md), which all the models in Azure AI model inference support. It supports the following modalities:
+
+* Text embeddings
+* Image embeddings
+* Chat completions
 
 You can see the endpoint URL and credentials in the **Overview** section:
 
@@ -84,4 +88,4 @@ The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)
 ## Next steps
 
 - [Models](models.md)
-- [Deployment types](deployment-types.md)
+- [Deployment types](deployment-types.md)
```
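To illustrate the single-endpoint behavior and two of the modalities listed in the hunk above, here is a minimal sketch assuming the azure-ai-inference Python package; the endpoint URL, API key, and deployment names are placeholders.

```python
# Sketch: one endpoint and credential serve every deployed model; the `model`
# parameter routes the request to a specific model deployment. Placeholder values throughout.
from azure.ai.inference import ChatCompletionsClient, EmbeddingsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<resource-name>.services.ai.azure.com/models"  # placeholder
credential = AzureKeyCredential("<api-key>")                        # placeholder

# Chat completions against one deployment.
chat_client = ChatCompletionsClient(endpoint=endpoint, credential=credential)
response = chat_client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain deployment types in one sentence."),
    ],
    model="mistral-large",  # placeholder deployment name
)
print(response.choices[0].message.content)

# Text embeddings against another deployment, same endpoint and credential.
embeddings_client = EmbeddingsClient(endpoint=endpoint, credential=credential)
embeddings = embeddings_client.embed(
    input=["Azure AI model inference"],
    model="text-embedding-3-small",  # placeholder deployment name
)
print(len(embeddings.data[0].embedding))
```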

articles/ai-foundry/model-inference/faq.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ metadata:
   description: Get answers to the most popular questions about Azure AI model inference
   #services: cognitive-services
   manager: nitinme
-  ms.service: azure-ai-models
+  ms.service: azure-ai-model-inference
   ms.topic: faq
   ms.date: 1/21/2025
   ms.author: fasantia
```
Lines changed: 32 additions & 0 deletions
```diff
@@ -0,0 +1,32 @@
+---
+title: Configure key-less authentication with Microsoft Entra ID
+titleSuffix: Azure AI Foundry
+description: Learn how to configure key-less authorization to use Azure AI model inference with Microsoft Entra ID.
+ms.service: azure-ai-model-inference
+ms.topic: how-to
+ms.date: 10/01/2024
+ms.custom: ignite-2024, github-universe-2024
+manager: nitinme
+author: mrbullwinkle
+ms.author: fasantia
+recommendations: false
+zone_pivot_groups: azure-ai-models-deployment
+---
+
+# Configure key-less authentication with Microsoft Entra ID
+
+::: zone pivot="ai-foundry-portal"
+[!INCLUDE [portal](../includes/configure-entra-id/portal.md)]
+::: zone-end
+
+::: zone pivot="programming-language-cli"
+[!INCLUDE [cli](../includes/configure-entra-id/cli.md)]
+::: zone-end
+
+::: zone pivot="programming-language-bicep"
+[!INCLUDE [bicep](../includes/configure-entra-id/bicep.md)]
+::: zone-end
+
+## Next steps
+
+* [Develop applications using Azure AI model inference service in Azure AI services](../supported-languages.md)
```
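For context on what the new key-less article covers from the client side, the following is a minimal sketch assuming the azure-ai-inference and azure-identity Python packages; the endpoint URL, credential scope, and deployment name are placeholder assumptions, not content from the commit.

```python
# Sketch: key-less (Microsoft Entra ID) authentication against the Azure AI model
# inference endpoint using a token credential instead of an API key. Placeholder values.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",    # placeholder
    credential=DefaultAzureCredential(),                                 # no API key needed
    credential_scopes=["https://cognitiveservices.azure.com/.default"],  # assumed scope
)

response = client.complete(
    messages=[UserMessage(content="Hello!")],
    model="<deployment-name>",  # placeholder
)
print(response.choices[0].message.content)
```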

articles/ai-foundry/model-inference/how-to/inference.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -26,9 +26,9 @@ Azure AI services expose multiple endpoints depending on the type of work you're
 > * Azure AI model inference endpoint
 > * Azure OpenAI endpoint
 
-The **Azure AI inference endpoint** allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md).
+The **Azure AI inference endpoint** (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md).
 
-**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API. This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
+**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API (usually with the form `https://<resource-name>.openai.azure.com`). This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
 
 To learn more about how to apply the **Azure OpenAI endpoint** see [Azure OpenAI service documentation](../../../ai-services/openai/overview.md).
```
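The two endpoint forms added in this hunk can be targeted with different clients. A minimal sketch follows, assuming the azure-ai-inference and openai Python packages; resource names, the API key, the API version, and deployment names are placeholders.

```python
# Sketch: the Azure AI model inference endpoint vs. the Azure OpenAI endpoint.
# Placeholder values throughout; API version is an assumed GA version.
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI

# Azure AI model inference endpoint: one URL for every deployed model.
inference_client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<api-key>"),
)

# Azure OpenAI endpoint: full Azure OpenAI surface for OpenAI models.
openai_client = AzureOpenAI(
    azure_endpoint="https://<resource-name>.openai.azure.com",  # placeholder
    api_key="<api-key>",
    api_version="2024-10-21",  # assumed API version
)
completion = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```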

articles/ai-foundry/model-inference/how-to/quickstart-ai-project.md

Lines changed: 12 additions & 11 deletions
```diff
@@ -14,17 +14,17 @@ recommendations: false
 
 # Configure your AI project to use Azure AI model inference
 
-If you already have an AI project in an existing AI Hub, models via "Models as a Service" are by default deployed inside of your project as stand-alone endpoints. Each model deployment has its own set of URI and credentials to access it. Azure OpenAI models are deployed to Azure AI Services resource or to the Azure OpenAI Service resource.
+If you already have an AI project in Azure AI Foundry, the model catalog deploys models from third-party model providers as stand-alone endpoints in your project by default. Each model deployment has its own set of URI and credentials to access it. On the other hand, Azure OpenAI models are deployed to an Azure AI Services resource or to the Azure OpenAI Service resource.
 
-You can configure the AI project to connect with the Azure AI model inference in Azure AI services. Once configured, **deployments of Models as a Service models happen to the connected Azure AI Services resource** instead to the project itself, giving you a single set of endpoint and credential to access all the models deployed in Azure AI Foundry.
+You can change this behavior and deploy both types of models to Azure AI Services resources using Azure AI model inference. Once configured, **deployments of Models as a Service models supporting pay-as-you-go billing happen to the connected Azure AI Services resource** instead of to the project itself, giving you a single endpoint and credential to access all the models deployed in Azure AI Foundry. You can manage Azure OpenAI models and models from third-party providers in the same way.
 
 Additionally, deploying models to Azure AI model inference brings the extra benefits of:
 
 > [!div class="checklist"]
-> * [Routing capability](../concepts/endpoints.md#routing)
-> * [Custom content filters](../concepts/content-filter.md)
-> * Global capacity deployment
-> * Entra ID support and role-based access control
+> * [Routing capability](../concepts/endpoints.md#routing).
+> * [Custom content filters](../concepts/content-filter.md).
+> * Global capacity deployment type.
+> * [Key-less authentication](configure-entra-id.md) with role-based access control.
 
 In this article, you learn how to configure your project to use models deployed in Azure AI model inference in Azure AI services.
 
@@ -104,7 +104,7 @@ For each model you want to deploy under Azure AI model inference, follow these s
 
 6. You can configure the deployment settings at this time. By default, the deployment receives the name of the model you're deploying. The deployment name is used in the `model` parameter for request to route to this particular model deployment. It allows you to configure specific names for your models when you attach specific configurations. For instance, `o1-preview-safe` for a model with a strict content safety content filter.
 
-7. We automatically select an Azure AI Services connection depending on your project because you have turned on the feature **Deploy models to Azure AI model inference service**. Use the **Customize** option to change the connection based on your needs. If you're deploying under the **Standard** deployment type, the models need to be available in the region of the Azure AI Services resource.
+7. We automatically select an Azure AI Services connection depending on your project because you turned on the feature **Deploy models to Azure AI model inference service**. Use the **Customize** option to change the connection based on your needs. If you're deploying under the **Standard** deployment type, the models need to be available in the region of the Azure AI Services resource.
 
 :::image type="content" source="../media/add-model-deployments/models-deploy-customize.png" alt-text="Screenshot showing how to customize the deployment if needed." lightbox="../media/add-model-deployments/models-deploy-customize.png":::
 
@@ -152,7 +152,7 @@ Although you configured the project to use the Azure AI model inference, existin
 
 ### Upgrade your code with the new endpoint
 
-Once the models are deployed under Azure AI Services, you can upgrade your code to use the Azure AI model inference endpoint. The main difference between how Serverless API endpoints and Azure AI model inference works reside in the endpoint URL and model parameter. While Serverless API Endpoints have set of URI and key per each model deployment, Azure AI model inference has only one for all of them.
+Once the models are deployed under Azure AI Services, you can upgrade your code to use the Azure AI model inference endpoint. The main difference between how Serverless API endpoints and Azure AI model inference work resides in the endpoint URL and model parameter. While Serverless API Endpoints have a set of URI and key for each model deployment, Azure AI model inference has only one for all of them.
 
 The following table summarizes the changes you have to introduce:
 
@@ -186,10 +186,11 @@ For each model deployed as Serverless API Endpoints, follow these steps:
 
 ## Limitations
 
-Azure AI model inference in Azure AI Services gives users access to flagship models in the Azure AI model catalog. However, only models supporting pay-as-you-go billing (Models as a Service) are available for deployment.
+Consider the following limitations when configuring your project to use Azure AI model inference:
 
-Models requiring compute quota from your subscription (Managed Compute), including custom models, can only be deployed within a given project as Managed Online Endpoints and continue to be accessible using their own set of endpoint URI and credentials.
+* Only models supporting pay-as-you-go billing (Models as a Service) are available for deployment to Azure AI model inference. Models requiring compute quota from your subscription (Managed Compute), including custom models, can only be deployed within a given project as Managed Online Endpoints and continue to be accessible using their own set of endpoint URI and credentials.
+* Models available as both pay-as-you-go billing and managed compute offerings are, by default, deployed to Azure AI model inference in Azure AI services resources. Azure AI Foundry portal doesn't offer a way to deploy them to Managed Online Endpoints. You have to turn off the feature mentioned at [Configure the project to use Azure AI model inference](#configure-the-project-to-use-azure-ai-model-inference) or use the Azure CLI/Azure ML SDK/ARM templates to perform the deployment.
 
 ## Next steps
 
-* [Add more models](create-model-deployments.md) to your endpoint.
+* [Add more models](create-model-deployments.md) to your endpoint.
```
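The "Upgrade your code with the new endpoint" change in the hunk above amounts to swapping a per-deployment URL and key for a single endpoint plus a `model` parameter. A minimal before/after sketch follows, assuming the azure-ai-inference Python package; URLs, keys, and deployment names are placeholders.

```python
# Sketch: moving from per-model Serverless API endpoints to the single Azure AI
# model inference endpoint. Placeholder values throughout.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Before: each Serverless API endpoint has its own URL and key, one per model deployment.
serverless_client = ChatCompletionsClient(
    endpoint="https://<serverless-deployment>.<region>.models.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<per-deployment-key>"),
)

# After: a single Azure AI model inference endpoint and credential; the `model`
# parameter selects the deployment on each request.
inference_client = ChatCompletionsClient(
    endpoint="https://<resource-name>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<resource-key>"),
)
response = inference_client.complete(
    messages=[UserMessage(content="Hello!")],
    model="<deployment-name>",  # the name chosen at deployment time
)
print(response.choices[0].message.content)
```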
