Commit 275a8d5 (1 parent ed4baa6)

Update deployment overview article

1 file changed: articles/ai-foundry/concepts/deployments-overview.md (+52, -30 lines)
---
description: Learn about deploying models in Azure AI Foundry portal.
manager: scottpolly
ms.service: azure-ai-foundry
ms.topic: concept-article
ms.date: 06/26/2025
ms.reviewer: fasantia
ms.author: mopeakande
author: msakande
---

# Overview: Deploy AI models in Azure AI Foundry

The model catalog in Azure AI Foundry is the hub to discover and use a wide range of models for building generative AI applications. Models must be deployed before they can serve inference requests. Azure AI Foundry offers a comprehensive suite of deployment options, depending on your needs and model requirements.

## Deployment options

Azure AI Foundry provides multiple deployment options, depending on the type of resources and models that you need to provision. The following three deployment options are available:
### Standard deployments in Azure AI Foundry resources

Formerly known as Azure AI model inference in Azure AI Services, this is **the preferred deployment option** in Azure AI Foundry. It offers the widest range of options, including regional, data zone, or global processing, and both standard and provisioned throughput (PTU) billing. Flagship models in Azure AI Foundry Models support this deployment option.

This deployment option is available in:

* Azure OpenAI resources<sup>1</sup>
* Azure AI Foundry resources (formerly known as Azure AI Services)
* Azure AI Hub resources connected to an Azure AI Foundry resource (requires the [Deploy models to Azure AI Foundry resources](#configure-azure-ai-foundry-portal-for-deployment-options) feature to be turned on)

<sup>1</sup> If you're using Azure OpenAI resources, the model catalog shows only Azure OpenAI models for deployment. You can get the full list of models by upgrading to an Azure AI Foundry resource.

To get started, see [How-to: Deploy models to Azure AI Foundry Models](../model-inference/how-to/create-model-deployments.md).
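Once a model is deployed, clients send inference requests to the resource endpoint, addressing the model by its deployment name. The following sketch assembles such a request with only the standard library; the endpoint URL, deployment name, and API version shown are placeholders, not values from this article, so substitute your own resource's details:

```python
import json


def build_chat_request(endpoint: str, deployment: str, api_version: str, messages: list):
    """Assemble the URL and JSON body for a chat-completions call against a
    deployed model. Illustrative only; paths and versions vary by resource."""
    url = (
        f"{endpoint}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    body = json.dumps({"messages": messages})
    return url, body


# Hypothetical values for illustration:
url, body = build_chat_request(
    "https://example-resource.openai.azure.com",  # placeholder endpoint
    "my-gpt-4o-deployment",                       # placeholder deployment name
    "2024-10-21",                                 # placeholder API version
    [{"role": "user", "content": "Hello"}],
)
```

The request targets the *deployment* name rather than the underlying model name, which is what lets you swap model versions behind a stable endpoint.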

### Serverless API Endpoint

This option is available **only in Azure AI Hub resources**. It allows the creation of dedicated endpoints to host a model, accessible via API with pay-as-you-go billing. It's supported by Azure AI Foundry Models with pay-as-you-go billing. Only regional deployments can be created for Serverless API Endpoints. This option requires the [Deploy models to Azure AI Foundry resources](#configure-azure-ai-foundry-portal-for-deployment-options) feature to be turned **off**.

To get started, see [How-to: Deploy models to Serverless API Endpoints](../model-inference/how-to/create-model-deployments.md).

### Managed Compute

This option is available **only in Azure AI Hub resources**. It allows the creation of a dedicated endpoint that hosts the model on **dedicated compute**. You need compute quota in your subscription to host the model, and you're billed for compute uptime.

This option is required for the following model collections:

* Hugging Face
* NVIDIA NIMs
* Industry models (Saifr, Rockwell, Bayer, Cerence, Sight Machine, Page AI, SDAIA)
* Databricks
* Custom models

To get started, see [How-to: Deploy to Managed compute](../how-to/deploy-models-managed.md).
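The selection rules above can be sketched as a small helper function (illustrative only; the collection names and return strings are made up for this example, not an Azure API):

```python
# Collections that this article says require Managed Compute:
MANAGED_COMPUTE_ONLY = {
    "Hugging Face",
    "NVIDIA NIMs",
    "Industry models",
    "Databricks",
    "Custom models",
}


def pick_deployment_option(collection: str, has_compute_quota: bool) -> str:
    """Pick a deployment option following the rules in this article.
    Hypothetical helper for illustration, not part of any Azure SDK."""
    if collection in MANAGED_COMPUTE_ONLY:
        if not has_compute_quota:
            raise ValueError("Managed Compute requires compute quota in your subscription.")
        return "Managed Compute"
    # Flagship Foundry Models default to the preferred option:
    return "Standard deployment (Azure AI Foundry resource)"
```

For example, a Databricks model resolves to Managed Compute (provided quota is available), while an OpenAI model defaults to a standard deployment.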

## Features

We recommend using standard deployments in Azure AI Foundry resources (formerly known as Azure AI model inference in Azure AI Services) whenever possible, because they offer the largest set of features. The following table shows details about the specific features available in each deployment option:

| Feature | Azure OpenAI | Azure AI Foundry | Serverless API Endpoint | Managed compute |
|-------------------------------|----------------------|-------------------|----------------|-----------------|
| Which models can be deployed? | [Azure OpenAI models](../../ai-services/openai/concepts/models.md) | [Azure OpenAI models and Foundry Models with pay-as-you-go billing](../../ai-foundry/model-inference/concepts/models.md) | [Foundry Models with pay-as-you-go billing](../how-to/model-catalog-overview.md) | [Open and custom models](../how-to/model-catalog-overview.md#availability-of-models-for-deployment-as-managed-compute) |
| Deployment resource | Azure OpenAI resource | Azure AI Foundry resource (formerly known as Azure AI Services) | AI project (in AI Hub resource) | AI project (in AI Hub resource) |
| Requires AI Hubs | No | No | Yes | Yes |
| Data processing options | Regional <br /> Data-zone <br /> Global | Regional <br /> Data-zone <br /> Global | Regional | Regional |
| Private networking | Yes | Yes | Yes | Yes |
| Content filtering | Yes | Yes | Yes | No |
| Custom content filtering | Yes | Yes | No | No |
| Key-less authentication | Yes | Yes | No | No |
| Billing bases | Token usage & [provisioned throughput units](../../ai-services/openai/concepts/provisioned-throughput.md) | Token usage | Token usage<sup>1</sup> | Compute core hours<sup>2</sup> |


<sup>1</sup> A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in a standard deployment. After you delete the endpoint, no further charges accrue.

<sup>2</sup> Billing is on a per-minute basis, depending on the product tier and the number of instances used in the deployment since the moment of creation. After you delete the endpoint, no further charges accrue.
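The key-less authentication row above refers to using Microsoft Entra ID tokens instead of API keys. As a rough sketch of what differs on the wire (the values are placeholders), the two styles produce different request headers:

```python
def auth_headers(api_key: str = None, entra_token: str = None) -> dict:
    """Build request headers for key-based or key-less (Microsoft Entra ID
    token) authentication. Illustrative sketch with placeholder values;
    in practice a library such as azure-identity acquires the token."""
    if entra_token is not None:
        # Key-less: a bearer token issued by Microsoft Entra ID.
        return {"Authorization": f"Bearer {entra_token}"}
    if api_key is not None:
        # Key-based: the resource's API key.
        return {"api-key": api_key}
    raise ValueError("Provide an API key or an Entra ID token.")
```

Key-less authentication avoids storing long-lived secrets, which is why it's only offered on the managed resource options in the table.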

## Configure Azure AI Foundry portal for deployment options

The Azure AI Foundry portal may automatically pick a deployment option based on your environment and configuration. When possible, it defaults to the most convenient deployment option available to you.

We recommend using Azure AI Foundry resources (formerly known as Azure AI Services) for deployment whenever possible. To do that, ensure the **Deploy models to Azure AI Foundry resources** feature is turned on.

:::image type="content" source="../model-inference/media/models/docs-flag-enable-foundry.gif" alt-text="An animation showing how to enable deployment to Azure AI Foundry resources (formerly known as Azure AI Services)." lightbox="../model-inference/media/models/docs-flag-enable-foundry.gif":::

Notice that once the feature is enabled, models that support multiple deployment options default to deployment in Azure AI Foundry resources. To access other deployment options, either turn off the feature or use the Azure CLI or Azure Machine Learning SDK for deployment. You can turn the feature off and on as many times as needed; existing deployments aren't affected.
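For reference, creating a deployment outside the portal might look like the following Azure CLI sketch. Treat it as a configuration fragment under stated assumptions: the resource group, resource name, deployment name, model version, and SKU are all placeholders, and the models and SKUs actually available depend on your resource and region:

```shell
# Sketch: create a model deployment on an Azure AI Foundry (Cognitive Services)
# resource with the Azure CLI. All names and versions below are placeholders.
az cognitiveservices account deployment create \
  --resource-group my-resource-group \
  --name my-foundry-resource \
  --deployment-name my-gpt-4o-deployment \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name "GlobalStandard" \
  --sku-capacity 1
```

Deployments created this way appear in the portal alongside portal-created ones.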

## Related content
