Commit d94dc9a (2 parents: 2a02a79 + 712720b)

Merge pull request #5680 from msakande/nvidia-models-in-maap-paygo-doc

nvidia models in maap paygo

File tree: 4 files changed, +100 −148 lines

articles/ai-foundry/.openpublishing.redirection.ai-studio.json

Lines changed: 5 additions & 0 deletions

```diff
@@ -350,6 +350,11 @@
       "redirect_url": "/azure/ai-foundry/how-to/evaluate-results",
       "redirect_document_id": true
     },
+    {
+      "source_path_from_root": "/articles/ai-foundry/how-to/deploy-nvidia-inference-microservice.md",
+      "redirect_url": "/azure/ai-foundry/how-to/deploy-models-managed-pay-go#nvidia",
+      "redirect_document_id": true
+    },
     {
       "source_path_from_root": "/articles/ai-studio/how-to/fine-tune-managed-compute.md",
       "redirect_url": "/azure/ai-foundry/how-to/fine-tune-managed-compute",
```

articles/ai-foundry/how-to/deploy-models-managed-pay-go.md

Lines changed: 95 additions & 26 deletions

```diff
@@ -6,14 +6,14 @@ manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom:
 ms.topic: how-to
-ms.date: 06/23/2025
+ms.date: 06/24/2025
 ms.reviewer: tinaem
 reviewer: tinaem
 ms.author: mopeakande
 author: msakande
 ---
 
-# Deploy Azure AI Foundry Models with pay-as-you-go billing to managed compute
+# Deploy Azure AI Foundry Models to managed compute with pay-as-you-go billing
 
 Azure AI Foundry Models include a comprehensive catalog of models organized into two categories—Models sold directly by Azure, and [Models from partners and community](../concepts/foundry-models-overview.md#models-from-partners-and-community). These models from partners and community, which are available for deployment on a managed compute, are either open or protected models. In this article, you learn how to use protected models from partners and community, offered via Azure Marketplace for deployment on managed compute with pay-as-you-go billing.
```

```diff
@@ -54,10 +54,13 @@ Azure AI Foundry enables a seamless subscription and transaction experience for
 - Per-hour Azure Machine Learning compute billing for the virtual machines employed in the deployment.
 - Surcharge billing for the model as set by the model publisher on the Azure Marketplace offer.
 
-Pay-as-you-go billing of Azure compute and model surcharge are pro-rated per minute based on the uptime of the managed online deployments. The surcharge for a model is a per GPU-hour price, set by the partner (or model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
+Pay-as-you-go billing of Azure compute and model surcharge is pro-rated per minute based on the uptime of the managed online deployments. The surcharge for a model is a per GPU-hour price, set by the partner (or model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
```
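The per-minute pro-ration described in the changed paragraph can be sketched with a small arithmetic example. The rates below are made-up placeholders for illustration, not actual Azure or Marketplace prices:

```python
# Illustrative pro-rated billing; all rates are hypothetical placeholders.
vm_rate_per_hour = 4.0        # Azure ML compute price for the chosen VM SKU
surcharge_per_gpu_hour = 1.5  # publisher-set surcharge on the Marketplace offer
gpus_per_vm = 1
uptime_minutes = 90           # deployment uptime, billed per minute

hours = uptime_minutes / 60
compute_cost = hours * vm_rate_per_hour
surcharge_cost = hours * surcharge_per_gpu_hour * gpus_per_vm
total = compute_cost + surcharge_cost  # 6.0 + 2.25 = 8.25
```

Both charges accrue only while the managed online deployment is up, so deleting the deployment stops both meters.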
```diff
 
 A user's subscription to Azure Marketplace offers are scoped to the project resource within Azure AI Foundry. If a subscription to the Azure Marketplace offer for a particular model already exists within the project, the user is informed in the deployment wizard that the subscription already exists for the project.
 
+> [!NOTE]
+> For [NVIDIA inference microservices (NIM)](#nvidia), multiple models are associated with a single marketplace offer, so you only have to subscribe to the NIM offer once within a project to be able to deploy all NIMs offered by NVIDIA in the AI Foundry model catalog. If you want to deploy NIMs in a different project with no existing SaaS subscription, you need to resubscribe to the offer.
+
 To find all the SaaS subscriptions that exist in an Azure subscription:
 
 1. Sign in to the [Azure portal](https://portal.azure.com) and go to your Azure subscription.
```
```diff
@@ -72,11 +75,13 @@ The consumption-based surcharge is accrued to the associated SaaS subscription a
 
 ## Subscribe and deploy on managed compute
 
+[!INCLUDE [tip-left-pane](../includes/tip-left-pane.md)]
+
 1. Sign in to [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs).
 1. If you're not already in your project, select it.
 1. Select **Model catalog** from the left pane.
 1. Select the **Deployment options** filter in the model catalog and choose **Managed compute**.
-1. Filter the list further by selecting the **Collection** and model of your choice. In this article, we use **Cohere Command A** from the [list of supported models](#supported-models-for-managed-compute-deployment-with-pay-as-you-go-billing) for illustration.
+1. Filter the list further by selecting the **Collection** and model of your choice. In this article, we use **Cohere Command A** from the [list of supported models](#supported-models) for illustration.
 1. From the model's page, select **Use this model** to open the deployment wizard.
 1. Choose from one of the supported VM SKUs for the model. You need to have Azure Machine Learning Compute quota for that SKU in your Azure subscription.
 1. Select **Customize** to specify your deployment configuration for parameters such as the instance count. You can also select an existing endpoint for the deployment or create a new one. For this example, we specify an instance count of **1** and create a new endpoint for the deployment.
```
```diff
@@ -90,6 +95,15 @@ The consumption-based surcharge is accrued to the associated SaaS subscription a
 
 1. Select the checkbox to acknowledge that you understand and agree to the terms of use. Then, select **Deploy**. Azure AI Foundry creates the user's subscription to the marketplace offer and then creates the deployment of the model on a managed compute. It takes about 15-20 minutes for the deployment to complete.
 
+## Consume deployments
+
+After your deployment is successfully created, you can follow these steps to consume it:
+
+1. Select **Models + Endpoints** under _My assets_ in your Azure AI Foundry project.
+1. Select your deployment from the **Model deployments** tab.
+1. Navigate to the **Test** tab for sample inference to the endpoint.
+1. Return to the **Details** tab and select **Open in Playground** to go to the chat playground and modify parameters for the inference requests.
+
 ## Network isolation of deployments
 
 Collections in the model catalog can be deployed within your isolated networks using workspace managed virtual network. For more information on how to configure your workspace managed networks, see [Configure a managed virtual network to allow internet outbound](../../machine-learning/how-to-managed-network.md#configure-a-managed-virtual-network-to-allow-internet-outbound).
```
```diff
@@ -98,28 +112,83 @@ Collections in the model catalog can be deployed within your isolated networks u
 
 An Azure AI Foundry project with ingress Public Network Access disabled can only support a single active deployment of one of the protected models from the catalog. Attempts to create more active deployments result in deployment creation failures.
 
-## Supported models for managed compute deployment with pay-as-you-go billing
-
-| Collection | Model | Task |
-|--|--|--|
-| Paige AI | [Virchow2G](https://ai.azure.com/explore/models/Virchow2G/version/1/registry/azureml-paige) | Image Feature Extraction |
-| Paige AI | [Virchow2G-Mini](https://ai.azure.com/explore/models/Virchow2G-Mini/version/1/registry/azureml-paige) | Image Feature Extraction |
-| Cohere | [Command A](https://ai.azure.com/explore/models/cohere-command-a/version/3/registry/azureml-cohere) | Chat completion |
-| Cohere | [Embed v4](https://ai.azure.com/explore/models/embed-v-4-0/version/4/registry/azureml-cohere) | Embeddings |
-| Cohere | [Rerank v3.5](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/2/registry/azureml-cohere) | Text classification |
-| NVIDIA | [Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Deepseek-R1-Distill-Llama-8B-NIM-microservice](https://ai.azure.com/explore/models/Deepseek-R1-Distill-Llama-8B-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Llama-3.3-70B-Instruct-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.3-70B-Instruct-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Llama-3.1-8B-Instruct-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.1-8B-Instruct-NIM-microservice/version/3/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Mistral-7B-Instruct-v0.3-NIM-microservice](https://ai.azure.com/explore/models/Mistral-7B-Instruct-v0.3-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Mixtral-8x7B-Instruct-v0.1-NIM-microservice](https://ai.azure.com/explore/models/Mixtral-8x7B-Instruct-v0.1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
-| NVIDIA | [Llama-3.2-NV-embedqa-1b-v2-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.2-NV-embedqa-1b-v2-NIM-microservice/version/2/registry/azureml-nvidia) | Embeddings |
-| NVIDIA | [Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice/version/2/registry/azureml-nvidia) | Text classification |
-| NVIDIA | [Openfold2-NIM-microservice](https://ai.azure.com/explore/models/Openfold2-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
-| NVIDIA | [ProteinMPNN-NIM-microservice](https://ai.azure.com/explore/models/ProteinMPNN-NIM-microservice/version/2/registry/azureml-nvidia) | Protein Binder |
-| NVIDIA | [MSA-search-NIM-microservice](https://ai.azure.com/explore/models/MSA-search-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
-| NVIDIA | [Rfdiffusion-NIM-microservice](https://ai.azure.com/explore/models/Rfdiffusion-NIM-microservice/version/1/registry/azureml-nvidia) | Protein Binder |
+## Supported models
+
+The following sections list the supported models for managed compute deployment with pay-as-you-go billing, grouped by collection.
+
+### Paige AI
+
+| Model | Task |
+|--|--|
+| [Virchow2G](https://ai.azure.com/explore/models/Virchow2G/version/1/registry/azureml-paige) | Image Feature Extraction |
+| [Virchow2G-Mini](https://ai.azure.com/explore/models/Virchow2G-Mini/version/1/registry/azureml-paige) | Image Feature Extraction |
+
+### Cohere
+
+| Model | Task |
+|--|--|
+| [Command A](https://ai.azure.com/explore/models/cohere-command-a/version/3/registry/azureml-cohere) | Chat completion |
+| [Embed v4](https://ai.azure.com/explore/models/embed-v-4-0/version/4/registry/azureml-cohere) | Embeddings |
+| [Rerank v3.5](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/2/registry/azureml-cohere) | Text classification |
+
+### NVIDIA
+
+NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimized pretrained and customized AI models serving on NVIDIA GPUs. NVIDIA NIMs available on Azure AI Foundry model catalog can be deployed with a Standard subscription to the [NVIDIA NIM SaaS offer](https://aka.ms/nvidia-nims-plan) on Azure Marketplace.
+
+Some special things to note about NIMs are:
+
+- **NIMs include a 90-day trial**. The trial applies to all NIMs associated with a particular SaaS subscription, and starts from the time the SaaS subscription is created.
+
+- **SaaS subscriptions scope to an Azure AI Foundry project**. Because multiple models are associated with a single Azure Marketplace offer, you only need to subscribe once to the NIM offer within a project, then you're able to deploy all the NIMs offered by NVIDIA in the AI Foundry model catalog. If you want to deploy NIMs in a different project with no existing SaaS subscription, you need to resubscribe to the offer.
+
+| Model | Task |
+|--|--|
+| [Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
+| [Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
+| [Deepseek-R1-Distill-Llama-8B-NIM-microservice](https://ai.azure.com/explore/models/Deepseek-R1-Distill-Llama-8B-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
+| [Llama-3.3-70B-Instruct-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.3-70B-Instruct-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
+| [Llama-3.1-8B-Instruct-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.1-8B-Instruct-NIM-microservice/version/3/registry/azureml-nvidia) | Chat completion |
+| [Mistral-7B-Instruct-v0.3-NIM-microservice](https://ai.azure.com/explore/models/Mistral-7B-Instruct-v0.3-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
+| [Mixtral-8x7B-Instruct-v0.1-NIM-microservice](https://ai.azure.com/explore/models/Mixtral-8x7B-Instruct-v0.1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
+| [Llama-3.2-NV-embedqa-1b-v2-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.2-NV-embedqa-1b-v2-NIM-microservice/version/2/registry/azureml-nvidia) | Embeddings |
+| [Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice/version/2/registry/azureml-nvidia) | Text classification |
+| [Openfold2-NIM-microservice](https://ai.azure.com/explore/models/Openfold2-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
+| [ProteinMPNN-NIM-microservice](https://ai.azure.com/explore/models/ProteinMPNN-NIM-microservice/version/2/registry/azureml-nvidia) | Protein Binder |
+| [MSA-search-NIM-microservice](https://ai.azure.com/explore/models/MSA-search-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
+| [Rfdiffusion-NIM-microservice](https://ai.azure.com/explore/models/Rfdiffusion-NIM-microservice/version/1/registry/azureml-nvidia) | Protein Binder |
+
+#### Consume NVIDIA NIM deployments
+
+After your deployment is successfully created, you can follow the steps in [Consume deployments](#consume-deployments) to consume it.
+
+NVIDIA NIMs on Azure AI Foundry expose an OpenAI compatible API. See the [API reference](https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html#) to learn more about the payload supported. The `model` parameter for NIMs on Azure AI Foundry is set to a default value within the container and isn't required to be passed in to the request payload to your online endpoint. The **Consume** tab of the NIM deployment on Azure AI Foundry includes code samples for inference with the target URL of your deployment.
```
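The OpenAI-compatible route added here can be sketched with the standard library. The endpoint host and key below are hypothetical placeholders; copy the real target URL and key from the deployment's **Consume** tab. Note that `model` is omitted from the payload, since the NIM container applies its built-in default:

```python
import json
import urllib.request

# Hypothetical endpoint host and key -- substitute values from the Consume tab.
BASE_URL = "https://my-nim-endpoint.eastus.inference.ml.azure.com/v1"
API_KEY = "<your-endpoint-key>"

# Chat-completions payload without a `model` field: the NIM default is used.
body = json.dumps({
    "messages": [{"role": "user", "content": "What is a NIM?"}],
    "max_tokens": 128,
}).encode("utf-8")

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# response = json.load(urllib.request.urlopen(req))  # uncomment against a live endpoint
```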
```diff
+
+You can also consume NIM deployments using the [Azure AI Foundry Models SDK](/python/api/overview/azure/ai-inference-readme), with limitations that include:
+
+- No support for [creating and authenticating clients using `load_client`](/python/api/overview/azure/ai-inference-readme#create-and-authenticate-clients-using-load_client).
+- You should call client method `get_model_info` to [retrieve model information](/python/api/overview/azure/ai-inference-readme#get-ai-model-information).
+
+##### Develop and run agents with NIM endpoints
+
+The following NVIDIA NIMs of **chat completions** task type in the model catalog can be used to [create and run agents using Agent Service](/python/api/overview/azure/ai-projects-readme#agents-preview) using various supported tools, with the following two extra requirements:
+
+1. Create a _Serverless Connection_ to the project using the NIM endpoint and Key. The target URL for the NIM endpoint in the connection should be `https://<endpoint-name>.region.inference.ml.azure.com/v1/`.
+1. Set the _model parameter_ in the request body to be of the form, `https://<endpoint>.region.inference.ml.azure.com/v1/@<parameter value per table below>` while creating and running agents.
+
+| NVIDIA NIM | `model` parameter value |
+|--|--|
+| Llama-3.3-70B-Instruct-NIM-microservice | meta/llama-3.3-70b-instruct |
+| Llama-3.1-8B-Instruct-NIM-microservice | meta/llama-3.1-8b-instruct |
+| Mistral-7B-Instruct-v0.3-NIM-microservice | mistralai/mistral-7b-instruct-v0.3 |
```
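Composing the agent `model` parameter from the table above can be sketched as follows; the endpoint name and region are hypothetical placeholders for your NIM endpoint's actual values:

```python
# Hypothetical endpoint name and region -- substitute your NIM endpoint's values.
endpoint_name = "my-nim-endpoint"
region = "eastus"
# Model identifier from the table, here for Llama-3.3-70B-Instruct-NIM-microservice.
model_id = "meta/llama-3.3-70b-instruct"

# The Agent Service `model` parameter combines the endpoint URL and the model id.
model_parameter = (
    f"https://{endpoint_name}.{region}.inference.ml.azure.com/v1/@{model_id}"
)
```

The same pattern applies to the other chat-completion NIMs in the table; only `model_id` changes.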
```diff
+
+#### Security scanning
+
+NVIDIA ensures the security and reliability of NVIDIA NIM container images through best-in-class vulnerability scanning, rigorous patch management, and transparent processes. To learn more about security scanning, see the [security page](https://docs.nvidia.com/ai-enterprise/planning-resource/security-for-azure-ai-foundry/latest/introduction.html). Microsoft works with NVIDIA to get the latest patches of the NIMs to deliver secure, stable, and reliable production-grade software within Azure AI Foundry.
+
+You can refer to the _last updated time_ for the NIM on the right pane of the model's overview page. You can redeploy to consume the latest version of NIM from NVIDIA on Azure AI Foundry.
```
0 commit comments