
Commit 2bd6c25

committed
add sections from Nvidia article
1 parent 1dba8ab commit 2bd6c25

File tree

1 file changed: +50 -7 lines changed

articles/ai-foundry/how-to/deploy-models-managed-pay-go.md

Lines changed: 50 additions & 7 deletions
@@ -6,14 +6,14 @@ manager: scottpolly
ms.service: azure-ai-foundry
ms.custom:
ms.topic: how-to
-ms.date: 06/23/2025
+ms.date: 06/24/2025
ms.reviewer: tinaem
reviewer: tinaem
ms.author: mopeakande
author: msakande
---

-# Deploy Azure AI Foundry Models with pay-as-you-go billing to managed compute
+# Deploy Azure AI Foundry Models to managed compute with pay-as-you-go billing

Azure AI Foundry Models include a comprehensive catalog of models organized into two categories: Models sold directly by Azure, and [Models from partners and community](../concepts/foundry-models-overview.md#models-from-partners-and-community). These models from partners and community, which are available for deployment on a managed compute, are either open or protected models. In this article, you learn how to use protected models from partners and community, offered via Azure Marketplace for deployment on managed compute with pay-as-you-go billing.

@@ -54,7 +54,7 @@ Azure AI Foundry enables a seamless subscription and transaction experience for
- Per-hour Azure Machine Learning compute billing for the virtual machines employed in the deployment.
- Surcharge billing for the model as set by the model publisher on the Azure Marketplace offer.

-Pay-as-you-go billing of Azure compute and model surcharge are pro-rated per minute based on the uptime of the managed online deployments. The surcharge for a model is a per GPU-hour price, set by the partner (or model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
+Pay-as-you-go billing of Azure compute and model surcharge is pro-rated per minute based on the uptime of the managed online deployments. The surcharge for a model is a per GPU-hour price, set by the partner (or model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.

A user's subscription to Azure Marketplace offers is scoped to the project resource within Azure AI Foundry. If a subscription to the Azure Marketplace offer for a particular model already exists within the project, the user is informed in the deployment wizard that the subscription already exists for the project.

@@ -93,6 +93,15 @@ The consumption-based surcharge is accrued to the associated SaaS subscription a
1. Select the checkbox to acknowledge that you understand and agree to the terms of use. Then, select **Deploy**. Azure AI Foundry creates the user's subscription to the marketplace offer and then creates the deployment of the model on a managed compute. It takes about 15-20 minutes for the deployment to complete.

## Consume deployments

After your deployment is successfully created, you can follow these steps to consume it:

1. Select **Models + Endpoints** under _My assets_ in your Azure AI Foundry project.
1. Select your deployment from the **Model deployments** tab.
1. Navigate to the **Test** tab to send sample inference requests to the endpoint.
1. Return to the **Details** tab and select **Open in Playground** to go to the chat playground and modify parameters for the inference requests.

## Network isolation of deployments

Collections in the model catalog can be deployed within your isolated networks using a workspace managed virtual network. For more information on how to configure your workspace managed networks, see [Configure a managed virtual network to allow internet outbound](../../machine-learning/how-to-managed-network.md#configure-a-managed-virtual-network-to-allow-internet-outbound).
@@ -105,24 +114,26 @@ An Azure AI Foundry project with ingress Public Network Access disabled can only

The following sections list the supported models for managed compute deployment with pay-as-you-go billing, grouped by collection.

-#### Paige AI
+### Paige AI

| Model | Task |
|--|--|
| [Virchow2G](https://ai.azure.com/explore/models/Virchow2G/version/1/registry/azureml-paige) | Image Feature Extraction |
| [Virchow2G-Mini](https://ai.azure.com/explore/models/Virchow2G-Mini/version/1/registry/azureml-paige) | Image Feature Extraction |

-#### Cohere
+### Cohere

| Model | Task |
|--|--|
| [Command A](https://ai.azure.com/explore/models/cohere-command-a/version/3/registry/azureml-cohere) | Chat completion |
| [Embed v4](https://ai.azure.com/explore/models/embed-v-4-0/version/4/registry/azureml-cohere) | Embeddings |
| [Rerank v3.5](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/2/registry/azureml-cohere) | Text classification |

-#### NVIDIA
+### NVIDIA

-NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimized pretrained and customized AI models serving on NVIDIA GPUs. NVIDIA NIMs available on Azure AI Foundry model catalog can be deployed with a Standard subscription to the [NVIDIA NIM SaaS offer](https://aka.ms/nvidia-nims-plan) on Azure Marketplace. Some special things to note about NIMs are:
+NVIDIA inference microservices (NIM) are containers built by NVIDIA for serving optimized pretrained and customized AI models on NVIDIA GPUs. NVIDIA NIMs available in the Azure AI Foundry model catalog can be deployed with a Standard subscription to the [NVIDIA NIM SaaS offer](https://aka.ms/nvidia-nims-plan) on Azure Marketplace.
+
+Some special things to note about NIMs are:

- **NIMs include a 90-day trial**. The trial applies to all NIMs associated with a particular SaaS subscription, and starts from the time the SaaS subscription is created.

@@ -145,6 +156,38 @@ NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimize
| [MSA-search-NIM-microservice](https://ai.azure.com/explore/models/MSA-search-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
| [Rfdiffusion-NIM-microservice](https://ai.azure.com/explore/models/Rfdiffusion-NIM-microservice/version/1/registry/azureml-nvidia) | Protein Binder |

### Consume NVIDIA NIM deployments

After your deployment is successfully created, you can follow the steps in [Consume deployments](#consume-deployments) to consume it.

NVIDIA NIMs on Azure AI Foundry expose an OpenAI compatible API. See the [API reference](https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html#) to learn more about the supported payloads. The `model` parameter for NIMs on Azure AI Foundry is set to a default value within the container and doesn't need to be passed in the request payload to your online endpoint. The **Consume** tab of the NIM deployment on Azure AI Foundry includes code samples for inference with the target URL of your deployment.
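
For illustration, here's a minimal sketch of calling a NIM deployment through its OpenAI-compatible API with the OpenAI Python client. The base URL, key, and model name are placeholders; copy the actual values from the deployment's **Consume** tab.

```python
# Minimal sketch, not an official sample: the base URL, key, and model name
# below are placeholders; copy the real values from the Consume tab.
from openai import OpenAI

client = OpenAI(
    base_url="https://<endpoint-name>.<region>.inference.ml.azure.com/v1",  # placeholder target URL
    api_key="<your-endpoint-key>",  # placeholder endpoint key
)

# The NIM container sets a default model, so the endpoint doesn't require this
# value; the OpenAI client does, so pass the model's published name.
response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Briefly explain what a NIM is."}],
)
print(response.choices[0].message.content)
```
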
You can also consume NIM deployments using the [Azure AI Foundry Models SDK](/python/api/overview/azure/ai-inference-readme), with limitations that include the following (a short sketch follows this list):

- No support for [creating and authenticating clients using `load_client`](/python/api/overview/azure/ai-inference-readme#create-and-authenticate-clients-using-load_client).
- You should call the client method `get_model_info` to [retrieve model information](/python/api/overview/azure/ai-inference-readme#get-ai-model-information).
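
The following minimal sketch works within those limitations; the endpoint URL and key are placeholders taken from your deployment details.

```python
# Minimal sketch, not an official sample: endpoint and key are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# load_client isn't supported for NIM endpoints, so construct the client directly.
client = ChatCompletionsClient(
    endpoint="https://<endpoint-name>.<region>.inference.ml.azure.com/v1",  # placeholder
    credential=AzureKeyCredential("<your-endpoint-key>"),  # placeholder
)

# Call get_model_info explicitly to retrieve model information from the endpoint.
info = client.get_model_info()
print(info.model_name)

response = client.complete(messages=[UserMessage(content="Briefly explain what a NIM is.")])
print(response.choices[0].message.content)
```
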
### Develop and run agents with NIM endpoints

The following NVIDIA NIMs with the **chat completions** task type in the model catalog can be used to [create and run agents using Agent Service](/python/api/overview/azure/ai-projects-readme#agents-preview) with its various supported tools, subject to two extra requirements (a sketch follows the table below):

1. Create a _Serverless Connection_ to the project using the NIM endpoint and key. The target URL for the NIM endpoint in the connection should be `https://<endpoint-name>.region.inference.ml.azure.com/v1/`.
2. Set the _model parameter_ in the request body to be of the form `https://<endpoint>.region.inference.ml.azure.com/v1/@<parameter value per table below>` when creating and running agents.

| NVIDIA NIM | `model` parameter value |
|--|--|
| Llama-3.3-70B-Instruct-NIM-microservice | meta/llama-3.3-70b-instruct |
| Llama-3.1-8B-Instruct-NIM-microservice | meta/llama-3.1-8b-instruct |
| Mistral-7B-Instruct-v0.3-NIM-microservice | mistralai/mistral-7b-instruct-v0.3 |
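
The following is a hypothetical sketch of these two steps using the preview `azure-ai-projects` package; method names can vary across preview versions, and the connection string, endpoint, and model value are placeholders.

```python
# Hypothetical sketch, not an official sample: names and signatures may differ
# across azure-ai-projects preview versions; all identifiers are placeholders.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-project-connection-string>",  # placeholder
)

# The model value combines the NIM endpoint URL with the value from the table above.
agent = project_client.agents.create_agent(
    model="https://<endpoint-name>.<region>.inference.ml.azure.com/v1/@meta/llama-3.3-70b-instruct",
    name="nim-agent",
    instructions="You are a helpful assistant.",
)

# Create a thread, post a user message, and process a run against the NIM-backed agent.
thread = project_client.agents.create_thread()
project_client.agents.create_message(thread_id=thread.id, role="user", content="Hello!")
run = project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)
print(run.status)
```
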
### Security scanning

NVIDIA ensures the security and reliability of NVIDIA NIM container images through best-in-class vulnerability scanning, rigorous patch management, and transparent processes. To learn more about security scanning, see the [security page](https://docs.nvidia.com/ai-enterprise/planning-resource/security-for-azure-ai-foundry/latest/introduction.html). Microsoft works with NVIDIA to get the latest patches of the NIMs to deliver secure, stable, and reliable production-grade software within Azure AI Foundry.

You can check the _last updated time_ for the NIM on the right pane of the model's overview page, and redeploy to consume the latest version of the NIM from NVIDIA on Azure AI Foundry.


## Related content
