
Commit 2bd6c25

committed
add sections from Nvidia article
1 parent 1dba8ab commit 2bd6c25

File tree

1 file changed: +50 -7 lines changed

articles/ai-foundry/how-to/deploy-models-managed-pay-go.md

Lines changed: 50 additions & 7 deletions
@@ -6,14 +6,14 @@ manager: scottpolly
ms.service: azure-ai-foundry
ms.custom:
ms.topic: how-to
-ms.date: 06/23/2025
+ms.date: 06/24/2025
ms.reviewer: tinaem
reviewer: tinaem
ms.author: mopeakande
author: msakande
---

-# Deploy Azure AI Foundry Models with pay-as-you-go billing to managed compute
+# Deploy Azure AI Foundry Models to managed compute with pay-as-you-go billing

Azure AI Foundry Models include a comprehensive catalog of models organized into two categories: Models sold directly by Azure, and [Models from partners and community](../concepts/foundry-models-overview.md#models-from-partners-and-community). These models from partners and community, which are available for deployment on a managed compute, are either open or protected models. In this article, you learn how to use protected models from partners and community, offered via Azure Marketplace for deployment on managed compute with pay-as-you-go billing.

@@ -54,7 +54,7 @@ Azure AI Foundry enables a seamless subscription and transaction experience for
- Per-hour Azure Machine Learning compute billing for the virtual machines employed in the deployment.
- Surcharge billing for the model as set by the model publisher on the Azure Marketplace offer.

-Pay-as-you-go billing of Azure compute and model surcharge are pro-rated per minute based on the uptime of the managed online deployments. The surcharge for a model is a per GPU-hour price, set by the partner (or model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
+Pay-as-you-go billing of Azure compute and model surcharge is pro-rated per minute based on the uptime of the managed online deployments. The surcharge for a model is a per GPU-hour price, set by the partner (or model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.

A user's subscription to Azure Marketplace offers is scoped to the project resource within Azure AI Foundry. If a subscription to the Azure Marketplace offer for a particular model already exists within the project, the user is informed in the deployment wizard that the subscription already exists for the project.

@@ -93,6 +93,15 @@ The consumption-based surcharge is accrued to the associated SaaS subscription a
1. Select the checkbox to acknowledge that you understand and agree to the terms of use. Then, select **Deploy**. Azure AI Foundry creates the user's subscription to the marketplace offer and then creates the deployment of the model on a managed compute. It takes about 15-20 minutes for the deployment to complete.

## Consume deployments

After your deployment is successfully created, you can follow these steps to consume it:

1. Select **Models + Endpoints** under _My assets_ in your Azure AI Foundry project.
1. Select your deployment from the **Model deployments** tab.
1. Navigate to the **Test** tab to send sample inference requests to the endpoint.
1. Return to the **Details** tab and select **Open in Playground** to go to the chat playground and modify parameters for the inference requests.

## Network isolation of deployments

Collections in the model catalog can be deployed within your isolated networks using a workspace managed virtual network. For more information on how to configure your workspace managed networks, see [Configure a managed virtual network to allow internet outbound](../../machine-learning/how-to-managed-network.md#configure-a-managed-virtual-network-to-allow-internet-outbound).
@@ -105,24 +114,26 @@ An Azure AI Foundry project with ingress Public Network Access disabled can only

The following sections list the supported models for managed compute deployment with pay-as-you-go billing, grouped by collection.

-#### Paige AI
+### Paige AI

| Model | Task |
|--|--|
| [Virchow2G](https://ai.azure.com/explore/models/Virchow2G/version/1/registry/azureml-paige) | Image Feature Extraction |
| [Virchow2G-Mini](https://ai.azure.com/explore/models/Virchow2G-Mini/version/1/registry/azureml-paige) | Image Feature Extraction |

-#### Cohere
+### Cohere

| Model | Task |
|--|--|
| [Command A](https://ai.azure.com/explore/models/cohere-command-a/version/3/registry/azureml-cohere) | Chat completion |
| [Embed v4](https://ai.azure.com/explore/models/embed-v-4-0/version/4/registry/azureml-cohere) | Embeddings |
| [Rerank v3.5](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/2/registry/azureml-cohere) | Text classification |

-#### NVIDIA
+### NVIDIA

-NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimized pretrained and customized AI models serving on NVIDIA GPUs. NVIDIA NIMs available on Azure AI Foundry model catalog can be deployed with a Standard subscription to the [NVIDIA NIM SaaS offer](https://aka.ms/nvidia-nims-plan) on Azure Marketplace. Some special things to note about NIMs are:
+NVIDIA inference microservices (NIM) are containers built by NVIDIA for serving optimized pretrained and customized AI models on NVIDIA GPUs. NVIDIA NIMs available in the Azure AI Foundry model catalog can be deployed with a Standard subscription to the [NVIDIA NIM SaaS offer](https://aka.ms/nvidia-nims-plan) on Azure Marketplace.
+
+Some special things to note about NIMs are:

- **NIMs include a 90-day trial**. The trial applies to all NIMs associated with a particular SaaS subscription, and starts from the time the SaaS subscription is created.

@@ -145,6 +156,38 @@ NVIDIA inference microservices (NIM) are containers built by NVIDIA for optimize
| [MSA-search-NIM-microservice](https://ai.azure.com/explore/models/MSA-search-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
| [Rfdiffusion-NIM-microservice](https://ai.azure.com/explore/models/Rfdiffusion-NIM-microservice/version/1/registry/azureml-nvidia) | Protein Binder |

### Consume NVIDIA NIM deployments

After your deployment is successfully created, you can follow the steps in [Consume deployments](#consume-deployments) to consume it.

NVIDIA NIMs on Azure AI Foundry expose an OpenAI compatible API. See the [API reference](https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html#) to learn more about the supported payloads. The `model` parameter for NIMs on Azure AI Foundry is set to a default value within the container and doesn't need to be passed in the request payload to your online endpoint. The **Consume** tab of the NIM deployment on Azure AI Foundry includes code samples for inference with the target URL of your deployment.
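
For illustration, here's a minimal sketch of calling a NIM deployment through its OpenAI-compatible API with the OpenAI Python client. The base URL, key, and model name are placeholders; copy the actual values from the deployment's **Consume** tab.

```python
# Minimal sketch, not an official sample: the base URL, key, and model name
# below are placeholders; copy the real values from the Consume tab.
from openai import OpenAI

client = OpenAI(
    base_url="https://<endpoint-name>.<region>.inference.ml.azure.com/v1",  # placeholder target URL
    api_key="<your-endpoint-key>",  # placeholder endpoint key
)

# The NIM container sets a default model, so the endpoint doesn't require this
# value; the OpenAI client does, so pass the model's published name.
response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Briefly explain what a NIM is."}],
)
print(response.choices[0].message.content)
```
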
You can also consume NIM deployments using the [Azure AI Foundry Models SDK](/python/api/overview/azure/ai-inference-readme), with limitations that include the following (a short sketch follows this list):

- No support for [creating and authenticating clients using `load_client`](/python/api/overview/azure/ai-inference-readme#create-and-authenticate-clients-using-load_client).
- You should call the client method `get_model_info` to [retrieve model information](/python/api/overview/azure/ai-inference-readme#get-ai-model-information).
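
The following minimal sketch works within those limitations; the endpoint URL and key are placeholders taken from your deployment details.

```python
# Minimal sketch, not an official sample: endpoint and key are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# load_client isn't supported for NIM endpoints, so construct the client directly.
client = ChatCompletionsClient(
    endpoint="https://<endpoint-name>.<region>.inference.ml.azure.com/v1",  # placeholder
    credential=AzureKeyCredential("<your-endpoint-key>"),  # placeholder
)

# Call get_model_info explicitly to retrieve model information from the endpoint.
info = client.get_model_info()
print(info.model_name)

response = client.complete(messages=[UserMessage(content="Briefly explain what a NIM is.")])
print(response.choices[0].message.content)
```
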
### Develop and run agents with NIM endpoints

The following NVIDIA NIMs with the **chat completions** task type in the model catalog can be used to [create and run agents using Agent Service](/python/api/overview/azure/ai-projects-readme#agents-preview) with its various supported tools, subject to two extra requirements (a sketch follows the table below):

1. Create a _Serverless Connection_ to the project using the NIM endpoint and key. The target URL for the NIM endpoint in the connection should be `https://<endpoint-name>.region.inference.ml.azure.com/v1/`.
2. Set the _model parameter_ in the request body to be of the form `https://<endpoint>.region.inference.ml.azure.com/v1/@<parameter value per table below>` when creating and running agents.

| NVIDIA NIM | `model` parameter value |
|--|--|
| Llama-3.3-70B-Instruct-NIM-microservice | meta/llama-3.3-70b-instruct |
| Llama-3.1-8B-Instruct-NIM-microservice | meta/llama-3.1-8b-instruct |
| Mistral-7B-Instruct-v0.3-NIM-microservice | mistralai/mistral-7b-instruct-v0.3 |
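
The following is a hypothetical sketch of these two steps using the preview `azure-ai-projects` package; method names can vary across preview versions, and the connection string, endpoint, and model value are placeholders.

```python
# Hypothetical sketch, not an official sample: names and signatures may differ
# across azure-ai-projects preview versions; all identifiers are placeholders.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-project-connection-string>",  # placeholder
)

# The model value combines the NIM endpoint URL with the value from the table above.
agent = project_client.agents.create_agent(
    model="https://<endpoint-name>.<region>.inference.ml.azure.com/v1/@meta/llama-3.3-70b-instruct",
    name="nim-agent",
    instructions="You are a helpful assistant.",
)

# Create a thread, post a user message, and process a run against the NIM-backed agent.
thread = project_client.agents.create_thread()
project_client.agents.create_message(thread_id=thread.id, role="user", content="Hello!")
run = project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)
print(run.status)
```
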
### Security scanning

NVIDIA ensures the security and reliability of NVIDIA NIM container images through best-in-class vulnerability scanning, rigorous patch management, and transparent processes. To learn more about security scanning, see the [security page](https://docs.nvidia.com/ai-enterprise/planning-resource/security-for-azure-ai-foundry/latest/introduction.html). Microsoft works with NVIDIA to get the latest patches of the NIMs to deliver secure, stable, and reliable production-grade software within Azure AI Foundry.

You can check the _last updated time_ for the NIM on the right pane of the model's overview page, and redeploy to consume the latest version of the NIM from NVIDIA on Azure AI Foundry.


## Related content
