MicrosoftDocs
diff --git a/‎articles/ai-foundry/.openpublishing.redirection.ai-studio.json‎
Lines changed: 6 additions & 1 deletion b/‎articles/ai-foundry/.openpublishing.redirection.ai-studio.json‎
Lines changed: 6 additions & 1 deletion
diff --git a/‎articles/ai-foundry/concepts/deployments-overview.md‎
Lines changed: 17 additions & 17 deletions b/‎articles/ai-foundry/concepts/deployments-overview.md‎
Lines changed: 17 additions & 17 deletions
diff --git a/‎articles/ai-foundry/how-to/prompt-flow.md‎ renamed to ‎articles/ai-foundry/concepts/prompt-flow.md‎
Lines changed: 2 additions & 2 deletions b/‎articles/ai-foundry/how-to/prompt-flow.md‎ renamed to ‎articles/ai-foundry/concepts/prompt-flow.md‎
Lines changed: 2 additions & 2 deletions
diff --git a/‎articles/ai-foundry/how-to/deploy-nvidia-inference-microservice.md‎
Lines changed: 100 additions & 0 deletions b/‎articles/ai-foundry/how-to/deploy-nvidia-inference-microservice.md‎
Lines changed: 100 additions & 0 deletions
@@ -1097,6 +1097,11 @@
             "source_path_from_root": "/articles/ai-foundry/model-inference/reference/reference-model-inference-images-embeddings.md",
             "redirect_url": "/rest/api/aifoundry/model-inference/get-image-embeddings/get-image-embeddings",
             "redirect_document_id": false
-          }
+          },
+          {
+            "source_path_from_root": "/articles/ai-foundry/how-to/prompt-flow.md",
+            "redirect_url": "/azure/ai-foundry/concepts/prompt-flow",
+            "redirect_document_id": true
+        }
     ]
 }
@@ -4,10 +4,6 @@ titleSuffix: Azure AI Foundry
 description: Learn about deploying models in Azure AI Foundry portal.
 manager: scottpolly
 ms.service: azure-ai-foundry
-ms.custom:
-  - ignite-2023
-  - build-2024
-  - ignite-2024
 ms.topic: concept-article
 ms.date: 10/21/2024
 ms.reviewer: fasantia
@@ -17,22 +13,28 @@ author: msakande
 
 # Overview: Deploy AI models in Azure AI Foundry portal
 
-The model catalog in Azure AI Foundry portal is the hub to discover and use a wide range of models for building generative AI applications. Models need to be deployed to make them available for receiving inference requests. The process of interacting with a deployed model is called *inferencing*. Azure AI Foundry offer a comprehensive suite of deployment options for those models depending on your needs and model requirements.
+The model catalog in Azure AI Foundry portal is the hub to discover and use a wide range of models for building generative AI applications. Models need to be deployed to make them available for receiving inference requests. Azure AI Foundry offers a comprehensive suite of deployment options for those models depending on your needs and model requirements.
 
 ## Deploying models
 
-Deployment options vary depending on the model type:
+Deployment options vary depending on the model offering:
 
-* **Azure OpenAI models:** The latest OpenAI models that have enterprise features from Azure.
-* **Models as a Service models:** These models don't require compute quota from your subscription. This option allows you to deploy your Model as a Service (MaaS). You use a serverless API deployment and are billed per token in a pay-as-you-go fashion.
-* **Open and custom models:** The model catalog offers access to a large variety of models across modalities that are of open access. You can host open models in your own subscription with a managed infrastructure, virtual machines, and the number of instances for capacity management. There's a wide range of models from Azure OpenAI, Hugging Face, and NVIDIA.
+* **Azure OpenAI models:** The latest OpenAI models that have enterprise features from Azure with flexible billing options.
+* **Models-as-a-Service models:** These models don't require compute quota from your subscription and are billed per token in a pay-as-you-go fashion. 
+* **Open and custom models:** The model catalog offers access to a large variety of models across modalities, including models of open access. You can host open models in your own subscription with a managed infrastructure, virtual machines, and the number of instances for capacity management.
 
 Azure AI Foundry offers four different deployment options:
 
 |Name                           | Azure OpenAI service | Azure AI model inference | Serverless API | Managed compute |
 |-------------------------------|----------------------|-------------------|----------------|-----------------|
-| Which models can be deployed? | [Azure OpenAI models](../../ai-services/openai/concepts/models.md)        | [Azure OpenAI models and Models as a Service](../../ai-foundry/model-inference/concepts/models.md) | [Models as a Service](../how-to/model-catalog-overview.md#content-safety-for-models-deployed-via-serverless-apis) | [Open and custom models](../how-to/model-catalog-overview.md#availability-of-models-for-deployment-as-managed-compute) |
+| Which models can be deployed? | [Azure OpenAI models](../../ai-services/openai/concepts/models.md)        | [Azure OpenAI models and Models-as-a-Service](../../ai-foundry/model-inference/concepts/models.md) | [Models-as-a-Service](../how-to/model-catalog-overview.md#content-safety-for-models-deployed-via-serverless-apis) | [Open and custom models](../how-to/model-catalog-overview.md#availability-of-models-for-deployment-as-managed-compute) |
 | Deployment resource           | Azure OpenAI resource | Azure AI services resource | AI project resource | AI project resource |
+| Requires Hubs/Projects        | No | No | Yes | Yes |
+| Data processing options       | Regional <br /> Data-zone  <br /> Global | Global | Regional | Regional |
+| Private networking            | Yes | Yes | Yes | Yes |
+| Content filtering             | Yes | Yes | Yes | No  |
+| Custom content filtering      | Yes | Yes | No  | No  |
+| Key-less authentication       | Yes | Yes | No  | No  |
 | Best suited when              | You are planning to use only OpenAI models | You are planning to take advantage of the flagship models in Azure AI catalog, including OpenAI. | You are planning to use a single model from a specific provider (excluding OpenAI). | If you plan to use open models and you have enough compute quota available in your subscription. |
 | Billing bases                 | Token usage & PTU         | Token usage       | Token usage<sup>1</sup>      | Compute core hours<sup>2</sup> |
 | Deployment instructions       | [Deploy to Azure OpenAI Service](../how-to/deploy-models-openai.md) | [Deploy to Azure AI model inference](../model-inference/how-to/create-model-deployments.md) | [Deploy to Serverless API](../how-to/deploy-models-serverless.md) | [Deploy to Managed compute](../how-to/deploy-models-managed.md) |
@@ -48,18 +50,16 @@ Azure AI Foundry offers four different deployment options:
 
 Azure AI Foundry encourages customers to explore the deployment options and pick the one that best suites their business and technical needs. In general you can use the following thinking process:
 
-1. Start with the deployment options that have the bigger scopes. This allows you to iterate and prototype faster in your application without having to rebuild your architecture each time you decide to change something. [Azure AI model inference](../../ai-foundry/model-inference/overview.md) is a deployment target that supports all the flagship models in the Azure AI catalog, including latest innovation from Azure OpenAI. To get started, follow [Configure your AI project to use Azure AI model inference](../../ai-foundry/model-inference/how-to/quickstart-ai-project.md).
+* Start with [Azure AI model inference](../../ai-foundry/model-inference/overview.md) which is the option with the bigger scope. This allows you to iterate and prototype faster in your application without having to rebuild your architecture each time you decide to change something. If you are using Azure AI Foundry Hubs or Projects, enable it by [turning on Azure AI model inference](../../ai-foundry/model-inference/how-to/quickstart-ai-project.md).
 
-2. When you are looking to use a specific model:
+* When you are looking to use a specific model:
 
-   1. When you are interested in Azure OpenAI models, use the Azure OpenAI Service which offers a wide range of capabilities for them and it's designed for them.
+   * When you are interested in Azure OpenAI models, use the Azure OpenAI Service which offers a wide range of capabilities for them and it's designed for them.
 
-   2. When you are interested in a particular model from Models as a Service, and you don't expect to use any other type of model, use [Serverless API endpoints](../how-to/deploy-models-serverless.md). They allow deployment of a single model under a unique set of endpoint URL and keys.
+   * When you are interested in a particular model from Models-as-a-Service, and you don't expect to use any other type of model, use [Serverless API endpoints](../how-to/deploy-models-serverless.md). They allow deployment of a single model under a unique set of endpoint URL and keys.
 
-3. When your model is not available in Models as a Service and you have compute quota available in your subscription, use [Managed Compute](../how-to/deploy-models-managed.md) which support deployment of open and custom models. It also allows high level of customization of the deployment inference server, protocols, and detailed configuration.
+* When your model is not available in Models-as-a-Service and you have compute quota available in your subscription, use [Managed Compute](../how-to/deploy-models-managed.md) which support deployment of open and custom models. It also allows high level of customization of the deployment inference server, protocols, and detailed configuration.
 
-> [!TIP]
-> Each deployment option may offer different capabilities in terms of networking, security, and additional features like content safety. Review the documentation for each of them to understand their limitations.
 
 ## Related content
 
 
@@ -9,7 +9,7 @@ ms.custom:
   - build-2024
   - ignite-2024
 ms.topic: conceptual
-ms.date: 11/19/2024
+ms.date: 03/18/2025
 ms.reviewer: none
 ms.author: lagayhar
 author: lgayhardt
@@ -108,5 +108,5 @@ If the prompt flow tools in Azure AI Foundry portal don't meet your requirements
 
 ## Next steps
 
-- [Build with prompt flow in Azure AI Foundry portal](flow-develop.md)
+- [Build with prompt flow in Azure AI Foundry portal](../how-to/flow-develop.md)
 - [Get started with prompt flow in VS Code](https://microsoft.github.io/promptflow/how-to-guides/quick-start.html)
@@ -0,0 +1,100 @@
+---
+title: How to deploy NVIDIA Inference Microservices
+titleSuffix: Azure AI Foundry
+description: Learn to deploy NVIDIA Inference Microservices, using Azure AI Foundry.
+manager: scottpolly
+ms.service: azure-ai-foundry
+ms.topic: how-to
+ms.date: 03/14/2024
+ms.author: ssalgado
+author: ssalgadodev
+ms.reviewer: tinaem
+reviewer: tinaem
+ms.custom:  devx-track-azurecli
+---
+
+# How to deploy NVIDIA Inference Microservices
+
+In this article, you learn how to deploy NVIDIA Inference Microservices (NIMs) on Managed Compute in the model catalog on Foundry. NVIDIA inference microservices are containers built by NVIDIA for optimized pre-trained and customized AI models serving on NVIDIA GPUs. 
+Get improved TCO (total cost of ownership) and performance with NVIDIA NIMs offered for one-click deployment on Foundry, with enterprise production-grade software under NVIDIA AI Enterprise license. 
+
+[!INCLUDE [models-preview](../includes/models-preview.md)]
+
+## Prerequisites
+
+- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
+
+- An [Azure AI Foundry hub](create-azure-ai-resource.md).
+
+- An [Azure AI Foundry project](create-projects.md).
+
+- Ensure Marketplace purchases are enabled for your Azure subscription. Learn more about it [here](/azure/cost-management-billing/manage/enable-marketplace-purchases).
+
+- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned a _custom role_ with the following permissions. User accounts assigned the _Owner_ or _Contributor_ role for the Azure subscription can also create NIM deployments. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](../concepts/rbac-ai-foundry.md).
+
+    -	On the Azure subscription—**to subscribe the workspace to the Azure Marketplace offering**, once for each workspace/project:
+        -	Microsoft.MarketplaceOrdering/agreements/offers/plans/read
+        -	Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
+        -	Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
+        -	Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
+        -	Microsoft.SaaS/register/action
+
+    -	On the resource group—**to create and use the SaaS resource**:
+        -   Microsoft.SaaS/resources/read
+        -	Microsoft.SaaS/resources/write
+
+    -	On the workspace—**to deploy endpoints**:
+        -	Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
+        -	Microsoft.MachineLearningServices/workspaces/onlineEndpoints/* 
+
+
+## NVIDIA NIM PayGo offer on Azure Marketplace by NVIDIA
+
+ NVIDIA NIMs available on Azure AI Foundry model catalog can be deployed with a subscription to the [NVIDIA NIM SaaS offer](https://aka.ms/nvidia-nims-plan) on Azure Marketplace. This offer includes a 90-day trial that applies to all NIMs associated with a particular SaaS subscription scoped to an Azure AI Foundry project, and has a PayGo price of $1 per GPU hour post the trial period. 
+
+ Azure AI Foundry enables a seamless purchase flow of the NVIDIA NIM offering on Marketplace from NVIDIA collection in the model catalog, and further deployment on managed compute.
+
+## Deploy NVIDIA Inference Microservices on Managed Compute
+
+1. Sign in to [Azure AI Foundry](https://ai.azure.com) and go to the **Home** page.
+2. Select **Model catalog** from the left sidebar.
+3. In the filters section, select **Collections** and select **NVIDIA**.
+
+:::image type="content" source="../media/how-to/deploy-nvidia-inference-microservice/nvidia-collections.png" alt-text="A screenshot showing the Nvidia inference microservices available in the model catalog." lightbox="../media/how-to/deploy-nvidia-inference-microservice/nvidia-collections.png":::  
+
+4. Select the NVIDIA NIM of your choice. In this article, we are using **Llama-3.3-70B-Instruct-NIM-microservice** as an example.
+5. Select **Deploy**.
+6. Select one of the NVIDIA GPU based VM SKUs supported for the NIM, based on your intended workload. You need to have quota in your Azure subscription.
+7. You can then customize your deployment configuration for the instance count, select an existing endpoint or create a new one, etc. For the example in this article, we consider an instance count of **2** and create a new endpoint. 
+
+:::image type="content" source="../media/how-to/deploy-nvidia-inference-microservice/project-customization.png" alt-text="A screenshot showing project customization options in the deployment wizard." lightbox="../media/how-to/deploy-nvidia-inference-microservice/project-customization.png"::: 
+
+8. Select **Next**
+9. Then, review the pricing breakdown for the NIM deployment, terms of use and license agreement associated with the NIM offer. The pricing breakdown helps to inform what the aggregated pricing for the NIM software deployed would be, which is a function of the number of NVIDIA GPUs in the VM instance that was selected in the previous steps. In addition to the applicable NIM software price, Azure Compute charges also applies based on your deployment configuration.
+
+:::image type="content" source="../media/how-to/deploy-nvidia-inference-microservice/payment-description.png" alt-text="A screenshot showing the necessary user payment agreement detailing how the user is charged for deploying the models." lightbox="../media/how-to/deploy-nvidia-inference-microservice/payment-description.png":::  
+
+10. Select the checkbox to acknowledge understanding of pricing and terms of use, and then, select **Deploy**.
+
+## Consume NVIDIA NIM deployments
+
+After your deployment is successfully created, you can go to **Models + Endpoints** under My assets in your Azure AI Foundry project, select your deployment under "Model deployments" and navigate to the Test tab for sample inference to the endpoint. You can also go to the Chat Playground by selecting **Open in Playground** in Deployment Details tab, to be able to modify parameters for the inference requests.   
+
+NVIDIA NIMs on Foundry expose an OpenAI compatible API, learn more about the payload supported [here](https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html#). The 'model' parameter for NIMs on Foundry is set to a default value within the container, and is not required to pass through in the payload to your online endpoint. The **Consume** tab of the NIM deployment on Foundry includes code samples for inference with the target URL of your deployment. You can also consume NIM deployments using the Azure AI Model Inference SDK. 
+
+## Security scanning for NIMs by NVIDIA
+
+
+Redeploy to get the latest version of NIM from NVIDIA on Foundry. 
+
+## Network Isolation support for NIMs
+
+NVIDIA ensures the security and reliability of NVIDIA NIM container images through best-in-class vulnerability scanning, rigorous patch management, and transparent processes. Learn the details [here](https://docs.nvidia.com/ai-enterprise/planning-resource/security-for-azure-ai-foundry/latest/introduction.html). Microsoft works with NVIDIA to get the latest patches of the NIMs to deliver secure, stable, and reliable production-grade software within AI Foundry.
+Users can refer to the last updated time for the NIM in the model overview page, and you can redeploy to get the latest version of NIM from NVIDIA on Foundry.
+
+While NIMs are in preview on Foundry, workspaces with Public Network Access disabled will have a limitation of being able to create only one successful deployment in the private workspace or project. Note, there can only be a single active deployment in a private workspace, attempts to create more active deployments will end in failure.
+
+## Related content
+
+* Learn more about the [Model Catalog](./model-catalog-overview.md)
+* Learn more about [built-in policies for deployment](./built-in-policy-model-deployment.md)
Original file line number	Diff line number	Diff line change
`@@ -1097,6 +1097,11 @@`
`1097`	`1097`	`"source_path_from_root": "/articles/ai-foundry/model-inference/reference/reference-model-inference-images-embeddings.md",`
`1098`	`1098`	`"redirect_url": "/rest/api/aifoundry/model-inference/get-image-embeddings/get-image-embeddings",`
`1099`	`1099`	`"redirect_document_id": false`
`1100`		`- }`
	`1100`	`+ },`
	`1101`	`+ {`
	`1102`	`+ "source_path_from_root": "/articles/ai-foundry/how-to/prompt-flow.md",`
	`1103`	`+ "redirect_url": "/azure/ai-foundry/concepts/prompt-flow",`
	`1104`	`+ "redirect_document_id": true`
	`1105`	`+ }`
`1101`	`1106`	`]`
`1102`	`1107`	`}`