Commit db5cbff

Merge pull request #5671 from MicrosoftDocs/main
Merged by Learn.Build PR Management system
2 parents 298d7c2 + dcc5c79 commit db5cbff

19 files changed

+2223
-1327
lines changed

articles/ai-foundry/concepts/fine-tuning-overview.md

Lines changed: 57 additions & 127 deletions
Large diffs are not rendered by default.

articles/ai-foundry/concepts/models-featured.md

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ The following table lists the Cohere models that you can inference via the Found
7676
| [Cohere-command-r-08-2024](https://ai.azure.com/explore/models/Cohere-command-r-08-2024/version/1/registry/azureml-cohere) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
7777
| [Cohere-command-r-plus](https://ai.azure.com/explore/models/Cohere-command-r-plus/version/1/registry/azureml-cohere) <br> (deprecated) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
7878
| [Cohere-command-r](https://ai.azure.com/explore/models/Cohere-command-r/version/1/registry/azureml-cohere) <br> (deprecated)| [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
79-
| [Cohere-embed-4](https://aka.ms/aistudio/landing/cohere-embed-4) | [embeddings](../model-inference/how-to/use-embeddings.md?context=/azure/ai-foundry/context/context) <br /> [image-embeddings](../model-inference/how-to/use-image-embeddings.md?context=/azure/ai-foundry/context/context) | - **Input:** image, text <br /> - **Output:** image, text (128,000 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** image, text |
79+
| [Cohere-embed-v-4](https://aka.ms/aistudio/landing/cohere-embed-4) | [embeddings](../model-inference/how-to/use-embeddings.md?context=/azure/ai-foundry/context/context) <br /> [image-embeddings](../model-inference/how-to/use-image-embeddings.md?context=/azure/ai-foundry/context/context) | - **Input:** image, text <br /> - **Output:** image, text (128,000 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** image, text |
8080
| [Cohere-embed-v3-english](https://ai.azure.com/explore/models/Cohere-embed-v3-english/version/1/registry/azureml-cohere) | [embeddings](../model-inference/how-to/use-embeddings.md?context=/azure/ai-foundry/context/context) <br /> [image-embeddings](../model-inference/how-to/use-image-embeddings.md?context=/azure/ai-foundry/context/context) | - **Input:** text (512 tokens) <br /> - **Output:** Vector (1,024 dim.) |
8181
| [Cohere-embed-v3-multilingual](https://ai.azure.com/explore/models/Cohere-embed-v3-multilingual/version/1/registry/azureml-cohere) | [embeddings](../model-inference/how-to/use-embeddings.md?context=/azure/ai-foundry/context/context) <br /> [image-embeddings](../model-inference/how-to/use-image-embeddings.md?context=/azure/ai-foundry/context/context) | - **Input:** text (512 tokens) <br /> - **Output:** Vector (1,024 dim.) |
8282

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
---
2+
title: Deploy Azure AI Foundry Models to managed compute with pay-as-you-go billing
3+
titleSuffix: Azure AI Foundry
4+
description: Learn how to deploy protected models from partners and community on Azure AI Foundry managed compute and understand how pay-as-you-go surcharge billing works.
5+
manager: scottpolly
6+
ms.service: azure-ai-foundry
7+
ms.custom:
8+
ms.topic: how-to
9+
ms.date: 06/23/2025
10+
ms.reviewer: tinaem
11+
reviewer: tinaem
12+
ms.author: mopeakande
13+
author: msakande
14+
---
15+
16+
# Deploy Azure AI Foundry Models with pay-as-you-go billing to managed compute
17+
18+
Azure AI Foundry Models is a catalog of models organized into two categories: Models sold directly by Azure, and [Models from partners and community](../concepts/foundry-models-overview.md#models-from-partners-and-community). Models from partners and community that are available for deployment on managed compute are either open or protected models. In this article, you learn how to deploy protected models from partners and community, offered through Azure Marketplace, on managed compute with pay-as-you-go billing.
19+
20+
21+
## Prerequisites
22+
23+
- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
24+
25+
- A [!INCLUDE [hub](../includes/hub-project-name.md)]. If you don't have one, [create a [!INCLUDE [hub](../includes/hub-project-name.md)]](create-projects.md?pivots=hub-project).
26+
27+
- [Azure Marketplace purchases enabled](/azure/cost-management-billing/manage/enable-marketplace-purchases) for your Azure subscription.
28+
29+
- Azure role-based access control (Azure RBAC) is used to grant access to operations in the Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned a *custom role* with the following permissions. User accounts assigned the *Owner* or *Contributor* role for the Azure subscription can also create deployments. For more information on permissions, see [Role-based access control in Azure AI Foundry portal](/azure/ai-foundry/concepts/rbac-azure-ai-foundry).
30+
31+
32+
- On the Azure subscription— **to subscribe the workspace/project to the Azure Marketplace offering**:
33+
34+
- Microsoft.MarketplaceOrdering/agreements/offers/plans/read
35+
- Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
36+
- Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
37+
- Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
38+
- Microsoft.SaaS/register/action
39+
40+
- On the resource group— **to create and use the SaaS resource**:
41+
42+
- Microsoft.SaaS/resources/read
43+
- Microsoft.SaaS/resources/write
44+
45+
- On the workspace— **to deploy endpoints**:
46+
47+
- Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
48+
- Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*
49+
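The permissions listed above can be collected into a single custom role definition. The following Python sketch builds one in the JSON shape that Azure custom role definitions use (`Name`, `IsCustom`, `Description`, `Actions`, `NotActions`, `AssignableScopes`). The role name, the description, the placeholder subscription ID, and the choice to make the role assignable at subscription scope are illustrative assumptions, not values taken from this article.

```python
import json

# Permissions copied from the prerequisites list above, grouped by the scope
# at which the article says they are needed.
SUBSCRIPTION_ACTIONS = [
    "Microsoft.MarketplaceOrdering/agreements/offers/plans/read",
    "Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action",
    "Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read",
    "Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read",
    "Microsoft.SaaS/register/action",
]
RESOURCE_GROUP_ACTIONS = [
    "Microsoft.SaaS/resources/read",
    "Microsoft.SaaS/resources/write",
]
WORKSPACE_ACTIONS = [
    "Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*",
    "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*",
]


def build_role_definition(subscription_id: str) -> dict:
    """Return a custom role definition that contains all listed permissions."""
    return {
        # Hypothetical name and description, chosen for this example only.
        "Name": "Protected model deployer (example)",
        "IsCustom": True,
        "Description": "Subscribe to Azure Marketplace model offers and deploy endpoints.",
        "Actions": SUBSCRIPTION_ACTIONS + RESOURCE_GROUP_ACTIONS + WORKSPACE_ACTIONS,
        "NotActions": [],
        # Assumption: one role assignable at subscription scope, which also
        # covers the resource group and the workspace inside it.
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    # Placeholder subscription ID; replace it with your own.
    role = build_role_definition("00000000-0000-0000-0000-000000000000")
    print(json.dumps(role, indent=2))
```

You can save the printed JSON to a file and pass it to `az role definition create --role-definition <file>`. If you prefer narrower scopes, split the actions into separate roles per scope instead.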
50+
## Subscription scope and unit of measure for Azure Marketplace offer
51+
52+
Azure AI Foundry enables a seamless subscription and transaction experience for protected models as you create and consume your dedicated model deployments at scale. The deployment of protected models on managed compute involves pay-as-you-go billing for the customer in two dimensions:
53+
54+
- Per-hour Azure Machine Learning compute billing for the virtual machines employed in the deployment.
55+
- Surcharge billing for the model as set by the model publisher on the Azure Marketplace offer.
56+
57+
Pay-as-you-go billing for both Azure compute and the model surcharge is prorated per minute, based on the uptime of the managed online deployments. The surcharge for a model is a per-GPU-hour price, set by the partner (the model's publisher) on Azure Marketplace, for all the supported GPUs that can be used to deploy the model on Azure AI Foundry managed compute.
58+
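To make the two billing dimensions concrete, the following Python sketch estimates the charge for a deployment from its uptime in minutes. All prices and the GPU count are made-up numbers for illustration. The sketch also assumes that the surcharge scales with both the number of GPUs per VM instance and the instance count, which this article doesn't state explicitly.

```python
def estimate_cost(
    uptime_minutes: float,
    vm_price_per_hour: float,
    surcharge_per_gpu_hour: float,
    gpus_per_instance: int,
    instance_count: int = 1,
) -> dict:
    """Estimate compute and surcharge costs, prorated per minute of uptime."""
    # Per-minute proration: convert the uptime in minutes to fractional hours.
    hours = uptime_minutes / 60

    # Dimension 1: per-hour compute billing for the VMs in the deployment.
    compute = vm_price_per_hour * instance_count * hours

    # Dimension 2: model surcharge, priced per GPU-hour by the publisher.
    # Assumption: the surcharge applies to every GPU on every instance.
    surcharge = surcharge_per_gpu_hour * gpus_per_instance * instance_count * hours

    return {"compute": compute, "surcharge": surcharge, "total": compute + surcharge}


if __name__ == "__main__":
    # Hypothetical prices: 10.00 per VM hour, 2.00 per GPU-hour, 4 GPUs per VM.
    cost = estimate_cost(
        uptime_minutes=90,
        vm_price_per_hour=10.0,
        surcharge_per_gpu_hour=2.0,
        gpus_per_instance=4,
    )
    print(cost)  # {'compute': 15.0, 'surcharge': 12.0, 'total': 27.0}
```

The pricing breakdown page in the deployment wizard remains the authoritative source for the actual rates.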
59+
A user's subscription to an Azure Marketplace offer is scoped to the project resource within Azure AI Foundry. If a subscription to the Azure Marketplace offer for a particular model already exists within the project, the deployment wizard informs the user that the subscription already exists for the project.
60+
61+
To find all the SaaS subscriptions that exist in an Azure subscription:
62+
63+
1. Sign in to the [Azure portal](https://portal.azure.com).
64+
65+
1. Select **Subscriptions** and then select your Azure subscription to open its overview page.
66+
67+
1. Select **Settings** > **Resources** to see the list of resources.
68+
69+
1. Use the **Type** filter to select the SaaS resource type.
70+
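As an alternative to the portal steps above, you can run a similar lookup from a script. The following Python sketch wraps the Azure CLI command `az resource list` and filters by resource type. It assumes that the Azure CLI is installed and signed in, and that the SaaS resource type is `Microsoft.SaaS/resources`, which is inferred from the permission names in the prerequisites rather than stated in this article.

```python
import json
import subprocess

# Assumption: resource type inferred from the Microsoft.SaaS/resources/*
# permissions listed in the prerequisites.
SAAS_RESOURCE_TYPE = "Microsoft.SaaS/resources"


def build_list_command(subscription_id: str) -> list:
    """Build the Azure CLI command that lists SaaS resources in a subscription."""
    return [
        "az", "resource", "list",
        "--subscription", subscription_id,
        "--resource-type", SAAS_RESOURCE_TYPE,
        "--output", "json",
    ]


def list_saas_resources(subscription_id: str) -> list:
    """Run the command and return the parsed list of resources."""
    result = subprocess.run(
        build_list_command(subscription_id),
        capture_output=True,
        text=True,
        check=True,  # Raise an error if the CLI call fails.
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Placeholder subscription ID; replace it with your own.
    for resource in list_saas_resources("00000000-0000-0000-0000-000000000000"):
        print(resource["name"], resource["resourceGroup"])
```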
71+
The consumption-based surcharge is accrued to the associated SaaS subscription and billed to a user via Azure Marketplace. You can view the invoice in the **Overview** tab of the respective SaaS subscription.
72+
73+
## Subscribe and deploy on managed compute
74+
75+
1. Sign in to [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs).
76+
1. If you're not already in your project, select it.
77+
1. Select **Model catalog** from the left pane.
78+
1. Select the **Deployment options** filter in the model catalog and choose **Managed compute**.
79+
1. Filter the list further by selecting the **Collection** and model of your choice. In this article, we use **Cohere Command A** from the [list of supported models](#supported-models-for-managed-compute-deployment-with-pay-as-you-go-billing) for illustration.
80+
1. From the model's page, select **Use this model** to open the deployment wizard.
81+
1. Choose one of the supported VM SKUs for the model. You need to have Azure Machine Learning Compute quota for that SKU in your Azure subscription.
82+
1. Select **Customize** to specify your deployment configuration for parameters such as the instance count. You can also select an existing endpoint for the deployment or create a new one. For this example, we specify an instance count of **1** and create a new endpoint for the deployment.
83+
84+
:::image type="content" source="../media/deploy-models-managed-pay-go/deployment-configuration.png" alt-text="Screenshot of the deployment configuration screen for a protected model in Azure AI Foundry." lightbox="../media/deploy-models-managed-pay-go/deployment-configuration.png":::
85+
86+
1. Select **Next** to proceed to the *pricing breakdown* page.
87+
1. Review the pricing breakdown for the deployment, terms of use, and license agreement associated with the model's offer on Azure Marketplace. The pricing breakdown shows the aggregated pricing for the deployed model, where the surcharge for the model is a function of the number of GPUs in the VM instance that you selected in the previous steps. In addition to the applicable surcharge for the model, Azure compute charges also apply, based on your deployment configuration. If you have existing reservations or an Azure savings plan, the invoice for the compute charges honors and reflects the discounted VM pricing.
88+
89+
:::image type="content" source="../media/deploy-models-managed-pay-go/pricing-breakdown.png" alt-text="Screenshot of the pricing breakdown page for a protected model deployment in Azure AI Foundry." lightbox="../media/deploy-models-managed-pay-go/pricing-breakdown.png":::
90+
91+
1. Select the checkbox to acknowledge that you understand and agree to the terms of use. Then, select **Deploy**. Azure AI Foundry creates the user's subscription to the marketplace offer and then creates the deployment of the model on a managed compute. It takes about 15-20 minutes for the deployment to complete.
92+
93+
## Network isolation of deployments
94+
95+
Collections in the model catalog can be deployed within your isolated networks by using a workspace managed virtual network. For more information on how to configure your workspace managed networks, see [Configure a managed virtual network to allow internet outbound](../../machine-learning/how-to-managed-network.md#configure-a-managed-virtual-network-to-allow-internet-outbound).
96+
97+
### Limitation
98+
99+
An Azure AI Foundry project with ingress Public Network Access disabled can only support a single active deployment of one of the protected models from the catalog. Attempts to create more active deployments result in deployment creation failures.
100+
101+
## Supported models for managed compute deployment with pay-as-you-go billing
102+
103+
| Collection | Model | Task |
104+
|--|--|--|
105+
| Paige AI | [Virchow2G](https://ai.azure.com/explore/models/Virchow2G/version/1/registry/azureml-paige) | Image Feature Extraction |
106+
| Paige AI | [Virchow2G-Mini](https://ai.azure.com/explore/models/Virchow2G-Mini/version/1/registry/azureml-paige) | Image Feature Extraction |
107+
| Cohere | [Command A](https://ai.azure.com/explore/models/cohere-command-a/version/3/registry/azureml-cohere) | Chat completion |
108+
| Cohere | [Embed v4](https://ai.azure.com/explore/models/embed-v-4-0/version/4/registry/azureml-cohere) | Embeddings |
109+
| Cohere | [Rerank v3.5](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/2/registry/azureml-cohere) | Text classification |
110+
| NVIDIA | [Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
111+
| NVIDIA | [Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
112+
| NVIDIA | [Deepseek-R1-Distill-Llama-8B-NIM-microservice](https://ai.azure.com/explore/models/Deepseek-R1-Distill-Llama-8B-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
113+
| NVIDIA | [Llama-3.3-70B-Instruct-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.3-70B-Instruct-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
114+
| NVIDIA | [Llama-3.1-8B-Instruct-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.1-8B-Instruct-NIM-microservice/version/3/registry/azureml-nvidia) | Chat completion |
115+
| NVIDIA | [Mistral-7B-Instruct-v0.3-NIM-microservice](https://ai.azure.com/explore/models/Mistral-7B-Instruct-v0.3-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
116+
| NVIDIA | [Mixtral-8x7B-Instruct-v0.1-NIM-microservice](https://ai.azure.com/explore/models/Mixtral-8x7B-Instruct-v0.1-NIM-microservice/version/2/registry/azureml-nvidia) | Chat completion |
117+
| NVIDIA | [Llama-3.2-NV-embedqa-1b-v2-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.2-NV-embedqa-1b-v2-NIM-microservice/version/2/registry/azureml-nvidia) | Embeddings |
118+
| NVIDIA | [Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice](https://ai.azure.com/explore/models/Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice/version/2/registry/azureml-nvidia) | Text classification |
119+
| NVIDIA | [Openfold2-NIM-microservice](https://ai.azure.com/explore/models/Openfold2-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
120+
| NVIDIA | [ProteinMPNN-NIM-microservice](https://ai.azure.com/explore/models/ProteinMPNN-NIM-microservice/version/2/registry/azureml-nvidia) | Protein Binder |
121+
| NVIDIA | [MSA-search-NIM-microservice](https://ai.azure.com/explore/models/MSA-search-NIM-microservice/version/3/registry/azureml-nvidia) | Protein Binder |
122+
| NVIDIA | [Rfdiffusion-NIM-microservice](https://ai.azure.com/explore/models/Rfdiffusion-NIM-microservice/version/1/registry/azureml-nvidia) | Protein Binder |
123+
124+
125+
126+
## Related content
127+
128+
* [How to deploy and inference a managed compute deployment](deploy-models-managed.md)
129+
* [Explore Azure AI Foundry Models](../concepts/foundry-models-overview.md)
130+

articles/ai-foundry/model-inference/concepts/models.md

Lines changed: 2 additions & 1 deletion
@@ -131,14 +131,15 @@ The Cohere family of models includes various models optimized for different use
131131

132132
| Model | Type | Tier | Capabilities |
133133
| ------ | ---- | --- | ------------ |
134+
| [Cohere-command-A](https://aka.ms/aistudio/landing/cohere-command-a) | chat-completion | Global standard | - **Input:** text (256,000 tokens)<br>- **Output:** text (8,000 tokens)<br>- **Languages:** `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar`<br>- **Tool calling:** Yes<br>- **Response formats:** Text, JSON |
134135
| [Cohere-command-r-plus-08-2024](https://ai.azure.com/explore/models/Cohere-command-r-plus-08-2024/version/1/registry/azureml-cohere) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** (4,096 tokens) <br /> - **Languages:** `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
135136
| [Cohere-command-r-08-2024](https://ai.azure.com/explore/models/Cohere-command-r-08-2024/version/1/registry/azureml-cohere) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** (4,096 tokens) <br /> - **Languages:** `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
136137
| [Cohere-command-r-plus](https://ai.azure.com/explore/models/Cohere-command-r-plus/version/1/registry/azureml-cohere) <br> (deprecated) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** (4,096 tokens) <br /> - **Languages:** `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
137138
| [Cohere-command-r](https://ai.azure.com/explore/models/Cohere-command-r/version/1/registry/azureml-cohere) <br> (deprecated) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** (4,096 tokens) <br /> - **Languages:** `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
139+
| [Cohere-embed-v-4](https://aka.ms/aistudio/landing/cohere-embed-4) | embeddings | Global standard | - **Input:** image, text<br>- **Output:** image, text (128,000 tokens)<br>- **Languages:** English, German, Spanish, French, Italian, Japanese, Korean, Arabic, Chinese, Hindi (total 100+ languages)<br>- **Tool calling:** Yes<br>- **Response formats:** image, text |
138140
| [Cohere-embed-v3-english](https://ai.azure.com/explore/models/Cohere-embed-v3-english/version/1/registry/azureml-cohere) | embeddings <br /> image-embeddings | Global standard | - **Input:** text (512 tokens) <br /> - **Output:** Vector (1,024 dim.) <br /> - **Languages:** en |
139141
| [Cohere-embed-v3-multilingual](https://ai.azure.com/explore/models/Cohere-embed-v3-multilingual/version/1/registry/azureml-cohere) | embeddings <br /> image-embeddings | Global standard | - **Input:** text (512 tokens) <br /> - **Output:** Vector (1,024 dim.) <br /> - **Languages:** `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` |
140142

141-
142143
See [this model collection in Azure AI Foundry portal](https://ai.azure.com/explore/models?&selectedCollection=cohere).
143144

144145
### Core42

articles/ai-foundry/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -167,6 +167,8 @@ items:
167167
items:
168168
- name: Deploy models via managed compute
169169
href: how-to/deploy-models-managed.md
170+
- name: Deploy models to managed compute pay-as-you-go
171+
href: how-to/deploy-models-managed-pay-go.md
170172
- name: Healthcare AI models
171173
items:
172174
- name: Foundational AI models for healthcare
