
Commit f248dd7

Merge pull request #275734 from s-polly/stp-ai-maas-terminology
Stp ai maas terminology
2 parents 9086c39 + b038901

12 files changed, +31 −31 lines changed

articles/ai-studio/concepts/deployments-overview.md

Lines changed: 3 additions & 3 deletions
@@ -38,9 +38,9 @@ Azure OpenAI allows you to get access to the latest OpenAI models with the enter
 
 The model catalog offers access to a large variety of models across different modalities. Certain models in the model catalog can be deployed as a service with pay-as-you-go, providing a way to consume them as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need.
 
-#### Deploy models with model as a service
+#### Deploy models with Model as a Service (MaaS)
 
-This deployment option doesn't require quota from your subscription. You're billed per token in a pay-as-you-go fashion. Learn how to deploy and consume [Llama 2 model family](../how-to/deploy-models-llama.md) with model as a service.
+This deployment option doesn't require quota from your subscription. You deploy the model as a serverless API and are billed per token in a pay-as-you-go fashion. Learn how to deploy and consume the [Llama 2 model family](../how-to/deploy-models-llama.md) with Model as a Service.
 
 #### Deploy models with hosted managed infrastructure
 
@@ -50,7 +50,7 @@ You can also host open models in your own subscription with managed infrastructu
 
 The following table describes how you're billed for deploying and inferencing LLMs in Azure AI Studio. See [monitor costs for models offered throughout the Azure Marketplace](../how-to/costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace) to learn more about how to track costs.
 
-| Use case | Azure OpenAI models | Models deployed with pay-as-you-go | Models deployed to real-time endpoints |
+| Use case | Azure OpenAI models | Models deployed as Serverless APIs (pay-as-you-go) | Models deployed with managed compute |
 | --- | --- | --- | --- |
 | Deploying a model from the model catalog to your project | No, you aren't billed for deploying an Azure OpenAI model to your project. | Yes, you're billed per the infrastructure of the endpoint<sup>1</sup> | Yes, you're billed for the infrastructure hosting the model<sup>2</sup> |
 | Testing chat mode on Playground after deploying a model to your project | Yes, you're billed based on your token usage | Yes, you're billed based on your token usage | None. |
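To make the per-token billing model concrete, a back-of-the-envelope estimate is shown below. The prices are placeholders, since the real per-1K-token rates come from the Azure Marketplace offer at deployment time:

```python
# Hypothetical prices; the actual per-1K-token rates are shown on the
# Azure Marketplace offer when you deploy the model.
PRICE_PER_1K_INPUT = 0.0005   # USD, placeholder
PRICE_PER_1K_OUTPUT = 0.0015  # USD, placeholder

input_tokens = 2_000_000   # prompt tokens sent over a month
output_tokens = 500_000    # completion tokens generated over a month

# Pay-as-you-go: cost scales with tokens, not with hosted infrastructure.
cost = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
     + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
print(f"Estimated monthly inference cost: ${cost:.2f}")
```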

articles/ai-studio/how-to/deploy-models-cohere-command.md

Lines changed: 4 additions & 4 deletions
@@ -102,7 +102,7 @@ To create a deployment:
 1. Return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [reference](#reference-for-cohere-models-deployed-as-a-service) section.
 1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
 
-To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Cohere models deployed as a service](#cost-and-quota-considerations-for-models-deployed-as-a-service).
+To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for models deployed as a serverless API](#cost-and-quota-considerations-for-models-deployed-as-a-serverless-api).
 
 ### Consume the Cohere models as a service
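As orientation for the steps above: once you have the **Target** URL and Secret **Key**, calling the deployment is a plain HTTPS request. The following is a minimal sketch, assuming a chat-style JSON contract and a hypothetical `/v1/chat/completions` route; the article's reference section defines the real route, headers, and schema:

```python
import requests

# Placeholders: copy the real values from Components > Deployments in your project.
TARGET_URL = "https://<your-deployment>.<region>.inference.ai.azure.com"  # hypothetical
API_KEY = "<your-secret-key>"

# Hypothetical chat-style payload; consult the model's reference section for
# the actual route and request schema of your deployment.
response = requests.post(
    f"{TARGET_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```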

@@ -666,9 +666,9 @@ Command R+ tool/function calling, using LangChain|`cohere`, `langchain`, `langch
 
 ## Cost and quotas
 
-### Cost and quota considerations for models deployed as a service
+### Cost and quota considerations for models deployed as a serverless API
 
-Cohere models deployed as a service are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+Cohere models deployed as a serverless API with pay-as-you-go billing are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
 
 Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
 
@@ -678,7 +678,7 @@ Quota is managed per deployment. Each deployment has a rate limit of 200,000 tok
 
 ## Content filtering
 
-Models deployed as a service with pay-as-you-go billing are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
+Models deployed as a serverless API with pay-as-you-go billing are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
 
 ## Related content
articles/ai-studio/how-to/deploy-models-cohere-embed.md

Lines changed: 2 additions & 2 deletions
@@ -282,7 +282,7 @@ Command R+ tool/function calling, using LangChain|`cohere`, `langchain`, `langch
 
 ### Cost and quota considerations for models deployed as a service
 
-Cohere models deployed as a service are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+Cohere models deployed as a serverless API with pay-as-you-go billing are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
 
 Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
 
@@ -292,7 +292,7 @@ Quota is managed per deployment. Each deployment has a rate limit of 200,000 tok
 
 ## Content filtering
 
-Models deployed as a service with pay-as-you-go billing are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
+Models deployed as a serverless API are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
 
 ## Related content
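For the embed models the flow mirrors the chat case, only the payload differs. A minimal sketch follows, assuming a hypothetical `/v1/embeddings` route and an `input` list field; the model's reference section has the actual contract:

```python
import requests

# Placeholders: use the Target URL and Key from your embed deployment.
TARGET_URL = "https://<your-embed-deployment>.<region>.inference.ai.azure.com"  # hypothetical
API_KEY = "<your-secret-key>"

# Hypothetical embeddings-style payload; the actual route and response shape
# are defined in the model's reference section.
response = requests.post(
    f"{TARGET_URL}/v1/embeddings",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"input": ["How are serverless API deployments billed?"]},
    timeout=60,
)
response.raise_for_status()
vector = response.json()["data"][0]["embedding"]  # field names assumed for illustration
print(f"Embedding dimension: {len(vector)}")
```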

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 17 additions & 17 deletions
@@ -17,18 +17,18 @@ ms.custom: [references_regions]
 
 [!INCLUDE [Feature preview](../includes/feature-preview.md)]
 
-In this article, you learn about the Meta Llama models. You also learn how to use Azure AI Studio to deploy models from this set either as a service with pay-as you go billing or with hosted infrastructure in real-time endpoints.
+In this article, you learn about the Meta Llama models. You also learn how to use Azure AI Studio to deploy models from this set either to serverless APIs with pay-as-you-go billing or to managed compute.
 
 > [!IMPORTANT]
 > Read more about the announcement of Meta Llama 3 models available now on Azure AI Model Catalog: [Microsoft Tech Community Blog](https://aka.ms/Llama3Announcement) and from [Meta Announcement Blog](https://aka.ms/meta-llama3-announcement-blog).
 
 Meta Llama 3 models and tools are a collection of pretrained and fine-tuned generative text models ranging in scale from 8 billion to 70 billion parameters. The model family also includes fine-tuned versions optimized for dialogue use cases with reinforcement learning from human feedback (RLHF), called Meta-Llama-3-8B-Instruct and Meta-Llama-3-70B-Instruct. See the following GitHub samples to explore integrations with [LangChain](https://aka.ms/meta-llama3-langchain-sample), [LiteLLM](https://aka.ms/meta-llama3-litellm-sample), [OpenAI](https://aka.ms/meta-llama3-openai-sample) and the [Azure API](https://aka.ms/meta-llama3-azure-api-sample).
 
-## Deploy Meta Llama models with pay-as-you-go
+## Deploy Meta Llama models as a serverless API
 
-Certain models in the model catalog can be deployed as a service with pay-as-you-go, providing a way to consume them as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription.
+Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go, providing a way to consume them as an API without hosting them on your subscription while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription.
 
-Meta Llama 3 models are deployed as a service with pay-as-you-go through Microsoft Azure Marketplace, and they might add more terms of use and pricing.
+Meta Llama 3 models are deployed as a serverless API with pay-as-you-go billing through Microsoft Azure Marketplace, and the offer might add more terms of use and pricing.
 
 ### Azure Marketplace model offerings
 
@@ -41,7 +41,7 @@ The following models are available in Azure Marketplace for Llama 3 when deploye
 
 # [Meta Llama 2](#tab/llama-two)
 
-The following models are available in Azure Marketplace for Llama 3 when deployed as a service with pay-as-you-go:
+The following models are available in Azure Marketplace for Llama 2 when deployed as a serverless API:
 
 * Meta Llama-2-7B (preview)
 * Meta Llama 2 7B-Chat (preview)
@@ -52,7 +52,7 @@ The following models are available in Azure Marketplace for Llama 3 when deploye
 
 ---
 
-If you need to deploy a different model, [deploy it to real-time endpoints](#deploy-meta-llama-models-to-real-time-endpoints) instead.
+If you need to deploy a different model, [deploy it to managed compute](#deploy-meta-llama-models-to-managed-compute) instead.
 
 ### Prerequisites
 
@@ -125,7 +125,7 @@ To create a deployment:
 
 Alternatively, you can initiate deployment by starting from your project in AI Studio. From the **Build** tab of your project, select **Deployments** > **+ Create**.
 
-1. On the model's **Details** page, select **Deploy** and then select **Pay-as-you-go**.
+1. On the model's **Details** page, select **Deploy** and then select **Serverless API with Azure AI Content Safety**.
 
 1. Select the project in which you want to deploy your models. To use the pay-as-you-go model deployment offering, your workspace must belong to the **East US 2** or **Sweden Central** region.
 1. On the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use. You can also select the **Marketplace offer details** tab to learn about pricing for the selected model.
@@ -157,7 +157,7 @@ To create a deployment:
 
 Alternatively, you can initiate deployment by starting from your project in AI Studio. From the **Build** tab of your project, select **Deployments** > **+ Create**.
 
-1. On the model's **Details** page, select **Deploy** and then select **Pay-as-you-go**.
+1. On the model's **Details** page, select **Deploy** and then select **Serverless API with Azure AI Content Safety**.
 
 :::image type="content" source="../media/deploy-monitor/llama/deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model with the pay-as-you-go option." lightbox="../media/deploy-monitor/llama/deploy-pay-as-you-go.png":::
 
@@ -193,7 +193,7 @@ To learn about billing for Llama models deployed with pay-as-you-go, see [Cost a
 
 Models deployed as a service can be consumed using either the chat or the completions API, depending on the type of model you deployed.
 
-1. On the **Build** page, select **Deployments**.
+1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
 
 1. Find and select the deployment you created.
 
@@ -213,7 +213,7 @@ Models deployed as a service can be consumed using either the chat or the comple
 
 Models deployed as a service can be consumed using either the chat or the completions API, depending on the type of model you deployed.
 
-1. On the **Build** page, select **Deployments**.
+1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
 
 1. Find and select the deployment you created.
 
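Once you've located the deployment and copied its target URL and key, a completions-style call looks like the sketch below. The `/v1/completions` route and field names are assumptions for illustration; the deployment's **Consume** tab shows the exact code for your endpoint:

```python
import requests

TARGET_URL = "https://<your-llama-deployment>.<region>.inference.ai.azure.com"  # hypothetical
API_KEY = "<your-secret-key>"

# Hypothetical completions-style payload; chat-optimized models use the chat
# API instead, per the article. Check the deployment's details page for the
# actual route and schema.
response = requests.post(
    f"{TARGET_URL}/v1/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"prompt": "Summarize the benefits of serverless model deployment.", "max_tokens": 200},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```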
@@ -483,9 +483,9 @@ The following is an example response:
 }
 ```
 
-## Deploy Meta Llama models to real-time endpoints
+## Deploy Meta Llama models to managed compute
 
-Apart from deploying with the pay-as-you-go managed service, you can also deploy Meta Llama models to real-time endpoints in AI Studio. When deployed to real-time endpoints, you can select all the details about the infrastructure running the model, including the virtual machines to use and the number of instances to handle the load you're expecting. Models deployed to real-time endpoints consume quota from your subscription. All the models in the Llama family can be deployed to real-time endpoints.
+Apart from deploying with the pay-as-you-go managed service, you can also deploy Meta Llama models to managed compute in AI Studio. When deployed to managed compute, you can select all the details about the infrastructure running the model, including the virtual machines to use and the number of instances to handle the load you're expecting. Models deployed to managed compute consume quota from your subscription. All the models in the Llama family can be deployed to managed compute.
 
 Follow these steps to deploy a model such as `Llama-2-7b-chat` to a real-time endpoint in [Azure AI Studio](https://ai.azure.com).
 
@@ -520,9 +520,9 @@ Follow these steps to deploy a model such as `Llama-2-7b-chat` to a real-time en
 
 1. Select the **Consume** tab of the deployment to obtain code samples that can be used to consume the deployed model in your application.
 
-### Consume Llama 2 models deployed to real-time endpoints
+### Consume Llama 2 models deployed to managed compute
 
-For reference about how to invoke Llama models deployed to real-time endpoints, see the model's card in the Azure AI Studio [model catalog](../how-to/model-catalog-overview.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.
+For reference about how to invoke Llama models deployed to managed compute, see the model's card in the Azure AI Studio [model catalog](../how-to/model-catalog-overview.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.
 
 ## Cost and quotas
 
@@ -538,13 +538,13 @@ For more information on how to track costs, see [monitor costs for models offere
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
-### Cost and quota considerations for Llama models deployed as real-time endpoints
+### Cost and quota considerations for Llama models deployed as managed compute
 
-For deployment and inferencing of Llama models with real-time endpoints, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once you reach this limit, you can request a quota increase.
+For deployment and inferencing of Llama models with managed compute, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once you reach this limit, you can request a quota increase.
 
 ## Content filtering
 
-Models deployed as a service with pay-as-you-go are protected by Azure AI Content Safety. When deployed to real-time endpoints, you can opt out of this capability. With Azure AI content safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](../concepts/content-filtering.md).
+Models deployed as a serverless API with pay-as-you-go are protected by Azure AI Content Safety. When deployed to managed compute, you can opt out of this capability. With Azure AI content safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](../concepts/content-filtering.md).
 
 ## Next steps
 
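Because quota is enforced per deployment (200,000 tokens per minute and 1,000 API requests per minute, per the hunk above), client code should expect throttling under load. A minimal retry sketch follows, assuming the service signals a throttle with HTTP 429 and an optional `Retry-After` header (both details are assumptions here, not documented behavior):

```python
import time
import requests

def post_with_backoff(url: str, headers: dict, payload: dict, max_retries: int = 5):
    """POST with exponential backoff, retrying on HTTP 429 (assumed throttle signal)."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:
            response.raise_for_status()  # surface non-throttle errors immediately
            return response.json()
        # Honor Retry-After if the service sends it; otherwise back off exponentially.
        delay = float(response.headers.get("Retry-After", delay))
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("Still rate limited after retries; consider contacting Azure Support for higher limits.")
```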