You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-studio/concepts/deployments-overview.md
+35-31Lines changed: 35 additions & 31 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -8,12 +8,14 @@ ms.service: azure-ai-studio
8
8
ms.custom:
9
9
- ignite-2023
10
10
ms.topic: conceptual
11
-
ms.date: 11/15/2023
11
+
ms.date: 12/7/2023
12
12
ms.author: eur
13
13
---
14
14
15
15
# Overview: Deploy models, flows, and web apps with Azure AI Studio
16
16
17
+
[!INCLUDE [Azure AI Studio preview](../includes/preview-ai-studio.md)]
18
+
17
19
Azure AI Studio supports deploying large language models (LLMs), flows, and web apps. Deploying an LLM or flow makes it available for use in a website, an application, or other production environments. This typically involves hosting the model on a server or in the cloud, and creating an API or other interface for users to interact with the model.
18
20
19
21
You often hear this interaction with a model referred to as "inferencing". Inferencing is the process of applying new input data to a model to generate outputs. Inferencing can be used in various applications. For example, a chat completion model can be used to autocomplete words or phrases that a person is typing in real-time. A chat model can be used to generate a response to "can you create an itinerary for a single day visit in Seattle?". The possibilities are endless.
@@ -25,61 +27,63 @@ First you might ask:
25
27
- "How do I choose the right model?" Azure AI Studio provides a [model catalog](../how-to/model-catalog.md) that allows you to search and filter models based on your use case. You can also test a model on a sample playground before deploying it to your project.
26
28
- "From where in Azure AI Studio can I deploy a model?" You can deploy a model from the model catalog or from your project's deployment page.
27
29
28
-
Azure AI Studio simplifies deployments. A simple select or a line of code deploys a model and generate an API endpoint for your applications to consume. For a how-to guide, see [Deploying models with Azure AI Studio](../how-to/deploy-models.md).
30
+
Azure AI Studio simplifies deployments. A simple select or a line of code deploys a model and generate an API endpoint for your applications to consume.
29
31
30
-
##Deploying flows
32
+
### Azure OpenAI models
31
33
32
-
What is a flow and why would you want to deploy it? A flow is a sequence of tools that can be used to build a generative AI application. Deploying a flow differs from deploying a model in that you can customize the flow with your own data and other components such as embeddings, vector DB lookup. and custom connections. For a how-to guide, see [Deploying flows with Azure AI Studio](../how-to/flow-deploy.md).
34
+
Azure OpenAI allows you to get access to the latest OpenAI models with the enterprise features from Azure. Learn more about [howto deploy OpenAI models in AI studio](../how-to/deploy-models-openai.md).
33
35
34
-
For example, you can build a chatbot that uses your data to generate informed and grounded responses to user queries. When you add your data in the playground, a prompt flow is automatically generated for you. You can deploy the flow as-is or customize it further with your own data and other components. In Azure AI Studio, you can also create your own flow from scratch.
36
+
### Open models
35
37
36
-
Whichever way you choose to create a flow in Azure AI Studio, you can deploy it quickly and generate an API endpoint for your applications to consume.
38
+
The model catalog offers access to a large variety of models across different modalities. Certain models in the model catalog can be deployed as a service with pay-as-you-go, providing a way to consume them as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need.
37
39
38
-
##Deploying web apps
40
+
#### Deploy models with model as a service
39
41
40
-
The model or flow that you deploy can be used in a web application hosted in Azure. Azure AI Studio provides a quick way to deploy a web app. For more information, see the [chat with your data tutorial](../tutorials/deploy-chat-web-app.md).
42
+
This deployment option doesn't require quota from your subscription. You're billed per token in a pay-as-you-go fashion. Learn how to deploy and consume [Llama 2 model family](../how-to/deploy-models-llama.md) with model as a service.
41
43
44
+
#### Deploy models with hosted managed infrastructure
42
45
43
-
## Planning AI safety for a deployed model
46
+
You can also host open models in your own subscription with managed infrastructure, virtual machines, and number of instances for capacity management. Currently offering a wide range of models from Azure AI, HuggingFace, and Nvidia. Learn more about [how to deploy open models to real-time endpoints](../how-to/deploy-models-open.md).
44
47
45
-
For Azure OpenAI models such as GPT-4, Azure AI Studio provides AI safety filter during the deployment to ensure responsible use of AI. AI content safety filter allows moderation of harmful and sensitive contents to promote the safety of AI-enhanced applications. In addition to AI safety filter, Azure AI Studio offers model monitoring for deployed models. Model monitoring for LLMs uses the latest GPT language models to monitor and alert when the outputs of the model perform poorly against the set thresholds of generation safety and quality. For example, you can configure a monitor to evaluate how well the model’s generated answers align with information from the input source ("groundedness") and closely match to a ground truth sentence or document ("similarity").
48
+
### Billing for deploying and inferencing LLMs in Azure AI Studio
46
49
47
-
## Optimizing the performance of a deployed model
50
+
The following table describes how you're billed for deploying and inferencing LLMs in Azure AI Studio. See [monitor costs for models offered throughout the Azure Marketplace](../how-to/costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace) to learn more about how to track costs.
48
51
49
-
Optimizing LLMs requires a careful consideration of several factors, including operational metrics (ex. latency), quality metrics (ex. accuracy), and cost. It's important to work with experienced data scientists and engineers to ensure your model is optimized for your specific use case.
52
+
| Use case | Azure OpenAI models | Models deployed with pay-as-you-go | Models deployed to real-time endpoints |
53
+
| --- | --- | --- | --- |
54
+
| Deploying a model from the model catalog to your project | No, you aren't billed for deploying an Azure OpenAI model to your project. | Yes, you're billed per the infrastructure of the endpoint<sup>1</sup> | Yes, you're billed for the infrastructure hosting the model<sup>2</sup> |
55
+
| Testing chat mode on Playground after deploying a model to your project | Yes, you're billed based on your token usage | Yes, you're billed based on your token usage | None. |
56
+
| Testing a model on a sample playground on the model catalog (if applicable) | Not applicable | None. | None. |
57
+
| Testing a model in playground under your project (if applicable) or in the test tab in the deployment details page under your project. | Yes, you're billed based on your token usage | Yes, you're billed based on your token usage | None. |
50
58
59
+
<sup>1</sup> A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure hosting the model itself in pay-as-you-go. After the endpoint is deleted, no further charges are made.
51
60
52
-
## Regional availability and quota limits of a model
61
+
<sup>2</sup> Billing is done in a minute-basis depending on the SKU and the number of instances used in the deployment since the moment of creation. After the endpoint is deleted, no further charges are made.
53
62
54
-
For Azure OpenAI models, the default quota for models varies by model and region. Certain models might only be available in some regions. For more information, see [Azure OpenAI Service quotas and limits](/azure/ai-services/openai/quotas-limits).
63
+
## Deploying flows
55
64
56
-
## Quota for deploying and inferencing a model
65
+
What is a flow and why would you want to deploy it? A flow is a sequence of tools that can be used to build a generative AI application. Deploying a flow differs from deploying a model in that you can customize the flow with your own data and other components such as embeddings, vector DB lookup. and custom connections. For a how-to guide, see [Deploying flows with Azure AI Studio](../how-to/flow-deploy.md).
57
66
58
-
For Azure OpenAI models, deploying and inferencing consumes quota that is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minutes (TPM). When you sign up for Azure AI Studio, you receive default quota for most available models. Then, you assign TPM to each deployment as it is created, and the available quota for that model will be reduced by that amount. You can continue to create deployments and assign them TPM until you reach your quota limit.
67
+
For example, you can build a chatbot that uses your data to generate informed and grounded responses to user queries. When you add your data in the playground, a prompt flow is automatically generated for you. You can deploy the flow as-is or customize it further with your own data and other components. In Azure AI Studio, you can also create your own flow from scratch.
59
68
60
-
Once that happens, you can only create new deployments of that model by:
69
+
Whichever way you choose to create a flow in Azure AI Studio, you can deploy it quickly and generate an API endpoint for your applications to consume.
61
70
62
-
- Request more quota by submitting a [quota increase form](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR4xPXO648sJKt4GoXAed-0pURVJWRU4yRTMxRkszU0NXRFFTTEhaT1g1NyQlQCN0PWcu).
63
-
- Adjust the allocated quota on other model deployments to free up tokens for new deployments on [Azure OpenAI Portal](https://oai.azure.com/portal).
71
+
## Deploying web apps
64
72
65
-
To learn more, see [Manage Azure OpenAI Service quota documentation](../../ai-services/openai/how-to/quota.md?tabs=rest).
73
+
The model or flow that you deploy can be used in a web application hosted in Azure. Azure AI Studio provides a quick way to deploy a web app. For more information, see the [chat with your data tutorial](../tutorials/deploy-chat-web-app.md).
66
74
67
-
For other models such as Llama and Falcon models, deploying and inferencing can be done by consuming Virtual Machine (VM) core quota that is assigned to your subscription a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request for quota increase.
68
75
69
-
## Billing for deploying and inferencing LLMs in Azure AI Studio
76
+
## Planning AI safety for a deployed model
70
77
71
-
The following table describes how you're billed for deploying and inferencing LLMs in Azure AI Studio.
78
+
For Azure OpenAI models such as GPT-4, Azure AI Studio provides AI safety filter during the deployment to ensure responsible use of AI. AI content safety filter allows moderation of harmful and sensitive contents to promote the safety of AI-enhanced applications. In addition to AI safety filter, Azure AI Studio offers model monitoring for deployed models. Model monitoring for LLMs uses the latest GPT language models to monitor and alert when the outputs of the model perform poorly against the set thresholds of generation safety and quality. For example, you can configure a monitor to evaluate how well the model’s generated answers align with information from the input source ("groundedness") and closely match to a ground truth sentence or document ("similarity").
72
79
73
-
| Use case | Azure OpenAI models | Open source and Meta models |
74
-
| --- | --- | --- |
75
-
| Deploying a model from the model catalog to your project | No, you aren't billed for deploying an Azure OpenAI model to your project. | Yes, you're billed for deploying (hosting) an open source or a Meta model |
76
-
| Testing chat mode on Playground after deploying a model to your project | Yes, you're billed based on your token usage | Not applicable |
77
-
| Consuming a deployed model inside your application | Yes, you're billed based on your token usage | Yes, you're billed for scoring your hosted open source or Meta model |
78
-
| Testing a model on a sample playground on the model catalog (if applicable) | Not applicable | No, you aren't billed without deploying (hosting) an open source or a Meta model |
79
-
| Testing a model in playground under your project (if applicable) or in the test tab in the deployment details page under your project. | Not applicable | Yes, you're billed for scoring your hosted open source or Meta model. |
80
+
## Optimizing the performance of a deployed model
80
81
82
+
Optimizing LLMs requires a careful consideration of several factors, including operational metrics (ex. latency), quality metrics (ex. accuracy), and cost. It's important to work with experienced data scientists and engineers to ensure your model is optimized for your specific use case.
81
83
82
84
## Next steps
83
85
84
-
- Learn how you can build generative AI applications in the [Azure AI Studio](../what-is-ai-studio.md).
86
+
- Learn [how to deploy OpenAI models with Azure AI Studio](../how-to/deploy-models-openai.md).
87
+
- Learn [how to deploy Llama 2 family of large language models with Azure AI Studio](../how-to/deploy-models-llama.md).
88
+
- Learn [how to deploy how to deploy large language models with Azure AI Studio](../how-to/deploy-models-open.md).
85
89
- Get answers to frequently asked questions in the [Azure AI FAQ article](../faq.yml).
0 commit comments