
Commit 2b639a4

Merge pull request #260783 from eric-urban/eur/llama
llama pay as you go and fine-tune
2 parents: db5160c + 515f097

27 files changed: +760 −90 lines

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -15397,6 +15397,11 @@
       "redirect_url": "/azure/event-grid/scripts/event-grid-cli-subscribe-custom-topic",
       "redirect_document_id": false
     },
+    {
+      "source_path_from_root": "/articles/ai-studio/how-to/deploy-models.md",
+      "redirect_url": "/azure/ai-studio/concepts/deployments-overview",
+      "redirect_document_id": true
+    },
     {
       "source_path_from_root": "/articles/notebooks/use-machine-learning-services-jupyter-notebooks.md",
       "redirect_url": "/azure/machine-learning/samples-notebooks",

articles/ai-studio/concepts/deployments-overview.md

Lines changed: 35 additions & 31 deletions
@@ -8,12 +8,14 @@ ms.service: azure-ai-studio
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 11/15/2023
+ms.date: 12/7/2023
 ms.author: eur
 ---
 
 # Overview: Deploy models, flows, and web apps with Azure AI Studio
 
+[!INCLUDE [Azure AI Studio preview](../includes/preview-ai-studio.md)]
+
 Azure AI Studio supports deploying large language models (LLMs), flows, and web apps. Deploying an LLM or flow makes it available for use in a website, an application, or other production environments. This typically involves hosting the model on a server or in the cloud, and creating an API or other interface for users to interact with the model.
 
 You often hear this interaction with a model referred to as "inferencing". Inferencing is the process of applying new input data to a model to generate outputs. Inferencing can be used in various applications. For example, a chat completion model can be used to autocomplete words or phrases that a person is typing in real time. A chat model can be used to generate a response to "Can you create an itinerary for a single day visit in Seattle?". The possibilities are endless.
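To make "inferencing" concrete, here is a minimal sketch of sending new input to a deployed chat model over REST; the endpoint URL, payload shape, and key are placeholders, not values from this commit:

```python
import requests

# Hypothetical scoring endpoint and key for a deployed chat model.
ENDPOINT = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

payload = {
    "messages": [
        {"role": "user",
         "content": "Can you create an itinerary for a single day visit in Seattle?"}
    ]
}

resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the model's generated output
```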
@@ -25,61 +27,63 @@ First you might ask:
 - "How do I choose the right model?" Azure AI Studio provides a [model catalog](../how-to/model-catalog.md) that allows you to search and filter models based on your use case. You can also test a model on a sample playground before deploying it to your project.
 - "From where in Azure AI Studio can I deploy a model?" You can deploy a model from the model catalog or from your project's deployment page.
 
-Azure AI Studio simplifies deployments. A simple select or a line of code deploys a model and generate an API endpoint for your applications to consume. For a how-to guide, see [Deploying models with Azure AI Studio](../how-to/deploy-models.md).
+Azure AI Studio simplifies deployments. A simple select or a line of code deploys a model and generates an API endpoint for your applications to consume.
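As an aside, here is what retrieving that generated endpoint can look like with the `azure-ai-ml` Python SDK, assuming your AI Studio project is reachable as an Azure Machine Learning workspace; the IDs and names are placeholders:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace backing the project (all names are placeholders).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

# Fetch a deployed model's endpoint and print the URI applications consume.
endpoint = ml_client.online_endpoints.get(name="<endpoint-name>")
print(endpoint.scoring_uri)
```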
 
-## Deploying flows
+### Azure OpenAI models
 
-What is a flow and why would you want to deploy it? A flow is a sequence of tools that can be used to build a generative AI application. Deploying a flow differs from deploying a model in that you can customize the flow with your own data and other components such as embeddings, vector DB lookup. and custom connections. For a how-to guide, see [Deploying flows with Azure AI Studio](../how-to/flow-deploy.md).
+Azure OpenAI gives you access to the latest OpenAI models with enterprise features from Azure. Learn more about [how to deploy OpenAI models in AI Studio](../how-to/deploy-models-openai.md).
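For illustration only (not part of this commit), here is a minimal chat call against an Azure OpenAI deployment using the pre-1.0 `openai` Python package that was current when this page was written; the resource, deployment name, and key are placeholders:

```python
import openai

# Azure OpenAI configuration (pre-1.0 openai package style).
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-api-key>"

response = openai.ChatCompletion.create(
    engine="<your-deployment-name>",  # the deployment name, not the model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["choices"][0]["message"]["content"])
```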
 
-For example, you can build a chatbot that uses your data to generate informed and grounded responses to user queries. When you add your data in the playground, a prompt flow is automatically generated for you. You can deploy the flow as-is or customize it further with your own data and other components. In Azure AI Studio, you can also create your own flow from scratch.
+### Open models
 
-Whichever way you choose to create a flow in Azure AI Studio, you can deploy it quickly and generate an API endpoint for your applications to consume.
+The model catalog offers access to a large variety of models across different modalities. Certain models in the model catalog can be deployed as a service with pay-as-you-go billing, providing a way to consume them as an API without hosting them in your subscription, while keeping the enterprise security and compliance that organizations need.
 
-## Deploying web apps
+#### Deploy models with model as a service
 
-The model or flow that you deploy can be used in a web application hosted in Azure. Azure AI Studio provides a quick way to deploy a web app. For more information, see the [chat with your data tutorial](../tutorials/deploy-chat-web-app.md).
+This deployment option doesn't require quota from your subscription. You're billed per token in a pay-as-you-go fashion. Learn how to deploy and consume the [Llama 2 model family](../how-to/deploy-models-llama.md) with model as a service.
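Since pay-as-you-go bills per token, one practical habit is to read the token counts returned with each response. Here is a sketch assuming an OpenAI-style response body with a `usage` object; the endpoint URL and key are placeholders:

```python
import requests

# Hypothetical serverless (pay-as-you-go) endpoint and key.
ENDPOINT = "https://<your-serverless-endpoint>/v1/chat/completions"
API_KEY = "<your-endpoint-key>"

resp = requests.post(
    ENDPOINT,
    json={"messages": [{"role": "user", "content": "Summarize Seattle in one line."}]},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
usage = resp.json().get("usage", {})
# Pay-as-you-go charges scale with these counts.
print(usage.get("prompt_tokens"), usage.get("completion_tokens"))
```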
 
+#### Deploy models with hosted managed infrastructure
 
-## Planning AI safety for a deployed model
+You can also host open models in your own subscription with managed infrastructure, virtual machines, and a chosen number of instances for capacity management. A wide range of models from Azure AI, HuggingFace, and Nvidia is currently offered. Learn more about [how to deploy open models to real-time endpoints](../how-to/deploy-models-open.md).
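A hedged sketch of this hosting option with the `azure-ai-ml` SDK: you pick the VM SKU and instance count yourself. The registry model URI, SKU, and names below are placeholders, not values from these docs:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

# Create the real-time endpoint that fronts the model.
endpoint = ManagedOnlineEndpoint(name="<endpoint-name>", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Host the model on your own managed infrastructure: you choose the
# VM SKU and the number of instances for capacity management.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="<endpoint-name>",
    model="azureml://registries/<registry>/models/<model>/versions/<version>",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```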
 
-For Azure OpenAI models such as GPT-4, Azure AI Studio provides AI safety filter during the deployment to ensure responsible use of AI. AI content safety filter allows moderation of harmful and sensitive contents to promote the safety of AI-enhanced applications. In addition to AI safety filter, Azure AI Studio offers model monitoring for deployed models. Model monitoring for LLMs uses the latest GPT language models to monitor and alert when the outputs of the model perform poorly against the set thresholds of generation safety and quality. For example, you can configure a monitor to evaluate how well the model’s generated answers align with information from the input source ("groundedness") and closely match to a ground truth sentence or document ("similarity").
+### Billing for deploying and inferencing LLMs in Azure AI Studio
 
-## Optimizing the performance of a deployed model
+The following table describes how you're billed for deploying and inferencing LLMs in Azure AI Studio. See [monitor costs for models offered through the Azure Marketplace](../how-to/costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace) to learn more about how to track costs.
 
-Optimizing LLMs requires a careful consideration of several factors, including operational metrics (ex. latency), quality metrics (ex. accuracy), and cost. It's important to work with experienced data scientists and engineers to ensure your model is optimized for your specific use case.
+| Use case | Azure OpenAI models | Models deployed with pay-as-you-go | Models deployed to real-time endpoints |
+| --- | --- | --- | --- |
+| Deploying a model from the model catalog to your project | No, you aren't billed for deploying an Azure OpenAI model to your project. | Yes, you're billed per the infrastructure of the endpoint<sup>1</sup> | Yes, you're billed for the infrastructure hosting the model<sup>2</sup> |
+| Testing chat mode on the playground after deploying a model to your project | Yes, you're billed based on your token usage | Yes, you're billed based on your token usage | None |
+| Testing a model on a sample playground on the model catalog (if applicable) | Not applicable | None | None |
+| Testing a model in the playground under your project (if applicable) or in the test tab on the deployment details page under your project | Yes, you're billed based on your token usage | Yes, you're billed based on your token usage | None |
 
+<sup>1</sup> A minimal endpoint infrastructure is billed per minute. With pay-as-you-go, you aren't billed for the infrastructure that hosts the model itself. After the endpoint is deleted, no further charges are made.
 
-## Regional availability and quota limits of a model
+<sup>2</sup> Billing is done on a per-minute basis, depending on the SKU and the number of instances used in the deployment since the moment of creation. After the endpoint is deleted, no further charges are made.
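A worked example of footnote 2's per-minute billing, with an invented hourly rate purely for illustration:

```python
# Hypothetical numbers: a real-time endpoint on a VM SKU billed at
# $0.30/hour per instance, running 2 instances for 90 minutes.
hourly_rate_per_instance = 0.30  # illustrative, not a real price
instances = 2
minutes_running = 90

cost = hourly_rate_per_instance / 60 * instances * minutes_running
print(f"${cost:.2f}")  # $0.90 — charges stop once the endpoint is deleted
```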
 
-For Azure OpenAI models, the default quota for models varies by model and region. Certain models might only be available in some regions. For more information, see [Azure OpenAI Service quotas and limits](/azure/ai-services/openai/quotas-limits).
+## Deploying flows
 
-## Quota for deploying and inferencing a model
+What is a flow and why would you want to deploy it? A flow is a sequence of tools that can be used to build a generative AI application. Deploying a flow differs from deploying a model in that you can customize the flow with your own data and other components such as embeddings, vector DB lookup, and custom connections. For a how-to guide, see [Deploying flows with Azure AI Studio](../how-to/flow-deploy.md).
 
-For Azure OpenAI models, deploying and inferencing consumes quota that is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minutes (TPM). When you sign up for Azure AI Studio, you receive default quota for most available models. Then, you assign TPM to each deployment as it is created, and the available quota for that model will be reduced by that amount. You can continue to create deployments and assign them TPM until you reach your quota limit.
+For example, you can build a chatbot that uses your data to generate informed and grounded responses to user queries. When you add your data in the playground, a prompt flow is automatically generated for you. You can deploy the flow as-is or customize it further with your own data and other components. In Azure AI Studio, you can also create your own flow from scratch.
 
-Once that happens, you can only create new deployments of that model by:
+Whichever way you choose to create a flow in Azure AI Studio, you can deploy it quickly and generate an API endpoint for your applications to consume.
 
-- Request more quota by submitting a [quota increase form](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR4xPXO648sJKt4GoXAed-0pURVJWRU4yRTMxRkszU0NXRFFTTEhaT1g1NyQlQCN0PWcu).
-- Adjust the allocated quota on other model deployments to free up tokens for new deployments on [Azure OpenAI Portal](https://oai.azure.com/portal).
+## Deploying web apps
 
-To learn more, see [Manage Azure OpenAI Service quota documentation](../../ai-services/openai/how-to/quota.md?tabs=rest).
+The model or flow that you deploy can be used in a web application hosted in Azure. Azure AI Studio provides a quick way to deploy a web app. For more information, see the [chat with your data tutorial](../tutorials/deploy-chat-web-app.md).
 
-For other models such as Llama and Falcon models, deploying and inferencing can be done by consuming Virtual Machine (VM) core quota that is assigned to your subscription a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request for quota increase.
 
-## Billing for deploying and inferencing LLMs in Azure AI Studio
+## Planning AI safety for a deployed model
 
-The following table describes how you're billed for deploying and inferencing LLMs in Azure AI Studio.
+For Azure OpenAI models such as GPT-4, Azure AI Studio provides an AI content safety filter during deployment to ensure responsible use of AI. The content safety filter allows moderation of harmful and sensitive content to promote the safety of AI-enhanced applications. In addition to the content safety filter, Azure AI Studio offers model monitoring for deployed models. Model monitoring for LLMs uses the latest GPT language models to monitor and alert when the outputs of the model perform poorly against the set thresholds of generation safety and quality. For example, you can configure a monitor to evaluate how well the model's generated answers align with information from the input source ("groundedness") and how closely they match a ground truth sentence or document ("similarity").
 
-| Use case | Azure OpenAI models | Open source and Meta models |
-| --- | --- | --- |
-| Deploying a model from the model catalog to your project | No, you aren't billed for deploying an Azure OpenAI model to your project. | Yes, you're billed for deploying (hosting) an open source or a Meta model |
-| Testing chat mode on Playground after deploying a model to your project | Yes, you're billed based on your token usage | Not applicable |
-| Consuming a deployed model inside your application | Yes, you're billed based on your token usage | Yes, you're billed for scoring your hosted open source or Meta model |
-| Testing a model on a sample playground on the model catalog (if applicable) | Not applicable | No, you aren't billed without deploying (hosting) an open source or a Meta model |
-| Testing a model in playground under your project (if applicable) or in the test tab in the deployment details page under your project. | Not applicable | Yes, you're billed for scoring your hosted open source or Meta model. |
+## Optimizing the performance of a deployed model
 
+Optimizing LLMs requires careful consideration of several factors, including operational metrics (for example, latency), quality metrics (for example, accuracy), and cost. It's important to work with experienced data scientists and engineers to ensure your model is optimized for your specific use case.
 
 ## Next steps
 
-- Learn how you can build generative AI applications in the [Azure AI Studio](../what-is-ai-studio.md).
+- Learn [how to deploy OpenAI models with Azure AI Studio](../how-to/deploy-models-openai.md).
+- Learn [how to deploy the Llama 2 family of large language models with Azure AI Studio](../how-to/deploy-models-llama.md).
+- Learn [how to deploy open models with Azure AI Studio](../how-to/deploy-models-open.md).
 - Get answers to frequently asked questions in the [Azure AI FAQ article](../faq.yml).
