When you develop a generative AI app, you need to integrate language models into your application. To use a language model, you need to deploy it. Let's explore how to deploy language models in Azure AI Foundry, after first understanding why to deploy a model.

## Why to deploy a model

You train a model to generate output based on some input. To get value out of your model, you need a solution that allows you to send input to the model, which the model processes, after which the output is visualized for you.

With generative AI apps, the most common type of solution is a chat application that expects a user question, which the model processes to generate an adequate response. The response is then visualized to the user as an answer to their question.

:::image type="content" source="../media/request-endpoint.png" alt-text="Diagram of user question being processed by model deployed to endpoint.":::

You can integrate a language model with a chat application by deploying the model to an **endpoint**. An endpoint is a specific URL where a deployed model or service can be accessed. Each model deployment typically has its own unique endpoint, which allows different applications to communicate with the model through an **API** (**Application Programming Interface**).

When a user asks a question:

1. An API request is sent to the endpoint.
1. The endpoint specifies the model that processes the request.
1. The result is sent back to the app through an API response.

When you deploy a language model from the model catalog with Azure AI Foundry, you get an endpoint, which consists of a **target URI** (Uniform Resource Identifier) and a unique **key**. For example, a target URI for a deployed GPT-3.5 model can be:
|
12 | 20 |
|
13 |
| -`https://ai-aihubdevdemo.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-03-15-preview` |
| 21 | +``` |
| 22 | +https://ai-aihubdevdemo.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-03-15-preview |
| 23 | +``` |

The URI includes:

- Your **AI hub name**, for example `ai-aihubdevdemo`.
- Your deployed **model name**, for example `gpt-35-turbo`.
- The **task** for the model, for example `chat/completions`.
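To illustrate, a short Python sketch (the variable names are only for illustration) that splits the example target URI above into those three parts:

```python
from urllib.parse import urlsplit

# The example target URI from above.
uri = ("https://ai-aihubdevdemo.openai.azure.com/openai/deployments/"
       "gpt-35-turbo/chat/completions?api-version=2023-03-15-preview")

parts = urlsplit(uri)
hub = parts.hostname.split(".")[0]           # AI hub name: 'ai-aihubdevdemo'
segments = parts.path.strip("/").split("/")  # path: openai/deployments/<model>/<task>
model = segments[2]                          # model name: 'gpt-35-turbo'
task = "/".join(segments[3:])                # task: 'chat/completions'
```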

To protect your deployed models, each deployment comes with a key. You're only authorized to send and receive requests to and from the target URI if you also provide the key to authenticate.
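For example, here's a minimal Python sketch of an authenticated API call, assuming the example target URI above and a placeholder key (you'd substitute your own deployment's values):

```python
import json
import urllib.request

# Placeholder values; substitute your deployment's target URI and key.
endpoint = ("https://ai-aihubdevdemo.openai.azure.com/openai/deployments/"
            "gpt-35-turbo/chat/completions?api-version=2023-03-15-preview")
key = "<your-key>"

def build_request(question: str) -> urllib.request.Request:
    """Build an authenticated chat completion request for the endpoint."""
    headers = {"Content-Type": "application/json", "api-key": key}
    body = json.dumps({"messages": [{"role": "user", "content": question}]})
    return urllib.request.Request(endpoint, data=body.encode(), headers=headers)

req = build_request("What is an endpoint?")
# Sending the request requires a live deployment, so it's commented out here:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

In practice you'd often use an SDK, such as the `openai` Python package, instead of raw HTTP, but the raw request makes clear what the endpoint URI and key are each used for.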

Now that you understand why you want to deploy a model, let's explore the deployment options with Azure AI Foundry.

## Deploy a language model with Azure AI Foundry

When you deploy a language model with Azure AI Foundry, several deployment types are available, depending on the model you want to deploy.

:::image type="content" source="../media/model-deployment.png" alt-text="Diagram of relationship between model types and deployment options.":::

You can deploy:

- [Azure OpenAI models](/azure/ai-services/openai/concepts/models?azure-portal=true), like GPT-3.5 and GPT-4, with the Azure OpenAI service or Azure AI model inference.
- Third-party models, like DeepSeek-R1, as [Models as a Service](/azure/ai-foundry/model-inference/concepts/models?azure-portal=true), either through Azure AI model inference or with [serverless APIs](/azure/ai-foundry/how-to/model-catalog-overview?azure-portal=true#content-safety-for-models-deployed-via-serverless-apis).
- Open-source and custom models, like models from Hugging Face, with your own [user-managed compute](/azure/ai-foundry/how-to/model-catalog-overview?azure-portal=true#availability-of-models-for-deployment-as-managed-compute).

The associated cost depends on the type of model you deploy, which deployment option you choose, and what you are doing with the model:

|Activity|Azure OpenAI models|Azure AI model inference|Serverless APIs (pay-as-you-go)|Managed compute|
|---|---|---|---|---|
|Deploy the model|No cost|No cost|Minimal endpoint cost|Charged per minute|
|Call the endpoint|Token-based billing|Token-based billing|Token-based billing|No charge|
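As a back-of-the-envelope illustration of token-based billing, consider the sketch below. The per-token rates are made-up placeholders, not real Azure prices; check the pricing page for your model and region.

```python
# Hypothetical rates, purely for illustration (not real Azure prices).
INPUT_RATE_PER_1K = 0.0005   # assumed $ per 1,000 input tokens
OUTPUT_RATE_PER_1K = 0.0015  # assumed $ per 1,000 output tokens

def estimate_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one endpoint call under token-based billing."""
    return ((input_tokens / 1000) * INPUT_RATE_PER_1K
            + (output_tokens / 1000) * OUTPUT_RATE_PER_1K)

# A call with 2,000 input tokens and 500 output tokens:
cost = estimate_call_cost(2000, 500)  # 0.001 + 0.00075 = 0.00175
```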