
Commit a0014d5

Merge pull request #284506 from eric-urban/eur/deploy-models
restore deploy managed endpoint article
2 parents: 8cf56a5 + 3ca3a5d

File tree

3 files changed: +26 additions, -121 deletions

articles/ai-studio/.openpublishing.redirection.ai-studio.json

Lines changed: 5 additions & 0 deletions
@@ -104,6 +104,11 @@
     "source_path_from_root": "/articles/ai-studio/whats-new.md",
     "redirect_url": "/azure/ai-studio/faq",
     "redirect_document_id": false
+  },
+  {
+    "source_path_from_root": "/articles/ai-studio/how-to/deploy-models-open.md",
+    "redirect_url": "/azure/ai-studio/how-to/deploy-models-managed",
+    "redirect_document_id": false
   }
 ]
}

articles/ai-studio/how-to/deploy-models-open.md renamed to articles/ai-studio/how-to/deploy-models-managed.md

Lines changed: 18 additions & 121 deletions
@@ -1,131 +1,28 @@
 ---
-title: How to deploy open models with Azure AI Studio
+title: How to deploy and inference a managed compute deployment with code
 titleSuffix: AI Studio
-description: Learn how to deploy open models with Azure AI Studio.
+description: Learn how to deploy and inference a managed compute deployment with code.
 manager: scottpolly
 ms.service: azure-ai-studio
 ms.custom:
   - build-2024
 ms.topic: how-to
-ms.date: 5/21/2024
-ms.reviewer: fasantia
+ms.date: 8/13/2024
+ms.reviewer: fasantia
+reviewer: santiagxf
 ms.author: mopeakande
 author: msakande
 ---
 
-# How to deploy large language models with Azure AI Studio
+# How to deploy and inference a managed compute deployment with code
 
-[!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]
+The AI Studio [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models, and the most common way to deploy these models is to use the managed compute deployment option, which is also sometimes referred to as a managed online deployment.
 
 Deployment of a large language model (LLM) makes it available for use in a website, an application, or other production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference of generative AI applications such as chat and copilot.
 
-In this article, you learn how to deploy large language models in Azure AI Studio. You can deploy models from the model catalog or from your project. You can also deploy models using the Azure Machine Learning SDK. The article also covers how to perform inference on the deployed model.
-
-## Deploy and inference a Serverless API model with code
-
-### Deploying a model
-
-Serverless API models are the models you can deploy with pay-as-you-go billing. Examples include Phi-3, Llama-2, Command R, Mistral Large, and more. For serverless API models, you're only charged for inferencing, unless you choose to fine-tune the model.
-
-#### Get the model ID
-
-You can deploy Serverless API models using the Azure Machine Learning SDK, but first, let's browse the model catalog and get the model ID you need for deployment.
-
-1. Sign in to [AI Studio](https://ai.azure.com) and go to the **Home** page.
-1. Select **Model catalog** from the left sidebar.
-1. In the **Deployment options** filter, select **Serverless API**.
-
-    :::image type="content" source="../media/deploy-monitor/catalog-filter-serverless-api.png" alt-text="A screenshot showing how to filter by serverless API models in the catalog." lightbox="../media/deploy-monitor/catalog-filter-serverless-api.png":::
-
-1. Select a model.
-1. Copy the model ID from the details page of the model you selected. It looks something like this: `azureml://registries/azureml-cohere/models/Cohere-command-r-plus/versions/3`
-
-
-#### Install the Azure Machine Learning SDK
-
-Next, you need to install the Azure Machine Learning SDK. Run the following commands in your terminal:
-
-```python
-pip install azure-ai-ml
-pip install azure-identity
-```
-
-#### Deploy the serverless API model
-
-First, you need to authenticate into Azure AI.
-
-```python
-from azure.ai.ml import MLClient
-from azure.identity import DefaultAzureCredential
-from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint
-
-# You can find your credential information in project settings.
-client = MLClient(
-    credential=DefaultAzureCredential(),
-    subscription_id="your subscription name goes here",
-    resource_group_name="your resource group name goes here",
-    workspace_name="your project name goes here",
-)
-```
-Second, let's reference the model ID you found earlier.
-
-```python
-# You can find the model ID on the model catalog.
-model_id="azureml://registries/azureml-meta/models/Llama-2-70b-chat/versions/18"
-```
-Serverless API models from third party model providers require an Azure Marketplace subscription in order to use the model. Let's create a marketplace subscription.
+In this article, you learn how to deploy models using the Azure Machine Learning SDK. The article also covers how to perform inference on the deployed model.
 
-> [!NOTE]
-> You can skip the part if you are deploying a Serverless API model from Microsoft, such as Phi-3.
-
-```python
-# You can customize the subscription name.
-subscription_name="Meta-Llama-2-70b-chat"
-
-marketplace_subscription = MarketplaceSubscription(
-    model_id=model_id,
-    name=subscription_name,
-)
-
-marketplace_subscription = client.marketplace_subscriptions.begin_create_or_update(
-    marketplace_subscription
-).result()
-```
-Finally, let's create a serverless endpoint.
-
-```python
-
-endpoint_name="Meta-Llama-2-70b-chat-qwerty" # Your endpoint name must be unique
-
-serverless_endpoint = ServerlessEndpoint(
-    name=endpoint_name,
-    model_id=model_id
-)
-
-created_endpoint = client.serverless_endpoints.begin_create_or_update(
-    serverless_endpoint
-).result()
-```
-
-#### Get the Serverless API endpoint and keys
-
-```python
-endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)
-print(endpoint_keys.primary_key)
-print(endpoint_keys.secondary_key)
-```
-
-#### Inference the deployment
-
-To inference, you want to use the code specifically catering to different model types and SDK you're using. You can find code samples at the [Azure/azureml-examples sample repository](https://github.com/Azure/azureml-examples/tree/main/sdk/python/foundation-models).
-
-## Deploy and inference a managed compute deployment with code
-
-### Deploying a model
-
-The AI Studio [model catalog](../how-to/model-catalog-overview.md) offers over 1,600 models, and the most common way to deploy these models is to use the managed compute deployment option, which is also sometimes referred to as a managed online deployment.
-
-#### Get the model ID
+## Get the model ID
 
 You can deploy managed compute models using the Azure Machine Learning SDK, but first, let's browse the model catalog and get the model ID you need for deployment.
 
@@ -138,18 +35,20 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
 1. Select a model.
 1. Copy the model ID from the details page of the model you selected. It looks something like this: `azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16`
 
-#### Install the Azure Machine Learning SDK
 
-For this step, you need to install the Azure Machine Learning SDK.
+
+## Deploy the model
+
+Let's deploy the model.
+
+First, you need to install the Azure Machine Learning SDK.
 
 ```python
 pip install azure-ai-ml
 pip install azure-identity
 ```
 
-#### Deploy the model
-
-First, you need to authenticate into Azure AI.
+Use this code to authenticate with Azure Machine Learning and create a client object. Replace the placeholders with your subscription ID, resource group name, and AI Studio project name.
 
 ```python
 from azure.ai.ml import MLClient
@@ -163,9 +62,7 @@ client = MLClient(
 )
 ```
 
-Let's deploy the model.
-
-For Managed compute deployment option, you need to create an endpoint before a model deployment. Think of endpoint as a container that can house multiple model deployments. The endpoint names need to be unique in a region, so in this example we're using the timestamp to create a unique endpoint name.
+For the managed compute deployment option, you need to create an endpoint before a model deployment. Think of an endpoint as a container that can house multiple model deployments. The endpoint names need to be unique in a region, so in this example we're using the timestamp to create a unique endpoint name.
 
 ```python
 import time, sys
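(The endpoint-creation code between this hunk and the next is elided by the diff view. The paragraph above says the sample derives a regionally unique endpoint name from a timestamp; a minimal sketch of just that naming trick, with a made-up prefix:)

```python
import time

# Endpoint names must be unique within a region; appending a Unix timestamp
# is a simple way to avoid collisions. The prefix is an illustrative choice.
timestamp = int(time.time())
online_endpoint_name = "customize-endpoint-" + str(timestamp)
```

The name would then be passed as the `name` of the online endpoint object before calling the create-or-update operation shown in the next hunk.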
@@ -219,7 +116,7 @@ endpoint.traffic = {"demo": 100}
 workspace_ml_client.begin_create_or_update(endpoint).result()
 ```
 
-#### Inference the deployment
+## Inference the deployment
 You need a sample json data to test inferencing. Create `sample_score.json` with the following example.
 
 ```python
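(The hunk above is truncated at the opening code fence, so the contents of `sample_score.json` are not shown in this diff. For orientation only: a sketch that writes a question-answering payload of the general shape Azure ML samples use for a model like `deepset-roberta-base-squad2`; the exact schema depends on the model, so treat the keys below as assumptions and check the model card.)

```python
import json

# Illustrative payload shape only; the real sample_score.json in the
# article may differ. Verify the schema on the model's details page.
sample = {
    "input_data": {
        "question": ["Where is the model hosted?"],
        "context": ["The deployment hosts the model behind a managed online endpoint."],
    }
}

with open("sample_score.json", "w") as f:
    json.dump(sample, f, indent=2)
```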

articles/ai-studio/toc.yml

Lines changed: 3 additions & 0 deletions
@@ -140,6 +140,9 @@ items:
     displayName: maas, paygo, models-as-a-service
   - name: Model and region availability for Serverless API deployments
     href: how-to/deploy-models-serverless-availability.md
+  - name: Deploy and inference a managed compute deployment with code
+    href: how-to/deploy-models-managed.md
+    displayName: endpoint, online, SDK, CLI
   - name: Data for your generative AI app
     items:
     - name: Overview of retrieval augmented generation (RAG)
