
Commit d535d6f

Merge pull request #274680 from santiagxf/santiagxf/release-build-2024-azureml
Serverless endpoints for Azure Machine Learning
2 parents 2dddfff + 8c17c18 commit d535d6f

20 files changed (+2597 −274 lines)
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
---
title: Region availability for models in Serverless API endpoints
titleSuffix: Azure Machine Learning
description: Learn about the regions where each model is available for deployment in serverless API endpoints.
manager: scottpolly
ms.service: machine-learning
ms.subservice: inferencing
ms.topic: how-to
ms.date: 05/09/2024
ms.reviewer: mopeakande
reviewer: msakande
ms.author: fasantia
author: santiagxf
ms.custom:
- build-2024
- serverless
- references_regions
---

# Region availability for models in Serverless API endpoints

In this article, you learn which regions are available for each model that supports serverless API endpoint deployments.

Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

## Region availability

Serverless API endpoints are available in the following regions for the indicated models:

| Model              | East US 2 | West US 3 | Sweden Central | France Central |
| ------------------ | --------- | --------- | -------------- | -------------- |
| Mistral-Small      | **✓**     |           | **✓**          |                |
| Mistral-Large      | **✓**     |           | **✓**          | **✓**          |
| Cohere Command R   | **✓**     |           | **✓**          |                |
| Cohere Command R+  | **✓**     |           | **✓**          |                |
| Cohere Embed v3    | **✓**     |           | **✓**          |                |
| Meta Llama 2       | **✓**     | **✓**     |                |                |
| Meta Llama 3       | **✓**     |           |                |                |
| Phi-3              | **✓**     |           | **✓**          |                |

> [!NOTE]
> Models offered through the Azure Marketplace are available for purchase only in [Microsoft Managed Countries](/partner-center/marketplace/tax-details-marketplace#microsoft-managed-countriesregions), with the exception of the Cohere family of models, which is also available in Japan.

## Alternatives to region availability

If most of your infrastructure is in a particular region and you want to take advantage of models that are available only as serverless API endpoints, you can create a workspace in a region where the deployment is supported and then consume the endpoint from another region.

Read [Consume serverless API endpoints from a different workspace](how-to-connect-models-serverless.md) to learn how to configure an existing serverless API endpoint in a different workspace than the one where it was deployed.

## Related content

- [Model Catalog and Collections](concept-model-catalog.md)
- [Deploy models as serverless API endpoints](how-to-deploy-models-serverless.md)
articles/machine-learning/concept-endpoints.md

Lines changed: 66 additions & 47 deletions
Large diffs are not rendered by default.
Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
---
title: Consume deployed serverless API endpoints from a different workspace
titleSuffix: Azure Machine Learning
description: Learn how to consume a serverless API endpoint from a different workspace than the one where it was deployed.
manager: scottpolly
ms.service: machine-learning
ms.subservice: inferencing
ms.topic: how-to
ms.date: 05/09/2024
ms.reviewer: mopeakande
reviewer: msakande
ms.author: fasantia
author: santiagxf
ms.custom:
- build-2024
- serverless
---

# Consume serverless API endpoints from a different workspace

In this article, you learn how to configure an existing serverless API endpoint in a different workspace than the one where it was deployed.

Certain models in the model catalog can be deployed as serverless APIs. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

The need to consume a serverless API endpoint in a different workspace than the one that was used to create the deployment might arise in situations such as these:

- You want to centralize your deployments in a given workspace and consume them from different workspaces in your organization.
- You need to deploy a model in a workspace in a particular Azure region where serverless deployment for that model is available, but you need to consume it from another region, where serverless deployment isn't available for that particular model.

## Prerequisites

- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.

- An [Azure Machine Learning workspace](quickstart-create-resources.md) where you want to consume the existing deployment.

- A model [deployed to a serverless API endpoint](how-to-deploy-models-serverless.md). This article assumes that you previously deployed the **Meta-Llama-3-8B-Instruct** model. To learn how to deploy this model as a serverless API, see [Deploy models as serverless APIs](how-to-deploy-models-serverless.md).

- You need to install the following software to work with Azure Machine Learning:

# [Studio](#tab/azure-studio)

You can use any compatible web browser to navigate [Azure Machine Learning studio](https://ml.azure.com).

# [Azure CLI](#tab/cli)

The [Azure CLI](/cli/azure/) and the [ml extension for Azure Machine Learning](how-to-configure-cli.md).

```azurecli
az extension add -n ml
```

If you already have the extension installed, ensure the latest version is installed.

```azurecli
az extension update -n ml
```

Once the extension is installed, configure it:

```azurecli
az account set --subscription <subscription>
az configure --defaults workspace=<workspace-name> group=<resource-group> location=<location>
```

# [Python SDK](#tab/python)

Install the [Azure Machine Learning SDK for Python](https://aka.ms/sdk-v2-install).

```bash
pip install -U azure-ai-ml
```

Once installed, import the necessary namespaces:

```python
from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential
from azure.ai.ml.entities import ServerlessEndpoint, ServerlessConnection
```

## Create a serverless API endpoint connection

Follow these steps to create a connection:

1. Connect to the workspace where the endpoint is deployed:

# [Studio](#tab/azure-studio)

Go to [Azure Machine Learning studio](https://ml.azure.com) and navigate to the workspace where the endpoint you want to connect to is deployed.

# [Azure CLI](#tab/cli)

Configure the CLI to point to the workspace:

```azurecli
az account set --subscription <subscription>
az configure --defaults workspace=<workspace-name> group=<resource-group> location=<location>
```

# [Python SDK](#tab/python)

Create a client connected to your workspace:

```python
client = MLClient(
    credential=InteractiveBrowserCredential(tenant_id="<tenant-id>"),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
```

1. Get the endpoint's URL and credentials for the endpoint you want to connect to. In this example, you get the details for an endpoint named **meta-llama3-8b-qwerty**.

# [Studio](#tab/azure-studio)

1. Select **Endpoints** from the left sidebar.

1. Select the **Serverless endpoints** tab to display the serverless API endpoints.

1. Select the endpoint you want to connect to.

1. On the endpoint's **Details** tab, copy the values for **Target URI** and **Key**.

# [Azure CLI](#tab/cli)

```azurecli
az ml serverless-endpoint get-credentials -n meta-llama3-8b-qwerty
```

# [Python SDK](#tab/python)

```python
endpoint_name = "meta-llama3-8b-qwerty"
endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)
print(endpoint_keys.primary_key)
print(endpoint_keys.secondary_key)
```
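
The step above returns only the keys. The Studio tab also copies the endpoint's **Target URI**; here's a minimal sketch for reading that value from the SDK too, assuming the `ServerlessEndpoint` entity returned by `get` exposes the URI as `scoring_uri`:

```python
# Fetch the endpoint object to read its Target URI (assumed to be the
# scoring_uri attribute of the returned ServerlessEndpoint entity).
endpoint = client.serverless_endpoints.get(endpoint_name)
print(endpoint.scoring_uri)
```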

1. Now, connect to the workspace *where you want to create the connection and consume the endpoint*.

1. Create the connection in the workspace:

# [Studio](#tab/azure-studio)

1. Go to the workspace where the connection needs to be created.

1. Go to the **Manage** section in the left navigation bar and select **Connections**.

1. Select **Create**.

1. Select **Serverless Model**.

1. For the **Target URI**, paste the value you copied previously.

1. For the **Key**, paste the value you copied previously.

1. Give the connection a name, in this case **meta-llama3-8b-connection**.

1. Select **Add connection**.

# [Azure CLI](#tab/cli)

Create a connection definition:

__connection.yml__

```yml
name: meta-llama3-8b-connection
type: serverless
endpoint: https://meta-llama3-8b-qwerty-serverless.inference.ai.azure.com
api_key: 1234567890qwertyuiop
```

```azurecli
az ml connection create -f connection.yml
```

# [Python SDK](#tab/python)

```python
# This client must point to the workspace where you want to consume the endpoint.
client.connections.create_or_update(ServerlessConnection(
    name="meta-llama3-8b-connection",
    endpoint="https://meta-llama3-8b-qwerty-serverless.inference.ai.azure.com",
    api_key="1234567890qwertyuiop"
))
```

1. At this point, the connection is available for consumption.

1. To validate that the connection is working:

1. From the left navigation bar of Azure Machine Learning studio, go to **Authoring** > **Prompt flow**.

1. Select **Create** to create a new flow.

1. Select **Create** in the **Chat flow** box.

1. Give your *Prompt flow* a name and select **Create**.

1. Select the **chat** node from the graph to go to the _chat_ section.

1. For **Connection**, open the dropdown list to select the connection you just created, in this case **meta-llama3-8b-connection**.

1. Select **Start compute session** from the top navigation bar to start a prompt flow automatic runtime.

1. Select the **Chat** option. You can now send messages and get responses.
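
If you prefer to validate from code instead of the studio, the following is a minimal sketch that calls the serverless endpoint directly with the same Target URI and key the connection stores. It assumes the endpoint serves an OpenAI-compatible `/v1/chat/completions` route and accepts the key as a bearer token; verify both assumptions against your model's API reference:

```python
import requests

# Assumptions: the chat endpoint exposes an OpenAI-compatible
# /v1/chat/completions route and authenticates with the endpoint
# key passed as a bearer token.
url = "https://meta-llama3-8b-qwerty-serverless.inference.ai.azure.com/v1/chat/completions"
headers = {"Authorization": "Bearer 1234567890qwertyuiop"}
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50,
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```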
## Related content

- [Model Catalog and Collections](concept-model-catalog.md)
- [Deploy models as serverless API endpoints](how-to-deploy-models-serverless.md)
