Commit dda95b6

Merge pull request #3703 from msakande/freshness-deploy-managed-compute
freshness- deploy-models-managed.md
2 parents 0690837 + 7a2725c commit dda95b6

File tree

1 file changed: +110 -112 lines changed

articles/ai-foundry/how-to/deploy-models-managed.md

Lines changed: 110 additions & 112 deletions
@@ -7,7 +7,7 @@ ms.service: azure-ai-foundry
ms.custom:
- build-2024
ms.topic: how-to
ms.date: 03/24/2025
ms.reviewer: fasantia
reviewer: santiagxf
ms.author: mopeakande
@@ -39,121 +39,119 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
## Deploy the model
1. Install the Azure Machine Learning SDK.

    ```bash
    pip install azure-ai-ml
    pip install azure-identity
    ```
1. Authenticate with Azure Machine Learning and create a client object. Replace the placeholders with your subscription ID, resource group name, and Azure AI Foundry project name.

    ```python
    from azure.ai.ml import MLClient
    from azure.identity import InteractiveBrowserCredential

    workspace_ml_client = MLClient(
        credential=InteractiveBrowserCredential(),
        subscription_id="your subscription ID goes here",
        resource_group_name="your resource group name goes here",
        workspace_name="your project name goes here",
    )
    ```
1. Create an endpoint. For the managed compute deployment option, you need to create an endpoint before you create a model deployment. Think of an endpoint as a container that can house multiple model deployments. Endpoint names must be unique in a region, so in this example, use a timestamp to create a unique endpoint name.

    ```python
    import time
    from azure.ai.ml.entities import (
        ManagedOnlineEndpoint,
        ManagedOnlineDeployment,
        ProbeSettings,
    )

    # Append a timestamp to make the endpoint name unique
    timestamp = int(time.time())
    online_endpoint_name = "customize your endpoint name here" + str(timestamp)

    # Create an online endpoint
    endpoint = ManagedOnlineEndpoint(
        name=online_endpoint_name,
        auth_mode="key",
    )
    workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).wait()
    ```
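Because endpoint names must be unique within a region, the timestamp suffix above is one simple way to avoid collisions across repeated runs. A minimal sketch of the same idea as a reusable helper (`make_unique_endpoint_name` is illustrative, not part of the Azure ML SDK):

```python
import time

def make_unique_endpoint_name(prefix: str) -> str:
    """Append a Unix timestamp so repeated runs yield distinct names.

    Illustrative helper; not part of the Azure ML SDK.
    """
    return f"{prefix}-{int(time.time())}"

name_a = make_unique_endpoint_name("my-qa-endpoint")
time.sleep(1)  # one second later, the suffix changes
name_b = make_unique_endpoint_name("my-qa-endpoint")
print(name_a != name_b)  # → True
```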
1. Create a deployment. Replace the model ID in the next code with the model ID that you copied from the details page of the model you selected in the [Get the model ID](#get-the-model-id) section.

    ```python
    model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16"

    demo_deployment = ManagedOnlineDeployment(
        name="demo",
        endpoint_name=online_endpoint_name,
        model=model_name,
        instance_type="Standard_DS3_v2",
        instance_count=2,
        liveness_probe=ProbeSettings(
            failure_threshold=30,
            success_threshold=1,
            timeout=2,
            period=10,
            initial_delay=1000,
        ),
        readiness_probe=ProbeSettings(
            failure_threshold=10,
            success_threshold=1,
            timeout=10,
            period=10,
            initial_delay=1000,
        ),
    )
    workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
    endpoint.traffic = {"demo": 100}
    workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    ```
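The `endpoint.traffic = {"demo": 100}` line routes all requests to the single `demo` deployment. Because an endpoint can house several deployments, traffic is assigned as percentages that must not exceed 100 in total. A minimal sketch of that rule (`validate_traffic` is an illustrative helper, not an SDK function):

```python
def validate_traffic(traffic):
    """Check a traffic split: each value is a percentage, total at most 100.

    Illustrative helper; not part of the Azure ML SDK.
    """
    for deployment, pct in traffic.items():
        if not 0 <= pct <= 100:
            raise ValueError(f"{deployment}: {pct}% is out of range")
    total = sum(traffic.values())
    if total > 100:
        raise ValueError(f"split totals {total}%, which exceeds 100%")
    return total

print(validate_traffic({"demo": 100}))               # → 100
print(validate_traffic({"demo": 90, "canary": 10}))  # → 100, e.g. a canary-style split
```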

## Inference the deployment
1. You need sample JSON data to test inferencing. Create `sample_score.json` with the following example.

    ```json
    {
      "inputs": {
        "question": [
          "Where do I live?",
          "Where do I live?",
          "What's my name?",
          "Which name is also used to describe the Amazon rainforest in English?"
        ],
        "context": [
          "My name is Wolfgang and I live in Berlin",
          "My name is Sarah and I live in London",
          "My name is Clara and I live in Berkeley.",
          "The Amazon rainforest (Portuguese: Floresta Amaz\u00f4nica or Amaz\u00f4nia; Spanish: Selva Amaz\u00f3nica, Amazon\u00eda or usually Amazonia; French: For\u00eat amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
        ]
      }
    }
    ```
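If you'd rather generate the scoring file from code than paste JSON by hand, the payload can be built with the standard library. This sketch uses a shortened two-question version of the example; note the parallel lists, where each question is scored against the context at the same index:

```python
import json

# Build the same payload shape as sample_score.json: parallel "question"
# and "context" lists, paired by index.
payload = {
    "inputs": {
        "question": ["Where do I live?", "What's my name?"],
        "context": [
            "My name is Wolfgang and I live in Berlin",
            "My name is Clara and I live in Berkeley.",
        ],
    }
}
# The two lists must be the same length, one context per question.
assert len(payload["inputs"]["question"]) == len(payload["inputs"]["context"])

with open("sample_score.json", "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2)
```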
1. Inference with `sample_score.json`. Change the location of the scoring file in the next code, based on where you saved your sample JSON file.

    ```python
    import json

    scoring_file = "./sample_score.json"
    response = workspace_ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        deployment_name="demo",
        request_file=scoring_file,
    )
    response_json = json.loads(response)
    print(json.dumps(response_json, indent=2))
    ```
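The shape of the parsed response depends on the model. For an extractive question-answering model such as `deepset-roberta-base-squad2`, a common shape is one answer object per question/context pair, each carrying an answer span and a confidence score. The field names below are an assumption made for illustration, so inspect your deployment's actual output first:

```python
import json

# A hypothetical response in the shape described above; your deployment's
# actual schema may differ, so check the printed response_json first.
response = '[{"answer": "Berlin", "score": 0.97}, {"answer": "Clara", "score": 0.94}]'

for item in json.loads(response):
    print(f"{item['answer']} (score {item['score']:.2f})")
```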
## Configure autoscaling

To configure autoscaling for deployments, go to the Azure portal, locate the Azure resource of type `Machine learning online deployment` in the resource group of the AI project, and use the **Scaling** menu under **Settings**. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.

## Delete the deployment endpoint

@@ -163,7 +161,7 @@ To delete deployments in Azure AI Foundry portal, select the **Delete** button o

To deploy and perform inferencing with real-time endpoints, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.
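As a rough check, a deployment consumes `instance_count` times the core count of its VM size against the regional quota. Assuming the commonly cited figure of 4 vCPU cores for `Standard_DS3_v2` (verify against your region's SKU specs), the two-instance deployment above would consume 8 cores:

```python
cores_per_instance = 4  # assumed: Standard_DS3_v2 has 4 vCPU cores
instance_count = 2      # as in the deployment example above
print(cores_per_instance * instance_count)  # → 8 cores counted against quota
```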

## Related content

- Learn more about what you can do in [Azure AI Foundry](../what-is-ai-foundry.md)
- Get answers to frequently asked questions in the [Azure AI FAQ article](../faq.yml)
