If you cloned the examples repo, your local machine already has copies of the files for this example. Otherwise, download the files:
1. Go to [https://github.com/Azure/azureml-examples/](https://github.com/Azure/azureml-examples/).
1. Go to the **<> Code** button on the page, and then select **Download ZIP** from the **Local** tab.
1. Locate the model folder `/cli/endpoints/online/model-1/model` and scoring script `/cli/endpoints/online/model-1/onlinescoring/score.py` for a first model `model-1`.
1. Locate the model folder `/cli/endpoints/online/model-2/model` and scoring script `/cli/endpoints/online/model-2/onlinescoring/score.py` for a second model `model-2`.
---
## Define the endpoint and deployment
Online endpoints are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and send responses back in real time.
### Define an endpoint

To define an endpoint, you need to specify the following key attributes:
* Endpoint name: The name of the endpoint. It must be unique in the Azure region. For more information on the naming rules, see [managed online endpoint limits](how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints).
* Authentication mode: The authentication method for the endpoint. Choose between key-based authentication (`key`) and Azure Machine Learning token-based authentication (`aml_token`). A key doesn't expire, but a token does expire. For more information on authenticating, see [Authenticate to an online endpoint](how-to-authenticate-online-endpoint.md).
* Optionally, you can add a description and tags to your endpoint.
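For illustration, these attributes map onto a CLI (v2) endpoint YAML like the following sketch (the endpoint name, description, and tag values are placeholders):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint        # must be unique in the Azure region
auth_mode: key           # or aml_token
description: Sample online endpoint
tags:
  owner: example-team
```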
### Define a deployment
A *deployment* is a set of resources required for hosting the model that does the actual inferencing. To deploy a model, you must have:
- Model files (or the name and version of a model that's already registered in your workspace). In the example, we have a scikit-learn model that does regression.
- A scoring script, that is, code that executes the model on a given input request. The scoring script receives data submitted to a deployed web service and passes it to the model. The script then executes the model and returns its response to the client. The scoring script is specific to your model and must understand the data that the model expects as input and the data it returns as output. In this example, we have a *score.py* file.
- An environment in which your model runs. The environment can be a Docker image with Conda dependencies or a Dockerfile.
- Settings to specify the instance type and scaling capacity.
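To make the scoring-script contract concrete, here is a minimal, self-contained sketch of a *score.py*-style module. The stand-in "model" (summing each input row) is an assumption for illustration; a real script would load the registered model in `init()`:

```python
import json

model = None  # populated by init(); a real score.py loads the registered model here


def init():
    # Called once after the deployment is created or updated;
    # use it to cache the model in memory.
    global model
    model = lambda rows: [sum(r) for r in rows]  # stand-in "model"


def run(raw_data):
    # Called on every invocation: parse the request payload,
    # execute the model, and return the response to the client.
    data = json.loads(raw_data)["data"]
    return model(data)


init()
print(run(json.dumps({"data": [[1, 2], [3, 4]]})))  # → [3, 7]
```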
The following table describes the key attributes of a deployment:

| Attribute | Description |
| --- | --- |
| Endpoint name | The name of the endpoint to create the deployment under. |
| Model | The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. |
| Code path | The path to the directory on the local development environment that contains all the Python source code for scoring the model. You can use nested directories and packages. |
| Scoring script | The relative path to the scoring file in the source code directory. This Python code must have an `init()` function and a `run()` function. The `init()` function will be called after the model is created or updated (you can use it to cache the model in memory, for example). The `run()` function is called at every invocation of the endpoint to do the actual scoring and prediction. |
| Environment | The environment to host the model and code. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. |
| Instance type | The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](reference-managed-online-endpoints-vm-sku-list.md). |
| Instance count | The number of instances to use for the deployment. Base the value on the workload you expect. For high availability, we recommend that you set the value to at least `3`. We reserve an extra 20% for performing upgrades. For more information, see [managed online endpoint quotas](how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints). |
# [Azure CLI](#tab/azure-cli)
### Create online endpoint
### Create the 'blue' deployment
In this article, you'll use the *endpoints/online/managed/sample/blue-deployment.yml* file to configure the key aspects of the deployment. The following snippet shows the contents of the file:
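The YAML file itself isn't reproduced in this excerpt; as a hedged sketch, a managed online deployment definition generally has the following shape (the relative paths, Docker image, and VM size shown here are assumptions, not the exact contents of *blue-deployment.yml*):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../model-1/model/              # local model files, registered inline
code_configuration:
  code: ../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS3_v2
instance_count: 1
```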
### Create online endpoint
To create a managed online endpoint, use the `ManagedOnlineEndpoint` class. This class allows users to configure the key aspects of the endpoint.
<!--* `name` - Name of the endpoint. Needs to be unique at the Azure region level.
* `auth_mode` - The authentication method for the endpoint. Key-based authentication and Azure Machine Learning token-based authentication are supported. Key-based authentication doesn't expire but Azure Machine Learning token-based authentication does. Possible values are `key` or `aml_token`.
* `identity` - The managed identity configuration for accessing Azure resources for endpoint provisioning and inference.
* `type` - The type of managed identity. Azure Machine Learning supports `system_assigned` or `user_assigned` identity.
* `user_assigned_identities` - List (array) of fully qualified resource IDs of the user-assigned identities. This property is required if `identity.type` is `user_assigned`.
* `description` - Description of the endpoint.-->
1. Configure the endpoint:
### Create the 'blue' deployment
To create a deployment for your managed online endpoint, use the `ManagedOnlineDeployment` class. This class allows users to configure the key aspects of the deployment.
The following table describes the attributes of a `deployment`:
1. Configure blue deployment:
One way to create a managed online endpoint in the studio is from the **Models** page:
1. Go to the [Azure Machine Learning studio](https://ml.azure.com).
1. In the left navigation bar, select the **Models** page.
1. Select the model named `model-1` by checking the circle next to its name.
1. Select **Deploy** > **Real-time endpoint**.
:::image type="content" source="media/how-to-safely-rollout-managed-endpoints/deploy-from-models-page.png" lightbox="media/how-to-safely-rollout-managed-endpoints/deploy-from-models-page.png" alt-text="A screenshot of creating a managed online endpoint from the Models UI.":::
:::image type="content" source="media/how-to-safely-rollout-managed-endpoints/online-endpoint-wizard.png" lightbox="media/how-to-safely-rollout-managed-endpoints/online-endpoint-wizard.png" alt-text="A screenshot of a managed online endpoint create wizard.":::
1. Enter an __Endpoint name__.
1. Keep the default selections: __Managed__ for the compute type and __key-based authentication__ for the authentication type.
1. Select __Next__, until you get to the "Deployment" page. Here, perform the following tasks:
Though `green` has 0% of traffic allocated, you can still invoke the endpoint and deployment directly.
Once you've tested your `green` deployment, you can 'mirror' (or copy) a percentage of the live traffic to it. Mirroring traffic (also called shadowing) doesn't change the results returned to clients: requests still flow 100% to the `blue` deployment, while the mirrored percentage is copied and submitted to the `green` deployment so you can gather metrics and logging without impacting your clients. Mirroring is useful when you want to validate a new deployment, for example, to check that latency is within acceptable bounds or that there are no HTTP errors. Testing the new deployment with traffic mirroring/shadowing is also known as [shadow testing](https://microsoft.github.io/code-with-engineering-playbook/automated-testing/shadow-testing/). The deployment receiving the mirrored traffic (in this case, the `green` deployment) can also be called the shadow deployment.
Mirroring has the following limitations:
* Mirrored traffic is supported for the CLI (v2) (version 2.4.0 or above) and Python SDK (v2) (version 1.0.0 or above). If you update the endpoint using an older version of CLI/SDK or Studio UI, the setting for mirrored traffic will be removed.
* Mirrored traffic isn't currently supported for Kubernetes online endpoints.
* You can mirror traffic to only one deployment.
* The maximum mirrored traffic you can configure is 50%. This limit is to reduce the effect on your [endpoint bandwidth quota](how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints) (default 5 MBPS). Your endpoint bandwidth will be throttled if you exceed the allocated quota. For information on monitoring bandwidth throttling, see [Monitor managed online endpoints](how-to-monitor-online-endpoints.md#metrics-at-endpoint-scope).
Also note the following behavior:
* A deployment can only be set to live or mirrored traffic, not both.
* You can send traffic directly to the mirror deployment by specifying the deployment set for mirror traffic.
* You can send traffic directly to a live deployment by specifying the deployment set for live traffic, but in this case the traffic won't be mirrored to the mirror deployment. Mirror traffic is routed from traffic sent to the endpoint without specifying the deployment.
> [!TIP]
> You can use `--deployment-name` option [for CLI v2](/cli/azure/ml/online-endpoint#az-ml-online-endpoint-invoke-optional-parameters), or `deployment_name` option [for SDK v2](/python/api/azure-ai-ml/azure.ai.ml.operations.onlineendpointoperations#azure-ai-ml-operations-onlineendpointoperations-invoke) to specify the deployment to be routed to.
Now, let's set the green deployment to receive 10% of mirrored traffic. Clients will still receive predictions from the blue deployment only.
:::image type="content" source="./media/how-to-safely-rollout-managed-endpoints/endpoint-concept-mirror.png" alt-text="Diagram showing 10% traffic mirrored to one deployment.":::
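The routing behavior above can be sketched in plain Python. This is a conceptual illustration only, not Azure Machine Learning code; the deployment names and the 10% rate mirror the example:

```python
import random


def route(requests, live="blue", mirror="green", mirror_rate=0.10, seed=0):
    # Every request is answered by the live deployment; a sampled fraction
    # is also copied to the mirror deployment, whose responses are discarded.
    rng = random.Random(seed)
    client_responses = [(live, req) for req in requests]
    mirrored_copies = [(mirror, req) for req in requests if rng.random() < mirror_rate]
    return client_responses, mirrored_copies


responses, copies = route(range(1000))
print(all(dep == "blue" for dep, _ in responses))  # clients only ever see blue
print(0 < len(copies) < len(responses))            # roughly 10% is mirrored
```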
# [Azure CLI](#tab/azure-cli)
The following commands mirror 10% of the traffic to the `green` deployment and then invoke the endpoint several times to send it traffic:

```azurecli
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"

for i in {1..20} ; do
    az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
done
```
You can confirm that the specific percentage of the traffic was sent to the `green` deployment by seeing the logs from the deployment:
```azurecli
az ml online-deployment get-logs --name green --endpoint-name $ENDPOINT_NAME
```

After testing, you can set the mirror traffic to zero to disable mirroring.