articles/machine-learning/concept-endpoints-online.md
4 additions & 5 deletions
@@ -1,7 +1,7 @@
 ---
 title: Online endpoints for real-time inference
 titleSuffix: Azure Machine Learning
-description: Learn about online endpoints for real-time inferencing in Azure Machine Learning.
+description: Learn about online endpoints for real-time inferencing in Azure Machine Learning, including managed online endpoints.
 services: machine-learning
 ms.service: azure-machine-learning
 ms.subservice: inferencing
@@ -21,7 +21,6 @@ ms.date: 09/23/2024
 This article describes online endpoints for real-time inferencing in Azure Machine Learning. Inferencing is the process of applying new input data to a machine learning model to generate outputs. Azure Machine Learning allows you to perform real-time inferencing on data by using models that are deployed to *online endpoints*. While these outputs are typically called *predictions*, you can use inferencing to generate outputs for other machine learning tasks, such as classification and clustering.
 
-<a name="online-endpoints"></a>
 Online endpoints deploy models to a web server that can return predictions under the HTTP protocol. Online endpoints can operationalize models for real-time inference in synchronous, low-latency requests, and are best used when:
 
 - You have low-latency requirements.
@@ -207,9 +206,9 @@ To deploy locally, you need the [Docker Engine](https://docs.docker.com/engine/i
 Local debugging typically involves the following steps:
 
-1. Check that the local deployment succeeded.
-1. Invoke the local endpoint for inferencing.
-1. Review the output logs for the `invoke` operation.
+- First, check that the local deployment succeeded.
+- Next, invoke the local endpoint for inferencing.
+- Finally, review the output logs for the `invoke` operation.
 
 Local endpoints have the following limitations:
 
 - No support for traffic rules, authentication, or probe settings.
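The local debugging steps listed in this hunk map roughly to Azure CLI calls like the following (a sketch, not part of the diff; the endpoint name, deployment name, and request file are placeholders, and the Docker Engine must be running locally):

```shell
# Check that the local deployment succeeded (provisioning state should be Succeeded).
az ml online-endpoint show --local --name my-endpoint

# Invoke the local endpoint for inferencing with a sample payload.
az ml online-endpoint invoke --local --name my-endpoint --request-file sample-request.json

# Review the output logs for the invoke operation.
az ml online-deployment get-logs --local --endpoint-name my-endpoint --name blue
```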
articles/machine-learning/how-to-troubleshoot-online-endpoints.md

#Customer intent: As a data scientist, I want to figure out why my online endpoint deployment failed so that I can fix it.
@@ -26,7 +26,7 @@ The document structure reflects the way you should approach troubleshooting:
 1. Use [container logs](#get-container-logs) to help debug issues.
 1. Understand [common deployment errors](#common-deployment-errors) that might arise and how to fix them.
 
-The [HTTP status codes](#http-status-codes) sections explain how invocation and prediction errors map to HTTP status codes when you score endpoints with REST requests.
+The [HTTP status codes](#http-status-codes) section explains how invocation and prediction errors map to HTTP status codes when you score endpoints with REST requests.
 
 ## Prerequisites
@@ -40,6 +40,10 @@ The [HTTP status codes](#http-status-codes) sections explain how invocation and
 - The Azure Machine Learning Python SDK v2. [Install the Azure Machine Learning SDK v2 for Python](/python/api/overview/azure/ai-ml-readme).
 
+### [Studio](#tab/studio)
+
+- An Azure Machine Learning workspace.
+
 ---
 
 ## Request tracing
@@ -75,6 +79,10 @@ For local deployment, use the `local=True` parameter. In this command, `ml_clien
 For more information about debugging with container logs, see [Get container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs).
 
 > [!NOTE]
 > If you use Python logging, make sure to use the correct logging level, such as `INFO`, for the messages to be published to logs.
 
-### See log output in Azure Machine Learning studio
-
-To view log output from a container in Azure Machine Learning studio:
-
-1. Select **Endpoints** in the left navigation bar.
-1. Select an endpoint name to view the endpoint details page.
-1. Select the **Logs** tab in the endpoint details page.
-1. Select the deployment log you want to see from the dropdown menu.
-
-:::image type="content" source="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" lightbox="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" alt-text="A screenshot of observing deployment logs in the studio.":::
-
-The logs are pulled from the inference server. To get logs from the storage initializer container, use the Azure CLI or Python SDK commands.
+To view log output from a container in Azure Machine Learning studio:
+
+1. Select **Endpoints** in the left navigation bar.
+1. Select an endpoint name to view the endpoint details page.
+1. Select the **Logs** tab in the endpoint details page.
+1. Select the deployment log you want to see from the dropdown menu.
+
+:::image type="content" source="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" lightbox="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" alt-text="A screenshot of observing deployment logs in the studio.":::
+
+The logs are pulled from the inference server. To get logs from the storage initializer container, use the Azure CLI or Python SDK commands.
+
 ---
 
 ## Common deployment errors
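The logging-level note in the hunk above can be illustrated with a minimal sketch (the logger name is hypothetical; a string buffer stands in for the container log so the effect of the level is visible):

```python
import io
import logging

# Configure a logger the way a scoring script might; messages below the
# configured level are dropped and never reach the logs.
stream = io.StringIO()
logger = logging.getLogger("score")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)

logger.debug("not published: below the INFO level")
logger.info("published: request received")

print(stream.getvalue().strip())  # only the INFO message appears
```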
@@ -213,7 +220,7 @@ If you're creating or updating a Kubernetes online deployment, also see [Common
 ### ERROR: ImageBuildFailure
 
-This error returns when the Docker image environment is being built. You can check the build log for more information on the failure. The build log is located in the default storage for your Azure Machine Learning workspace.
+This error is returned when the Docker image environment is being built. You can check the build log for more information on the failure. The build log is located in the default storage for your Azure Machine Learning workspace.
 
 The exact location might be returned as part of the error, for example `"the build log under the storage account '[storage-account-name]' in the container '[container-name]' at the path '[path-to-the-log]'"`.
@@ -259,7 +266,7 @@ The following resources might run out of quota when using Azure services:
 - [Region-wide VM capacity](#region-wide-vm-capacity)
 - [Other](#other-quota)
 
-For Kubernetes online endpoints only, the [Kubernetes](#kubernetes-quota) resource might also run out of quota.
+For Kubernetes online endpoints only, the [Kubernetes resource](#kubernetes-quota) might also run out of quota.
 
 #### CPU quota
@@ -269,14 +276,14 @@ You can check if there are unused deployments you can delete, or you can [submit
 #### Cluster quota
 
-The OutOfQuota error occurs when you don't have enough Azure Machine Learning compute cluster quota. The quota defines the total number of clusters per subscription that you can use at the same time to deploy CPU or GPU nodes in the Azure cloud.
+The `OutOfQuota` error occurs when you don't have enough Azure Machine Learning compute cluster quota. The quota defines the total number of clusters per subscription that you can use at the same time to deploy CPU or GPU nodes in the Azure cloud.
 
 #### Disk quota
 
-The OutOfQuota error occurs when the size of the model is larger than the available disk space and the model can't be downloaded. Try using a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more disk space or reducing the image and model size.
+The `OutOfQuota` error occurs when the size of the model is larger than the available disk space and the model can't be downloaded. Try using a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more disk space or reducing the image and model size.
 
 #### Memory quota
 
-The OutOfQuota error occurs when the memory footprint of the model is larger than the available memory. Try a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more memory.
+The `OutOfQuota` error occurs when the memory footprint of the model is larger than the available memory. Try a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more memory.
 
 #### Role assignment quota
@@ -288,7 +295,7 @@ Try to delete some unused endpoints in this subscription. If all your endpoints
 #### Kubernetes quota
 
-The OutOfQuota error occurs when the requested CPU or memory can't be provided due to nodes being unschedulable for this deployment. For example, nodes might be cordoned or otherwise unavailable.
+The `OutOfQuota` error occurs when the requested CPU or memory can't be provided due to nodes being unschedulable for this deployment. For example, nodes might be cordoned or otherwise unavailable.
 
 The error message typically indicates the resource insufficiency in the cluster, for example `OutOfQuota: Kubernetes unschedulable. Details:0/1 nodes are available: 1 Too many pods...`. This message means that there are too many pods in the cluster and not enough resources to deploy the new model based on your request.
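The unschedulable-node case described in this hunk can be investigated with standard kubectl commands (a sketch, not part of the diff; assumes kubectl is configured against the attached cluster, and the node name is a placeholder):

```shell
# List nodes and their status; cordoned nodes show SchedulingDisabled.
kubectl get nodes

# Show per-node allocatable capacity and currently allocated CPU/memory.
kubectl describe nodes

# If a node was cordoned and it's safe to do so, make it schedulable again.
kubectl uncordon <node-name>
```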
 The user model might not be found. [Check the container logs](#get-container-logs) to get more details. Make sure you registered the model to the same workspace as the deployment.
 
-To show details for a model in a workspace, you can select a model on the Azure Machine Learning studio **Models** page or run the following command. You must specify either version or label to get the model information.
+To show details for a model in a workspace, run the following command. You must specify either version or label to get the model information.
 
 # [Azure CLI](#tab/cli)
@@ -389,6 +398,10 @@ az ml model show --name <model-name> --version <version>
+To show details for a model in a workspace, select a model on the Azure Machine Learning studio **Models** page.
+
 ---
 
 Also check if the blobs are present in the workspace storage account. For example, if the blob is `https://foobar.blob.core.windows.net/210212154504-1517266419/WebUpload/210212154504-1517266419/GaussianNB.pkl`, you can use the following command to check if the blob exists:
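The command itself was lost in this view of the diff; a typical check for the example URL above looks like this (a sketch assuming `az storage blob exists`, with the account, container, and blob names taken from the example URL):

```shell
# Returns {"exists": true} if the blob is present in the storage account.
az storage blob exists \
  --account-name foobar \
  --container-name 210212154504-1517266419 \
  --name WebUpload/210212154504-1517266419/GaussianNB.pkl
```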
+You can't see logs from the storage initializer in the studio. Use the Azure CLI or Python SDK commands.
+
 ---
 
 #### MLflow model format with private network is unsupported
 
-You can't use the private network feature with a MLflow model format if you're using the [legacy network isolation method for managed online endpoints](concept-secure-online-endpoint.md#secure-outbound-access-with-legacy-network-isolation-method). If you need to deploy a MLflow model with the no-code deployment approach, try using a [workspace managed virtual network](concept-secure-online-endpoint.md#secure-outbound-access-with-workspace-managed-virtual-network).
+You can't use the private network feature with an MLflow model format if you're using the [legacy network isolation method for managed online endpoints](concept-secure-online-endpoint.md#secure-outbound-access-with-legacy-network-isolation-method). If you need to deploy an MLflow model with the no-code deployment approach, try using a [workspace managed virtual network](concept-secure-online-endpoint.md#secure-outbound-access-with-workspace-managed-virtual-network).
 #### Resource requests greater than limits
 
 Requests for resources must be less than or equal to limits. If you don't set limits, Azure Machine Learning sets default values when you attach your compute to a workspace. You can check the limits in the Azure portal or by using the `az ml compute show` command.
 
 #### Azureml-fe not ready
 
-The front-end `azureml-fe` component that routes incoming inference requests to deployed services installs during k8s-extension installation and automatically scales as needed. This component should have at least one healthy replica on the cluster.
+The front-end `azureml-fe` component that routes incoming inference requests to deployed services installs during [k8s-extension](/cli/azure/k8s-extension) installation and automatically scales as needed. This component should have at least one healthy replica on the cluster.
 
 You get this error if the component isn't available when you trigger a Kubernetes online endpoint or deployment creation or update request. Check the pod status and logs to fix this issue. You can also try to update the k8s-extension installed on the cluster.
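Checking the `azureml-fe` pod status and logs, as the hunk above suggests, can look like this (a sketch, not part of the diff; the `azureml` namespace and the `app=azureml-fe` label selector are assumptions about how the extension names its components):

```shell
# Verify that azureml-fe has at least one healthy, Ready replica.
kubectl get pods -n azureml -l app=azureml-fe

# Inspect recent logs from the azureml-fe pods to see why they aren't ready.
kubectl logs -n azureml -l app=azureml-fe --tail=100
```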
@@ -727,7 +744,7 @@ Two actions can help prevent 503 status code errors: Changing the utilization le
 - Change the utilization target at which autoscaling creates new replicas by setting the `autoscale_target_utilization` to a lower value. This change doesn't cause replicas to be created faster, but at a lower utilization threshold. For example, changing the value to 30% causes replicas to be created when 30% utilization occurs instead of waiting until the service is 70% utilized.
 
-- Changing the minimum number of replicas provides a larger pool to handle the incoming spikes.
+- Change the minimum number of replicas to provide a larger pool that can handle the incoming spikes.
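The two mitigations above can be sketched in deployment configuration (a sketch, not part of the diff; this assumes the classic v1 SDK `AksWebservice.deploy_configuration` API, where `autoscale_target_utilization` lives, and requires the `azureml-core` package):

```python
from azureml.core.webservice import AksWebservice

# Create replicas at 30% utilization instead of waiting for 70%, and keep a
# larger minimum pool of replicas to absorb incoming traffic spikes.
aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True,
    autoscale_target_utilization=30,
    autoscale_min_replicas=3,
    autoscale_max_replicas=10,
)
```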