Commit b1dddec
cx
1 parent 10eac61 commit b1dddec

2 files changed: +49 −33 lines changed
articles/machine-learning/concept-endpoints-online.md

Lines changed: 4 additions & 5 deletions
@@ -1,7 +1,7 @@
 ---
 title: Online endpoints for real-time inference
 titleSuffix: Azure Machine Learning
-description: Learn about online endpoints for real-time inferencing in Azure Machine Learning.
+description: Learn about online endpoints for real-time inferencing in Azure Machine Learning, including managed online endpoints.
 services: machine-learning
 ms.service: azure-machine-learning
 ms.subservice: inferencing
@@ -21,7 +21,6 @@ ms.date: 09/23/2024
 
 This article describes online endpoints for real-time inferencing in Azure Machine Learning. Inferencing is the process of applying new input data to a machine learning model to generate outputs. Azure Machine Learning allows you to perform real-time inferencing on data by using models that are deployed to *online endpoints*. While these outputs are typically called *predictions*, you can use inferencing to generate outputs for other machine learning tasks, such as classification and clustering.
 
-<a name="online-endpoints"></a>
 Online endpoints deploy models to a web server that can return predictions under the HTTP protocol. Online endpoints can operationalize models for real-time inference in synchronous, low-latency requests, and are best used when:
 
 - You have low-latency requirements.
@@ -207,9 +206,9 @@ To deploy locally, you need the [Docker Engine](https://docs.docker.com/engine/i
 
 Local debugging typically involves the following steps:
 
-1. Check that the local deployment succeeded.
-1. Invoke the local endpoint for inferencing.
-1. Review the output logs for the `invoke` operation.
+- First, check that the local deployment succeeded.
+- Next, invoke the local endpoint for inferencing.
+- Finally, review the output logs for the `invoke` operation.
 
 Local endpoints have the following limitations:
 - No support for traffic rules, authentication, or probe settings.
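The first local-debugging step above, checking that the local deployment succeeded, can be sketched as a small helper. This is a minimal sketch, assuming the deployment object you inspect exposes a `provisioning_state` string that reads `"Succeeded"` on success; the helper itself is hypothetical and not part of the SDK:

```python
# Sketch of local-debugging step 1: confirm the deployment reached a
# terminal "Succeeded" state before invoking the local endpoint.
# `provisioning_state` mirrors the kind of status string the deployment
# reports; the function is illustrative only.

def deployment_succeeded(provisioning_state: str) -> bool:
    """Return True only when the deployment reports success."""
    return provisioning_state.strip().lower() == "succeeded"

print(deployment_succeeded("Succeeded"))  # ready to invoke
print(deployment_succeeded("Failed"))     # review the logs first
```

If this check fails, the remaining steps (invoke, then review the `invoke` logs) are where the actual error surfaces.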

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 45 additions & 28 deletions
@@ -8,7 +8,7 @@ ms.subservice: inferencing
 author: msakande
 ms.author: mopeakande
 ms.reviewer: sehan
-ms.date: 09/18/2024
+ms.date: 09/23/2024
 ms.topic: troubleshooting
 ms.custom: devplatv2, devx-track-azurecli, cliv2, sdkv2
 #Customer intent: As a data scientist, I want to figure out why my online endpoint deployment failed so that I can fix it.
@@ -26,7 +26,7 @@ The document structure reflects the way you should approach troubleshooting:
 1. Use [container logs](#get-container-logs) to help debug issues.
 1. Understand [common deployment errors](#common-deployment-errors) that might arise and how to fix them.
 
-The [HTTP status codes](#http-status-codes) sections explain how invocation and prediction errors map to HTTP status codes when you score endpoints with REST requests.
+The [HTTP status codes](#http-status-codes) section explains how invocation and prediction errors map to HTTP status codes when you score endpoints with REST requests.
 
 ## Prerequisites

@@ -40,6 +40,10 @@ The [HTTP status codes](#http-status-codes) sections explain how invocation and
4040

4141
- The Azure Machine Learning Python SDK v2. [Install the Azure Machine Learning SDK v2 for Python](/python/api/overview/azure/ai-ml-readme).
4242

43+
### [Studio](#tab/studio)
44+
45+
- An Azure Machine Learning workspace.
46+
4347
---
4448

4549
## Request tracing
@@ -75,6 +79,10 @@ For local deployment, use the `local=True` parameter. In this command,`ml_clien
7579
ml_client.begin_create_or_update(online_deployment, local=True)
7680
```
7781

82+
### [Studio](#tab/studio)
83+
84+
Azure Machine Learning studio doesn't support local deployment.
85+
7886
---
7987

8088
The following steps occur during local deployment:
@@ -101,24 +109,10 @@ For Kubernetes online endpoints, administrators can directly access the cluster
101109
```bash
102110
kubectl -n <compute-namespace> logs <container-name>
103111
```
104-
For more information about debugging with container logs, see [Get container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs).
105112

106113
> [!NOTE]
107114
> If you use Python logging, make sure to use the correct logging level, such as `INFO`, for the messages to be published to logs.
108115
109-
### See log output in Azure Machine Learning studio
110-
111-
To view log output from a container in Azure Machine Learning studio:
112-
113-
1. Select **Endpoints** in the left navigation bar.
114-
1. Select an endpoint name to view the endpoint details page.
115-
1. Select the **Logs** tab in the endpoint details page.
116-
1. Select the deployment log you want to see from the dropdown menu.
117-
118-
:::image type="content" source="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" lightbox="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" alt-text="A screenshot of observing deployment logs in the studio.":::
119-
120-
The logs are pulled from the inference server. To get logs from the storage initializer container, use the Azure CLI or Python SDK commands.
121-
122116
### See log output from containers
123117

124118
# [Azure CLI](#tab/cli)
@@ -163,6 +157,19 @@ ml_client.online_deployments.get_logs(
163157
)
164158
```
165159

160+
### [Studio](#tab/studio)
161+
162+
To view log output from a container in Azure Machine Learning studio:
163+
164+
1. Select **Endpoints** in the left navigation bar.
165+
1. Select an endpoint name to view the endpoint details page.
166+
1. Select the **Logs** tab in the endpoint details page.
167+
1. Select the deployment log you want to see from the dropdown menu.
168+
169+
:::image type="content" source="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" lightbox="media/how-to-troubleshoot-online-endpoints/deployment-logs.png" alt-text="A screenshot of observing deployment logs in the studio.":::
170+
171+
The logs are pulled from the inference server. To get logs from the storage initializer container, use the Azure CLI or Python SDK commands.
172+
166173
---
167174

168175
## Common deployment errors
@@ -213,7 +220,7 @@ If you're creating or updating a Kubernetes online deployment, also see [Common
213220

214221
### ERROR: ImageBuildFailure
215222

216-
This error returns when the Docker image environment is being built. You can check the build log for more information on the failure. The build log is located in the default storage for your Azure Machine Learning workspace.
223+
This error is returned when the Docker image environment is being built. You can check the build log for more information on the failure. The build log is located in the default storage for your Azure Machine Learning workspace.
217224

218225
The exact location might be returned as part of the error, for example `"the build log under the storage account '[storage-account-name]' in the container '[container-name]' at the path '[path-to-the-log]'"`.
219226

@@ -259,7 +266,7 @@ The following resources might run out of quota when using Azure services:
259266
- [Region-wide VM capacity](#region-wide-vm-capacity)
260267
- [Other](#other-quota)
261268

262-
For Kubernetes online endpoints only, the [Kubernetes](#kubernetes-quota) resource might also run out of quota.
269+
For Kubernetes online endpoints only, the [Kubernetes resource](#kubernetes-quota) might also run out of quota.
263270

264271
#### CPU quota
265272

@@ -269,14 +276,14 @@ You can check if there are unused deployments you can delete, or you can [submit
269276

270277
#### Cluster quota
271278

272-
The OutOfQuota error occurs when you don't have enough Azure Machine Learning compute cluster quota. The quota defines the total number of clusters per subscription that you can use at the same time to deploy CPU or GPU nodes in the Azure cloud.
279+
The `OutOfQuota` error occurs when you don't have enough Azure Machine Learning compute cluster quota. The quota defines the total number of clusters per subscription that you can use at the same time to deploy CPU or GPU nodes in the Azure cloud.
273280

274281
#### Disk quota
275282

276-
The OutOfQuota error occurs when the size of the model is larger than the available disk space and the model can't be downloaded. Try using a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more disk space or reducing the image and model size.
283+
The `OutOfQuota` error occurs when the size of the model is larger than the available disk space and the model can't be downloaded. Try using a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more disk space or reducing the image and model size.
277284

278285
#### Memory quota
279-
The OutOfQuota error occurs when the memory footprint of the model is larger than the available memory. Try a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more memory.
286+
The `OutOfQuota` error occurs when the memory footprint of the model is larger than the available memory. Try a [SKU](reference-managed-online-endpoints-vm-sku-list.md) with more memory.
280287

281288
#### Role assignment quota
282289

@@ -288,7 +295,7 @@ Try to delete some unused endpoints in this subscription. If all your endpoints
288295

289296
#### Kubernetes quota
290297

291-
The OutOfQuota error occurs when the requested CPU or memory can't be provided due to nodes being unschedulable for this deployment. For example, nodes might be cordoned or otherwise unavailable.
298+
The `OutOfQuota` error occurs when the requested CPU or memory can't be provided due to nodes being unschedulable for this deployment. For example, nodes might be cordoned or otherwise unavailable.
292299

293300
The error message typically indicates the resource insufficiency in the cluster, for example `OutOfQuota: Kubernetes unschedulable. Details:0/1 nodes are available: 1 Too many pods...`. This message means that there are too many pods in the cluster and not enough resources to deploy the new model based on your request.
294301

@@ -324,9 +331,11 @@ ml_client.online_deployments.get_logs(
324331
)
325332
```
326333

327-
---
334+
### [Studio](#tab/studio)
328335

329-
You can also [check the deployment log output in Azure Machine Learning studio](#see-log-output-in-azure-machine-learning-studio).
336+
[Check the deployment log output in Azure Machine Learning studio](#see-log-output-in-azure-machine-learning-studio).
337+
338+
---
330339

331340
### ERROR: BadArgument
332341

@@ -375,7 +384,7 @@ az acr repository show-tags -n testacr --repository azureml/azureml_92a029f831ce
375384

376385
The user model might not be found. [Check the container logs](#get-container-logs) to get more details. Make sure you registered the model to the same workspace as the deployment.
377386

378-
To show details for a model in a workspace, you can select a model on the Azure Machine Learning studio **Models** page or run the following command. You must specify either version or label to get the model information.
387+
To show details for a model in a workspace, run the following command. You must specify either version or label to get the model information.
379388

380389
# [Azure CLI](#tab/cli)
381390

@@ -389,6 +398,10 @@ az ml model show --name <model-name> --version <version>
389398
ml_client.models.get(name="<model-name>", version=<version>)
390399
```
391400

401+
### [Studio](#tab/studio)
402+
403+
To show details for a model in a workspace, select a model on the Azure Machine Learning studio **Models** page
404+
392405
---
393406

394407
Also check if the blobs are present in the workspace storage account. For example, if the blob is `https://foobar.blob.core.windows.net/210212154504-1517266419/WebUpload/210212154504-1517266419/GaussianNB.pkl`, you can use the following command to check if the blob exists:
@@ -413,19 +426,23 @@ ml_client.online_deployments.get_logs(
413426
)
414427
```
415428

429+
### [Studio](#tab/studio)
430+
431+
You can't see logs from the storage initializer in the studio. Use the Azure CLI or Python SDK commands.
432+
416433
---
417434

418435
#### MLflow model format with private network is unsupported
419436

420-
You can't use the private network feature with a MLflow model format if you're using the [legacy network isolation method for managed online endpoints](concept-secure-online-endpoint.md#secure-outbound-access-with-legacy-network-isolation-method). If you need to deploy a MLflow model with the no-code deployment approach, try using a [workspace managed virtual network](concept-secure-online-endpoint.md#secure-outbound-access-with-workspace-managed-virtual-network).
437+
You can't use the private network feature with an MLflow model format if you're using the [legacy network isolation method for managed online endpoints](concept-secure-online-endpoint.md#secure-outbound-access-with-legacy-network-isolation-method). If you need to deploy an MLflow model with the no-code deployment approach, try using a [workspace managed virtual network](concept-secure-online-endpoint.md#secure-outbound-access-with-workspace-managed-virtual-network).
421438

422439
#### Resource requests greater than limits
423440

424441
Requests for resources must be less than or equal to limits. If you don't set limits, Azure Machine Learning sets default values when you attach your compute to a workspace. You can check the limits in the Azure portal or by using the `az ml compute show` command.
425442

426443
#### Azureml-fe not ready
427444

428-
The front-end `azureml-fe` component that routes incoming inference requests to deployed services installs during k8s-extension installation and automatically scales as needed. This component should have at least one healthy replica on the cluster.
445+
The front-end `azureml-fe` component that routes incoming inference requests to deployed services installs during [k8s-extension](/cli/azure/k8s-extension) installation and automatically scales as needed. This component should have at least one healthy replica on the cluster.
429446

430447
You get this error if the component isn't available when you trigger a Kubernetes online endpoint or deployment creation or update request. Check the pod status and logs to fix this issue. You can also try to update the k8s-extension installed on the cluster.
431448

@@ -727,7 +744,7 @@ Two actions can help prevent 503 status code errors: Changing the utilization le
727744

728745
- Change the utilization target at which autoscaling creates new replicas by setting the `autoscale_target_utilization` to a lower value. This change doesn't cause replicas to be created faster, but at a lower utilization threshold. For example, changing the value to 30% causes replicas to be created when 30% utilization occurs instead of waiting until the service is 70% utilized.
729746

730-
- Changing the minimum number of replicas provides a larger pool to handle the incoming spikes.
747+
- Change the minimum number of replicas to provide a larger pool that can handle the incoming spikes.
731748

732749
#### How to calculate instance count
733750

0 commit comments

Comments
 (0)