
Commit 112100c

latest changes

1 parent 56ffc14 commit 112100c

2 files changed: +28 −22 lines changed


articles/machine-learning/concept-endpoints-online.md

Lines changed: 9 additions & 7 deletions
@@ -1,7 +1,7 @@
 ---
 title: Online endpoints for real-time inference
 titleSuffix: Azure Machine Learning
-description: Learn about online endpoints for real-time inference in Azure Machine Learning.
+description: Learn about online endpoints for real-time inferencing in Azure Machine Learning.
 services: machine-learning
 ms.service: azure-machine-learning
 ms.subservice: inferencing
@@ -10,16 +10,18 @@ author: msakande
 ms.author: mopeakande
 ms.reviewer: sehan
 ms.custom: devplatv2
-ms.date: 09/16/2024
+ms.date: 09/18/2024

 #Customer intent: As an ML pro, I want to understand what an online endpoint is and why I need it.
 ---

-# Online endpoint deployment for real-time inference
+# Online endpoint deployment for real-time inferencing

 [!INCLUDE [dev v2](includes/machine-learning-dev-v2.md)]

-Inferencing is the process of applying new input data to a machine learning model to generate outputs. Azure Machine Learning allows you to perform real-time inferencing on data by using models that are deployed to *online endpoints*. While these outputs are typically called *predictions*, you can use inferencing to generate outputs for other machine learning tasks, such as classification and clustering.
+This article describes online endpoints for real-time inferencing in Azure Machine Learning. Inferencing is the process of applying new input data to a machine learning model to generate outputs.
+
+Azure Machine Learning allows you to perform real-time inferencing on data by using models that are deployed to *online endpoints*. While these outputs are typically called *predictions*, you can use inferencing to generate outputs for other machine learning tasks, such as classification and clustering.

 <a name="online-endpoints"></a>
 Online endpoints deploy models to a web server that can return predictions under the HTTP protocol. Online endpoints can operationalize models for real-time inference in synchronous, low-latency requests, and are best used when:
@@ -228,13 +230,13 @@ For more information about local debugging, see [Deploy and debug locally by usi
 As with local debugging, you need to have the [Docker Engine](https://docs.docker.com/engine/install/) installed and running, and then deploy a model to the local Docker environment. Once you have a local deployment, Azure Machine Learning local endpoints use Docker and Visual Studio Code development containers (dev containers) to build and configure a local debugging environment.

-With dev containers, you can use Visual Studio Code features such as interactive debugging from inside a Docker container. For more information about interactively debugging online endpoints in VS Code, see [Debug online endpoints locally in Visual Studio Code](how-to-debug-managed-online-endpoints-visual-studio-code.md).
+With dev containers, you can use Visual Studio Code features such as interactive debugging from inside a Docker container. For more information about interactively debugging online endpoints in Visual Studio Code, see [Debug online endpoints locally in Visual Studio Code](how-to-debug-managed-online-endpoints-visual-studio-code.md).

 ### Debugging with container logs

 You can't get direct access to a VM where a model deploys, but you can get logs from the following containers that are running on the VM:

-- The [inference server](how-to-inference-server-http.md)) console log contains the output of print/logging functions from your scoring script *score.py* code.
+- The [inference server](how-to-inference-server-http.md) console log contains the output of print/logging functions from your scoring script *score.py* code.
 - Storage initializer logs contain information on whether code and model data successfully downloaded to the container. The container runs before the inference server container starts to run.

 For more information about debugging with container logs, see [Get container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs).
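
As a brief illustration of the container-logs guidance above (not part of the changed files), the following sketch uses the v2 Python SDK (azure-ai-ml) to pull logs from an online deployment. The endpoint and deployment names are placeholders, and the `container_type` value for the storage initializer is an assumption.

```python
# Illustrative sketch: fetch container logs for an online deployment with the
# Azure Machine Learning Python SDK v2. All names below are placeholders.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Inference server console log: output of print/logging calls in score.py.
print(ml_client.online_deployments.get_logs(
    name="blue", endpoint_name="my-endpoint", lines=100
))

# Storage initializer log; the container_type value is an assumption.
print(ml_client.online_deployments.get_logs(
    name="blue",
    endpoint_name="my-endpoint",
    lines=100,
    container_type="storage-initializer",
))
```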
@@ -294,7 +296,7 @@ Inbound communications use the private endpoint of the Azure Machine Learning wo
 ### Monitoring online endpoints and deployments

-Azure Machine Learning endpoints integrate with [Azure Monitor](monitor-azure-machine-learning.md). Azure Monitor integration lets you view metrics in charts, configure alerts, query from log tables, and use Application Insights to analyze events from user containers. For more information, see [Monitor online endpoints](how-to-monitor-online-endpoints.md).
+Azure Machine Learning endpoints integrate with [Azure Monitor](monitor-azure-machine-learning.md). Azure Monitor integration lets you view metrics in charts, configure alerts, query log tables, and use Application Insights to analyze events from user containers. For more information, see [Monitor online endpoints](how-to-monitor-online-endpoints.md).

 ### Secret injection in online deployments (preview)
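
Before moving to the next file, a brief illustration of the local Docker deployment workflow described earlier in this file's changes (not part of the changed files): the v2 Python SDK can create the endpoint and deployment against the local Docker engine by passing `local=True`. All paths, names, and the base image are placeholders.

```python
# Illustrative sketch: deploy a model to the local Docker environment with the
# Azure Machine Learning Python SDK v2, then invoke it. Paths and names are
# placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

endpoint = ManagedOnlineEndpoint(name="my-local-endpoint", auth_mode="key")
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-local-endpoint",
    model=Model(path="./model"),
    environment=Environment(
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
        conda_file="./environment/conda.yaml",
    ),
    code_configuration=CodeConfiguration(code="./onlinescoring", scoring_script="score.py"),
    instance_count=1,
)

# local=True targets the local Docker engine instead of Azure.
ml_client.online_endpoints.begin_create_or_update(endpoint, local=True)
ml_client.online_deployments.begin_create_or_update(deployment, local=True)

print(ml_client.online_endpoints.invoke(
    endpoint_name="my-local-endpoint",
    request_file="./sample-request.json",
    local=True,
))
```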

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 19 additions & 15 deletions
@@ -8,7 +8,7 @@ ms.subservice: inferencing
 author: msakande
 ms.author: mopeakande
 ms.reviewer: sehan
-ms.date: 09/16/2024
+ms.date: 09/18/2024
 ms.topic: troubleshooting
 ms.custom: devplatv2, devx-track-azurecli, cliv2, sdkv2
 #Customer intent: As a data scientist, I want to figure out why my online endpoint deployment failed so that I can fix it.
@@ -20,9 +20,13 @@ ms.custom: devplatv2, devx-track-azurecli, cliv2, sdkv2
 This article describes how to troubleshoot and resolve common Azure Machine Learning online endpoint deployment and scoring issues.

-The first sections describe how to use [local deployment](#deploy-locally) and [container logs](#get-container-logs) to help debug issues.
+The document structure reflects the way you should approach troubleshooting:

-The rest of the article discusses [common deployment errors](#common-deployment-errors), [errors specific to Kubernetes deployments](#common-errors-specific-to-kubernetes-deployments), and [model consumption](#model-consumption-issues), [network isolation](#network-isolation-issues), [inference server](#inference-server-issues), and [other common issues](#other-common-issues).
+1. Use [local deployment](#deploy-locally) to test and debug your models locally before deploying in the cloud.
+1. Use [container logs](#get-container-logs) to help debug issues.
+1. Understand [common deployment errors](#common-deployment-errors) that might arise and how to fix them.
+
+The [HTTP status codes](#http-status-codes) section explains how invocation and prediction errors map to HTTP status codes when you score endpoints with REST requests.

 ## Prerequisites
@@ -663,7 +667,7 @@ To troubleshoot errors by reattaching, make sure to reattach with the same confi
 ## Model consumption issues

-Common model consumption errors resulting from the endpoint `invoke` operation status include [bandwidth limit issues](#bandwidth-limit-issues), [HTTP status codes](#http-status-codes), and [blocked by CORS policy](#blocked-by-cors-policy).
+Common model consumption errors resulting from the endpoint `invoke` operation status include [bandwidth limit issues](#bandwidth-limit-issues), [CORS policy](#blocked-by-cors-policy), and various [HTTP status codes](#http-status-codes).

 ### Bandwidth limit issues
@@ -675,6 +679,15 @@ Two response trailers are returned if the bandwidth limit is enforced:
 - `ms-azureml-bandwidth-request-delay-ms` is the delay time in milliseconds it took for the request stream transfer.
 - `ms-azureml-bandwidth-response-delay-ms` is the delay time in milliseconds it took for the response stream transfer.

+### Blocked by CORS policy
+
+V2 online endpoints don't support [Cross-Origin Resource Sharing (CORS)](https://developer.mozilla.org/docs/Web/HTTP/CORS) natively. If your web application tries to invoke the endpoint without properly handling the CORS preflight requests, you can get the following error message:
+
+```output
+Access to fetch at 'https://{your-endpoint-name}.{your-region}.inference.ml.azure.com/score' from origin http://{your-url} has been blocked by CORS policy: Response to preflight request doesn't pass access control check. No 'Access-control-allow-origin' header is present on the request resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with the CORS disabled.
+```
+You can use Azure Functions, Azure Application Gateway, or another service as an interim layer to handle CORS preflight requests.
+
 ### HTTP status codes

 When you access online endpoints with REST requests, the returned status codes adhere to the standards for [HTTP status codes](https://aka.ms/http-status-codes). The following sections present details about how endpoint invocation and prediction errors map to HTTP status codes.
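
As a brief illustration (not part of the changed files), the following sketch scores an endpoint over REST and inspects the returned status code. The scoring URI, key, and payload shape are placeholders that depend on your endpoint and scoring script.

```python
# Illustrative sketch: call an online endpoint's scoring URI and check the
# HTTP status code. URI, key, and payload below are placeholders.
import requests

scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
key_or_token = "<endpoint-key-or-azure-ad-token>"

response = requests.post(
    scoring_uri,
    json={"data": [[1.0, 2.0, 3.0, 4.0]]},  # shape depends on your score.py
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key_or_token}",
    },
    timeout=30,
)

if response.status_code == 200:
    print("Prediction:", response.json())
else:
    # Nonsuccess codes map to the invocation and prediction errors described
    # in the following sections, for example 503 and 504.
    print("Request failed:", response.status_code, response.text)
```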
@@ -706,11 +719,11 @@ The following table contains common error codes when REST requests consume Kuber
 | 504 | Request times out | A 504 status code indicates that the request timed out. The default timeout setting is 5 seconds. You can increase the timeout or try to speed up the endpoint by modifying *score.py* to remove unnecessary calls. If these actions don't correct the problem, the code might be in a nonresponsive state or an infinite loop. Follow [ERROR: ResourceNotReady](#error-resourcenotready) to debug the *score.py* file. |
 | 500 | Internal server error | Azure Machine Learning-provisioned infrastructure is failing.|

-#### How to prevent 503 status codes
+#### How to prevent 503 status code errors

 Kubernetes online deployments support autoscaling, which allows replicas to be added to support extra load. For more information, see [Azure Machine Learning inference router](how-to-kubernetes-inference-routing-azureml-fe.md). The decision to scale up or down is based on utilization of the current container replicas.

-Two actions can help prevent 503 status codes: Changing the utilization level for creating new replicas, or changing the minimum number of replicas. You can use these approaches individually or in combination.
+Two actions can help prevent 503 status code errors: Changing the utilization level for creating new replicas, or changing the minimum number of replicas. You can use these approaches individually or in combination.

 - Change the utilization target at which autoscaling creates new replicas by setting the `autoscale_target_utilization` to a lower value. This change doesn't cause replicas to be created faster, but at a lower utilization threshold. For example, changing the value to 30% causes replicas to be created when 30% utilization occurs instead of waiting until the service is 70% utilized.
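
As a brief illustration of these two levers (not part of the changed files), the following sketch adjusts the scale settings of an existing Kubernetes online deployment with the v2 Python SDK. The values and names are placeholders, and the assumption that `TargetUtilizationScaleSettings` corresponds to the `autoscale_*` settings named above is illustrative only.

```python
# Illustrative sketch: lower the utilization threshold for creating replicas
# and raise the minimum replica count on a Kubernetes online deployment.
# Names and the mapping to the autoscale_* settings above are assumptions.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import TargetUtilizationScaleSettings
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

deployment = ml_client.online_deployments.get(name="blue", endpoint_name="my-endpoint")

# Create replicas at 30% utilization instead of 70%, and keep at least 2 running.
deployment.scale_settings = TargetUtilizationScaleSettings(
    min_instances=2,
    max_instances=5,
    target_utilization_percentage=30,
    polling_interval=10,
)

ml_client.online_deployments.begin_create_or_update(deployment).result()
```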

@@ -742,15 +755,6 @@ To increase the number of instances, you can calculate the required replicas as
 If the Kubernetes online endpoint is already using the current max replicas and you still get 503 status codes, increase the `autoscale_max_replicas` value to increase the maximum number of replicas.

-### Blocked by CORS policy
-
-V2 online endpoints don't support [Cross-Origin Resource Sharing (CORS)](https://developer.mozilla.org/docs/Web/HTTP/CORS) natively. If your web application tries to invoke the endpoint without properly handling the CORS preflight requests, you can get the following error message:
-
-```output
-Access to fetch at 'https://{your-endpoint-name}.{your-region}.inference.ml.azure.com/score' from origin http://{your-url} has been blocked by CORS policy: Response to preflight request doesn't pass access control check. No 'Access-control-allow-origin' header is present on the request resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with the CORS disabled.
-```
-You can use Azure Functions, Azure Application Gateway, or another service as an interim layer to handle CORS preflight requests.
-
 ## Network isolation issues

 This section provides information about common network isolation issues.
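
Returning to the relocated CORS guidance above (not part of the changed files), the following sketch shows one shape the suggested interim layer could take, using the Azure Functions Python programming model to answer preflight requests and forward scoring calls. The route, origin, scoring URI, and key are placeholders, and Azure Functions also offers platform-level CORS settings that may remove the need for a hand-written handler.

```python
# Illustrative sketch: an Azure Functions proxy that answers CORS preflight
# requests and forwards scoring calls to an online endpoint. All names,
# URIs, and keys are placeholders.
import urllib.request

import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
ENDPOINT_KEY = "<endpoint-key>"
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "https://<your-web-app>",
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
}

@app.route(route="score", methods=["POST", "OPTIONS"])
def score_proxy(req: func.HttpRequest) -> func.HttpResponse:
    # Answer the browser's preflight request directly.
    if req.method == "OPTIONS":
        return func.HttpResponse(status_code=204, headers=CORS_HEADERS)

    # Forward the scoring request to the online endpoint.
    forward = urllib.request.Request(
        SCORING_URI,
        data=req.get_body(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {ENDPOINT_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(forward) as upstream:
        body = upstream.read()

    return func.HttpResponse(
        body,
        status_code=200,
        headers={**CORS_HEADERS, "Content-Type": "application/json"},
    )
```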
