This article describes online endpoints for real-time inferencing in Azure Machine Learning. Inferencing is the process of applying new input data to a machine learning model to generate outputs.
Azure Machine Learning allows you to perform real-time inferencing on data by using models that are deployed to *online endpoints*. While these outputs are typically called *predictions*, you can use inferencing to generate outputs for other machine learning tasks, such as classification and clustering.
<a name="online-endpoints"></a>
Online endpoints deploy models to a web server that can return predictions over HTTP. Online endpoints can operationalize models for real-time inference in synchronous, low-latency requests, and are best used when:
As with local debugging, you need to have the [Docker Engine](https://docs.docker.com/engine/install/) installed and running, and then deploy a model to the local Docker environment. Once you have a local deployment, Azure Machine Learning local endpoints use Docker and Visual Studio Code development containers (dev containers) to build and configure a local debugging environment.
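For example, here's a minimal sketch of creating a local endpoint and deployment with the Azure Machine Learning Python SDK v2 (azure-ai-ml). The names, paths, image, and instance type are placeholders, and the exact arguments can vary by SDK version:

```python
# Minimal sketch (placeholders throughout): deploy a model to a local Docker
# environment by using the Azure Machine Learning Python SDK v2.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

endpoint = ManagedOnlineEndpoint(name="my-local-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint, local=True)

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-local-endpoint",
    model=Model(path="./model"),                      # placeholder model path
    environment=Environment(
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
        conda_file="./environment/conda.yaml",        # placeholder conda file
    ),
    code_configuration=CodeConfiguration(
        code="./onlinescoring", scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)

# local=True builds and runs the deployment in your local Docker environment.
ml_client.online_deployments.begin_create_or_update(deployment, local=True)
```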
With dev containers, you can use Visual Studio Code features such as interactive debugging from inside a Docker container. For more information about interactively debugging online endpoints in Visual Studio Code, see [Debug online endpoints locally in Visual Studio Code](how-to-debug-managed-online-endpoints-visual-studio-code.md).
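If you start the local deployment from the Python SDK rather than the CLI, a sketch like the following can launch the dev container with debugging enabled. It assumes `ml_client` and `deployment` from the previous sketch and an azure-ai-ml version that supports the `vscode_debug` flag:

```python
# Minimal sketch: recreate the local deployment with VS Code debugging enabled.
# Assumes ml_client and deployment are defined as in the previous sketch and
# that the installed azure-ai-ml version supports the vscode_debug flag.
ml_client.online_deployments.begin_create_or_update(
    deployment,
    local=True,
    vscode_debug=True,  # opens a dev container configured for interactive debugging
)
```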
### Debugging with container logs
You can't get direct access to the VM where a model is deployed, but you can get logs from the following containers that run on the VM:
- The [inference server](how-to-inference-server-http.md) console log contains the output of print and logging calls from your scoring script (*score.py*), as in the sketch after this list.
- Storage initializer logs contain information on whether code and model data successfully downloaded to the container. The storage initializer container runs before the inference server container starts.
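For illustration, here's a minimal sketch of a scoring script whose print and logging output would land in the inference server console log. It follows the usual `init()`/`run()` structure; the model loading and prediction logic are placeholders:

```python
# score.py -- minimal sketch; print/logging output appears in the
# inference server container log.
import json
import logging

model = None

def init():
    """Called once when the container starts; load the model here."""
    global model
    logging.info("init() called: loading model")  # visible in the container log
    model = None  # placeholder: load your real model here

def run(raw_data):
    """Called for every scoring request."""
    logging.info("run() called")
    data = json.loads(raw_data)
    # Placeholder prediction logic; replace with model inference.
    return {"input_received": data}
```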
For more information about debugging with container logs, see [Get container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs).
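For example, a minimal sketch of retrieving both logs with the Python SDK v2; the endpoint and deployment names are placeholders, and the parameter names can vary by SDK version:

```python
# Minimal sketch: pull container logs for an online deployment.
# Assumes ml_client is an authenticated azure.ai.ml.MLClient instance.

# Inference server log (scoring script output):
print(ml_client.online_deployments.get_logs(
    name="blue", endpoint_name="my-endpoint", lines=50))

# Storage initializer log (model and code download):
print(ml_client.online_deployments.get_logs(
    name="blue", endpoint_name="my-endpoint", lines=50,
    container_type="storage-initializer"))
```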
### Monitoring online endpoints and deployments
Azure Machine Learning endpoints integrate with [Azure Monitor](monitor-azure-machine-learning.md). Azure Monitor integration lets you view metrics in charts, configure alerts, query log tables, and use Application Insights to analyze events from user containers. For more information, see [Monitor online endpoints](how-to-monitor-online-endpoints.md).
### Secret injection in online deployments (preview)
This article describes how to troubleshoot and resolve common Azure Machine Learning online endpoint deployment and scoring issues.
The document structure reflects the way you should approach troubleshooting:
1. Use [local deployment](#deploy-locally) to test and debug your models locally before deploying in the cloud.
1. Use [container logs](#get-container-logs) to help debug issues.
1. Understand [common deployment errors](#common-deployment-errors) that might arise and how to fix them.
The [HTTP status codes](#http-status-codes) section explains how invocation and prediction errors map to HTTP status codes when you score endpoints with REST requests.
## Prerequisites
## Model consumption issues
Common model consumption errors resulting from the endpoint `invoke` operation status include [bandwidth limit issues](#bandwidth-limit-issues), requests [blocked by CORS policy](#blocked-by-cors-policy), and various [HTTP status codes](#http-status-codes).
### Bandwidth limit issues
Two response trailers are returned if the bandwidth limit is enforced:
- `ms-azureml-bandwidth-request-delay-ms` is the delay time in milliseconds that the request stream transfer took.
- `ms-azureml-bandwidth-response-delay-ms` is the delay time in milliseconds that the response stream transfer took.
### Blocked by CORS policy
V2 online endpoints don't support [Cross-Origin Resource Sharing (CORS)](https://developer.mozilla.org/docs/Web/HTTP/CORS) natively. If your web application tries to invoke the endpoint without properly handling the CORS preflight requests, you can get the following error message:
```output
Access to fetch at 'https://{your-endpoint-name}.{your-region}.inference.ml.azure.com/score' from origin http://{your-url} has been blocked by CORS policy: Response to preflight request doesn't pass access control check. No 'Access-control-allow-origin' header is present on the request resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with the CORS disabled.
```
You can use Azure Functions, Azure Application Gateway, or another service as an interim layer to handle CORS preflight requests.
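As one illustration of such an interim layer, the following sketch is an HTTP-triggered Azure Function (Python) that answers the preflight `OPTIONS` request and forwards scoring calls to the endpoint. The URLs, allowed origin, and key handling are placeholders, and the function bindings (function.json) are omitted; treat it as a starting point, not a hardened implementation:

```python
# Minimal sketch: Azure Function that handles CORS preflight requests and
# proxies scoring calls to an Azure Machine Learning online endpoint.
import urllib.request
import azure.functions as func

SCORING_URI = "https://<your-endpoint>.<your-region>.inference.ml.azure.com/score"  # placeholder
API_KEY = "<endpoint-key>"  # placeholder; read from a secure store in practice

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "https://<your-web-app>",  # placeholder origin
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
}

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Answer the CORS preflight request directly.
    if req.method == "OPTIONS":
        return func.HttpResponse(status_code=204, headers=CORS_HEADERS)

    # Forward the scoring request to the online endpoint.
    forward = urllib.request.Request(
        SCORING_URI,
        data=req.get_body(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(forward) as resp:
        body = resp.read()

    return func.HttpResponse(
        body,
        status_code=200,
        headers={**CORS_HEADERS, "Content-Type": "application/json"},
    )
```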
### HTTP status codes
When you access online endpoints with REST requests, the returned status codes adhere to the standards for [HTTP status codes](https://aka.ms/http-status-codes). The following sections present details about how endpoint invocation and prediction errors map to HTTP status codes.
The following table contains common error codes when REST requests consume Kubernetes online endpoints:
| Status code | Reason | Why this code might be returned |
| --- | --- | --- |
| 504 | Request times out | A 504 status code indicates that the request timed out. The default timeout setting is 5 seconds. You can increase the timeout or try to speed up the endpoint by modifying *score.py* to remove unnecessary calls. If these actions don't correct the problem, the code might be in a nonresponsive state or an infinite loop. Follow [ERROR: ResourceNotReady](#error-resourcenotready) to debug the *score.py* file. |
| 500 | Internal server error | Azure Machine Learning-provisioned infrastructure is failing. |
708
721
709
-
#### How to prevent 503 status codes
722
+
#### How to prevent 503 status code errors
Kubernetes online deployments support autoscaling, which allows replicas to be added to support extra load. For more information, see [Azure Machine Learning inference router](how-to-kubernetes-inference-routing-azureml-fe.md). The decision to scale up or down is based on utilization of the current container replicas.
Two actions can help prevent 503 status code errors: changing the utilization level at which new replicas are created, or changing the minimum number of replicas. You can use these approaches individually or in combination.
- Change the utilization target at which autoscaling creates new replicas by setting the `autoscale_target_utilization` to a lower value. This change doesn't cause replicas to be created faster, but at a lower utilization threshold. For example, changing the value to 30% causes replicas to be created when 30% utilization occurs instead of waiting until the service is 70% utilized.
To increase the number of instances, you can calculate the required number of replicas, as sketched below.
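A minimal sketch of that calculation, assuming it's based on your target request rate, per-request processing time, concurrency per replica, and target utilization; the values shown are placeholders, not recommendations:

```python
# Minimal sketch: estimate the number of replicas needed for a target load.
from math import ceil

target_rps = 20                           # assumed target requests per second
request_process_time = 10                 # assumed seconds to process one request
max_concurrent_requests_per_instance = 1  # concurrent requests each replica handles
target_utilization = 0.7                  # target utilization (70%)

# Concurrent requests the deployment must sustain at the target utilization.
concurrent_requests = target_rps * request_process_time / target_utilization

# Replicas required to handle that concurrency.
instance_count = ceil(concurrent_requests / max_concurrent_requests_per_instance)
print(instance_count)  # 286 with these example values
```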
If the Kubernetes online endpoint is already using the current max replicas and you still get 503 status codes, increase the `autoscale_max_replicas` value to increase the maximum number of replicas.
## Network isolation issues
This section provides information about common network isolation issues.