Learn how to troubleshoot and solve, or work around, common errors you might encounter when deploying a model to Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) using Azure Machine Learning.
> [!NOTE]
> If you're deploying a model to Azure Kubernetes Service (AKS), we recommend enabling [Azure Monitor](/azure/azure-monitor/containers/container-insights-enable-existing-clusters) for that cluster. This helps you understand overall cluster health and resource usage. You might also find the following resources useful:
>
> * [Check for Resource Health events impacting your AKS cluster](/azure/aks/aks-resource-health)
> * [Azure Kubernetes Service Diagnostics](/azure/aks/concepts-diagnostics)
>
> If you're trying to deploy a model to an unhealthy or overloaded cluster, expect to experience issues. If you need help troubleshooting AKS cluster problems, contact AKS Support.
## Prerequisites
This example prints the local path (relative to `/var/azureml-app`) in the container where your scoring script is expecting to find the model file or folder. Then you can verify if the file or folder is where you expect it.
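For reference, here's a minimal `init()` sketch along those lines, assuming the SDK v1 `Model` class (the model name `my-model` is a placeholder for your registered model's name):

```python
from azureml.core.model import Model

def init():
    global model
    # Print the path (under /var/azureml-app) where the registered model
    # is expected to be found inside the container.
    model_path = Model.get_model_path(model_name='my-model')
    print(model_path)
    # ... load the model from model_path here, then finish the rest of init()
```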
Setting the logging level to DEBUG might cause additional information to be logged, which might be useful in identifying the failure.
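For example, a minimal sketch using the standard Python `logging` module near the top of `score.py`:

```python
import logging

# Emit DEBUG-level detail from score.py and the libraries it uses,
# which can help pinpoint where initialization or scoring fails.
logging.basicConfig(level=logging.DEBUG)
```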
## Function fails: run(input_data)
## HTTP status code 503
Azure Kubernetes Service deployments support autoscaling, which allows replicas to be added to support extra load. The autoscaler is designed to handle **gradual** changes in load. If you receive large spikes in requests per second, clients might receive an HTTP status code 503. Even though the autoscaler reacts quickly, it takes AKS a significant amount of time to create more containers.

Decisions to scale up or down are based on the utilization of the current container replicas. The number of replicas that are busy (processing a request) divided by the total number of current replicas is the current utilization. If this number exceeds `autoscale_target_utilization`, more replicas are created. If it's lower, replicas are removed. Decisions to add replicas are eager and fast (around 1 second). Decisions to remove replicas are conservative (around 1 minute). By default, the autoscaling target utilization is set to **70%**, which means that the service can handle spikes in requests per second (RPS) of **up to 30%**.
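As a concrete illustration of that calculation, here's a small sketch with made-up numbers (none of these values come from a real deployment):

```python
busy_replicas = 8          # replicas currently processing a request
total_replicas = 10        # all current replicas
target_utilization = 0.70  # the default autoscale_target_utilization

utilization = busy_replicas / total_replicas  # 0.80
if utilization > target_utilization:
    print("Above target: the autoscaler adds replicas (eager, ~1 second decision)")
elif utilization < target_utilization:
    print("Below target: the autoscaler removes replicas (conservative, ~1 minute decision)")
```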
There are two things that can help prevent 503 status codes:
* Change the utilization level at which autoscaling creates new replicas. You can adjust the utilization target by setting `autoscale_target_utilization` to a lower value.
> [!IMPORTANT]
> This change doesn't cause replicas to be created *faster*. Instead, they're created at a lower utilization threshold. Instead of waiting until the service is 70% utilized, changing the value to 30% causes replicas to be created when 30% utilization occurs.

If the web service is already using the current max replicas and you're still seeing 503 status codes, increase the `autoscale_max_replicas` value to increase the maximum number of replicas.
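For example, here's a minimal sketch of how these autoscale settings might be supplied in the deployment configuration, assuming the SDK v1 `AksWebservice` class referenced later in this section (the values shown are placeholders to tune for your workload):

```python
from azureml.core.webservice import AksWebservice

# Placeholder values; adjust them for your traffic pattern.
aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True,
    autoscale_target_utilization=30,  # add replicas at 30% utilization instead of the 70% default
    autoscale_min_replicas=4,         # keep more replicas warm to absorb sudden spikes
    autoscale_max_replicas=10,        # raise the ceiling if you still see 503s at the current max
)
```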
> [!NOTE]
> If you receive request spikes larger than the new minimum replicas can handle, you might receive 503s again. For example, as traffic to your service increases, you might need to increase the minimum replicas.

For more information on setting `autoscale_target_utilization`, `autoscale_max_replicas`, and `autoscale_min_replicas`, see the [AksWebservice](/python/api/azureml-core/azureml.core.webservice.akswebservice) module reference.
## HTTP status code 504
A 504 status code indicates that the request has timed out. The default timeout is 1 minute.

You can increase the timeout or try to speed up the service by modifying `score.py` to remove unnecessary calls. If these actions don't correct the problem, use the information in this article to debug the `score.py` file. The code might be in a non-responsive state or an infinite loop.
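If you decide to raise the timeout instead, one place to set it is the AKS deployment configuration; a minimal sketch, again assuming the SDK v1 `AksWebservice` class (the 120,000 ms value is only an illustration):

```python
from azureml.core.webservice import AksWebservice

# Increase the scoring request timeout from the 1-minute default.
# The value is in milliseconds; 120000 ms = 2 minutes (placeholder).
aks_config = AksWebservice.deploy_configuration(scoring_timeout_ms=120000)
```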