articles/machine-learning/how-to-troubleshoot-online-endpoints.md (4 additions, 4 deletions)
@@ -445,10 +445,10 @@ When you access online endpoints with REST requests, the returned status codes a
 | --- | --- | --- |
 | 200 | OK | Your model executed successfully, within your latency bound. |
 | 401 | Unauthorized | You don't have permission to do the requested action, such as score, or your token is expired. |
-| 404 | Not found |Your URL isn't correct. |
+| 404 | Not found | The endpoint doesn't have any valid deployment with positive weight. |
 | 408 | Request timeout | The model execution took longer than the timeout supplied in `request_timeout_ms` under `request_settings` of your model deployment config. |
 | 424 | Model Error | If your model container returns a non-200 response, Azure returns a 424. Check the `Model Status Code` dimension under the `Requests Per Minute` metric on your endpoint's [Azure Monitor Metric Explorer](../azure-monitor/essentials/metrics-getting-started.md). Or check response headers `ms-azureml-model-error-statuscode` and `ms-azureml-model-error-reason` for more information. |
-| 429 | Too many pending requests | Your model is getting more requests than it can handle. We allow maximum `max_concurrent_requests_per_instance` * `instance_count` / `request_process_time (in seconds)` requests per second. Additional requests are rejected. You can confirm these settings in your model deployment config under `request_settings` and `scale_settings`. If you're using auto-scaling, your model is getting requests faster than the system can scale up. With auto-scaling, you can try to resend requests with [exponential backoff](https://aka.ms/exponential-backoff). Doing so can give the system time to adjust. Apart from enable auto-scaling, you could also increase the number of instances by using the below [code](#how-to-calculate-instance-count). |
+| 429 | Too many pending requests | Your model is getting more requests than it can handle. We allow a maximum of 2 * `max_concurrent_requests_per_instance` * `instance_count` / `request_process_time (in seconds)` requests per second. Additional requests are rejected. You can confirm these settings in your model deployment config under `request_settings` and `scale_settings`, respectively. If you're using auto-scaling, your model is getting requests faster than the system can scale up. With auto-scaling, you can try to resend requests with [exponential backoff](https://aka.ms/exponential-backoff). Doing so gives the system time to adjust. Apart from enabling auto-scaling, you could also increase the number of instances by using the [code below](#how-to-calculate-instance-count). |
 | 429 | Rate-limiting | The number of requests per second reached the [limit](./how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints) of managed online endpoints. |
 | 500 | Internal server error | Azure ML-provisioned infrastructure is failing. |
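As a worked example of the capacity formula in the new 429 row: with `max_concurrent_requests_per_instance` of 1, an assumed `instance_count` of 2, and a 10-second `request_process_time`, the endpoint accepts at most 2 * 1 * 2 / 10 = 0.4 requests per second before rejecting requests. For the retry side, here is a minimal sketch of the exponential-backoff pattern that row recommends, assuming a plain REST scoring call via `requests`; the function name, retry budget, and jitter choice are illustrative, not from the article:

```python
import random
import time

import requests


def score_with_backoff(url, payload, headers, max_retries=5):
    """POST a scoring request, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code != 429:
            return response
        # Wait 1s, 2s, 4s, ... plus jitter before retrying, giving the
        # endpoint (or its autoscaler) time to catch up.
        time.sleep(2 ** attempt + random.random())
    return response
```

Backoff only smooths bursts; sustained traffic above the formula's limit needs a higher instance count instead.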
@@ -457,15 +457,15 @@ To increase the number of instances, you could calculate the required replicas f
 ```python
 from math import ceil
 # target requests per second
-target_qps=20
+target_rps=20
 # time to process the request (in seconds)
 request_process_time =10
 # Maximum concurrent requests per instance
 max_concurrent_requests_per_instance =1
 # The target CPU usage of the model container. 70% in this example
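The hunk is cut off after the target-CPU comment. For reference, a minimal sketch of how the instance-count calculation can conclude, assuming a `target_utilization` variable for the 70% CPU figure and applying Little's law (required concurrency = request rate * processing time); the variable names below the cutoff and the `print` call are assumptions, not part of the diff:

```python
from math import ceil

# Example values from the snippet above.
target_rps = 20                           # target requests per second
request_process_time = 10                 # seconds to process one request
max_concurrent_requests_per_instance = 1  # concurrency each instance handles
target_utilization = 0.7                  # assumed: the 70% target CPU usage

# Little's law: requests in flight = arrival rate * processing time.
# Dividing by the target utilization leaves headroom so instances
# aren't driven to 100% CPU.
concurrent_requests = target_rps * request_process_time / target_utilization

# Round up to whole instances.
instance_count = ceil(concurrent_requests / max_concurrent_requests_per_instance)
print(instance_count)  # 286 with these example values
```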