
Commit deeeeb3

Merge pull request #214540 from siyigao1212/patch-2
Update how-to-troubleshoot-online-endpoints.md
2 parents 0a69b20 + 653448f commit deeeeb3

File tree

1 file changed: +4 -4 lines changed

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -445,10 +445,10 @@ When you access online endpoints with REST requests, the returned status codes a
 | --- | --- | --- |
 | 200 | OK | Your model executed successfully, within your latency bound. |
 | 401 | Unauthorized | You don't have permission to do the requested action, such as score, or your token is expired. |
-| 404 | Not found | Your URL isn't correct. |
+| 404 | Not found | The endpoint doesn't have any valid deployment with positive weight. |
 | 408 | Request timeout | The model execution took longer than the timeout supplied in `request_timeout_ms` under `request_settings` of your model deployment config.|
 | 424 | Model Error | If your model container returns a non-200 response, Azure returns a 424. Check the `Model Status Code` dimension under the `Requests Per Minute` metric on your endpoint's [Azure Monitor Metric Explorer](../azure-monitor/essentials/metrics-getting-started.md). Or check response headers `ms-azureml-model-error-statuscode` and `ms-azureml-model-error-reason` for more information. |
-| 429 | Too many pending requests | Your model is getting more requests than it can handle. We allow a maximum of `max_concurrent_requests_per_instance` * `instance_count` / `request_process_time (in seconds)` requests per second. Additional requests are rejected. You can confirm these settings in your model deployment config under `request_settings` and `scale_settings`. If you're using auto-scaling, your model is getting requests faster than the system can scale up. With auto-scaling, you can try to resend requests with [exponential backoff](https://aka.ms/exponential-backoff). Doing so can give the system time to adjust. Apart from enabling auto-scaling, you could also increase the number of instances by using the [code](#how-to-calculate-instance-count) below. |
+| 429 | Too many pending requests | Your model is getting more requests than it can handle. We allow a maximum of 2 * `max_concurrent_requests_per_instance` * `instance_count` / `request_process_time (in seconds)` requests per second. Additional requests are rejected. You can confirm these settings in your model deployment config under `request_settings` and `scale_settings`, respectively. If you're using auto-scaling, your model is getting requests faster than the system can scale up. With auto-scaling, you can try to resend requests with [exponential backoff](https://aka.ms/exponential-backoff). Doing so can give the system time to adjust. Apart from enabling auto-scaling, you could also increase the number of instances by using the [code](#how-to-calculate-instance-count) below. |
 | 429 | Rate-limiting | The number of requests per second reached the [limit](./how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints) of managed online endpoints.|
 | 500 | Internal server error | Azure ML-provisioned infrastructure is failing. |

````
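For context on the new limit in the 429 row, here is a minimal sketch of the arithmetic. The values are hypothetical stand-ins; your actual numbers come from `request_settings` and `scale_settings` in the deployment config.

```python
# Hypothetical deployment settings; substitute your own values from
# request_settings and scale_settings in the deployment config.
max_concurrent_requests_per_instance = 1  # from request_settings
instance_count = 3                        # from scale_settings
request_process_time = 0.1                # seconds per request

# Maximum sustained requests per second before additional requests
# are rejected with 429, per the updated table row:
max_rps = 2 * max_concurrent_requests_per_instance * instance_count / request_process_time
print(max_rps)  # 60.0
```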
````diff
@@ -457,15 +457,15 @@ To increase the number of instances, you could calculate the required replicas f
 ```python
 from math import ceil
 # target requests per second
-target_qps = 20
+target_rps = 20
 # time to process the request (in seconds)
 request_process_time = 10
 # Maximum concurrent requests per instance
 max_concurrent_requests_per_instance = 1
 # The target CPU usage of the model container. 70% in this example
 target_utilization = .7

-concurrent_requests = target_qps * request_process_time / target_utilization
+concurrent_requests = target_rps * request_process_time / target_utilization

 # Number of instances needed
 instance_count = ceil(concurrent_requests / max_concurrent_requests_per_instance)
````
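As a quick sanity check on the renamed variable, plugging the hunk's own sample values into the formula gives roughly 286 instances:

```python
from math import ceil

# Sample values from the snippet above: 20 RPS target, 10 s per request,
# 70% target utilization, 1 concurrent request per instance.
concurrent_requests = 20 * 10 / 0.7             # ≈ 285.71
instance_count = ceil(concurrent_requests / 1)  # max_concurrent_requests_per_instance = 1
print(instance_count)                           # 286
```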
