
Commit 8a8ec0f

Merge pull request #255641 from dem108/patch-25
increase max request timeout for online endpoint
2 parents 4b774ec + 251f1c7 commit 8a8ec0f

2 files changed: +3 -3 lines changed

articles/machine-learning/how-to-manage-quotas.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Azure Machine Learning managed online endpoints have limits described in the fol
 | Number of deployments per subscription | 200 | Yes |
 | Number of deployments per endpoint | 20 | Yes |
 | Number of instances per deployment | 20 <sup>2</sup> | Yes |
-| Max request time-out at endpoint level | 90 seconds | - |
+| Max request time-out at endpoint level | 180 seconds | - |
 | Total requests per second at endpoint level for all deployments | 500 <sup>3</sup> | Yes |
 | Total connections per second at endpoint level for all deployments | 500 <sup>3</sup> | Yes |
 | Total connections active at endpoint level for all deployments | 500 <sup>3</sup> | Yes |
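
For readers applying this change, a minimal deployment YAML sketch (not part of this commit) that opts into the raised 180-second ceiling; the endpoint, deployment, and model names are placeholders, and it assumes the timeout is set per deployment under `request_settings`, as documented in reference-yaml-deployment-managed-online.md below:

```yaml
# Hypothetical managed online deployment spec; all names and the instance type are placeholders.
name: blue
endpoint_name: my-endpoint
model: azureml:my-model:1
instance_type: Standard_DS3_v2
instance_count: 1
request_settings:
  # Raise the scoring timeout to the new endpoint-level maximum of 180 seconds.
  request_timeout_ms: 180000
```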

articles/machine-learning/reference-yaml-deployment-managed-online.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ ms.topic: reference
 ms.custom: cliv2, event-tier1-build-2022, build-2023
 author: dem108
 ms.author: sehan
-ms.date: 01/24/2023
+ms.date: 10/19/2023
 ms.reviewer: mopeakande
 ---

@@ -52,7 +52,7 @@ The source JSON schema can be found at https://azuremlschemas.azureedge.net/late

 | Key | Type | Description | Default value |
 | --- | ---- | ----------- | ------------- |
-| `request_timeout_ms` | integer | The scoring timeout in milliseconds. Note that the maximum value allowed is `90000` milliseconds. See [Managed online endpoint quotas](how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints) for more. | `5000` |
+| `request_timeout_ms` | integer | The scoring timeout in milliseconds. Note that the maximum value allowed is `180000` milliseconds. See [Managed online endpoint quotas](how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints) for more. | `5000` |
 | `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low may lead to under utilized nodes. Setting too low may also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
 | `max_queue_wait_ms` | integer | The maximum amount of time in milliseconds a request will stay in the queue. | `500` |
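
As context for the table above, a sketch of how these three keys might sit together in a deployment YAML, with `WORKER_COUNT` passed as an environment variable so the server matches the configured concurrency. The values are illustrative rather than recommendations, and the nesting under `request_settings` and `environment_variables` is assumed from the managed online deployment schema:

```yaml
# Illustrative values only; tune concurrency to what your model can actually handle per node.
request_settings:
  request_timeout_ms: 180000               # scoring timeout, up to the 180000 ms maximum
  max_concurrent_requests_per_instance: 4  # concurrent requests one instance should accept
  max_queue_wait_ms: 500                   # how long a request may wait in the queue
environment_variables:
  WORKER_COUNT: 4                          # keep in step with max_concurrent_requests_per_instance
```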
