
Commit 86a2cdc

Update how-to-kubernetes-inference-routing-azureml-fe.md
1 parent f02a63e commit 86a2cdc


1 file changed: +1 -1 lines changed


articles/machine-learning/how-to-kubernetes-inference-routing-azureml-fe.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ utilization_percentage = (The number of replicas that are busy processing a requ
 ```
 If this number exceeds `target_utilization_percentage`, then more replicas are created. If it's lower, then replicas are reduced. By default, the target utilization is 70%.

-Decisions to add replicas are eager and fast (around 1 second). Decisions to remove replicas are conservative (around 1 minute).
+Decisions to add replicas are eager and fast. Decisions to remove replicas are conservative (around 20 times of the scale up refresh interval).

 For example, if you want to deploy a model service and want to know many instances (pods/replicas) should be configured for target requests per second (RPS) and target response time. You can calculate the required replicas by using the following code:
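The code that the last context line refers to lies outside this hunk and is not shown here. As a rough illustration only, the sizing it describes could be sketched as below. It assumes that the number of in-flight requests is approximately RPS multiplied by the per-request processing time, and that replicas should run at the 70% default target utilization mentioned above. The function name `required_replicas` and its parameters are hypothetical and are not taken from the article.

```python
from math import ceil


def required_replicas(target_rps, request_time_s,
                      max_requests_per_replica=1,
                      target_utilization=0.7):
    """Estimate the replica count for a target load (illustrative sketch).

    All parameter names are hypothetical. The estimate assumes that
    concurrent requests ~= target_rps * request_time_s, then adds headroom
    so that replicas run at roughly target_utilization.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if max_requests_per_replica < 1:
        raise ValueError("max_requests_per_replica must be at least 1")

    # Requests in flight at any moment, scaled up to leave headroom.
    concurrent_requests = target_rps * request_time_s / target_utilization

    # Round up: a fractional replica cannot be scheduled.
    return ceil(concurrent_requests / max_requests_per_replica)


# 20 RPS, 10 s per request, one request per replica, 70% utilization.
print(required_replicas(20, 10))  # 286
```

Because scale-down decisions are described as conservative, an estimate like this is probably best treated as a starting point to validate under real load, not a fixed setting.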
