Commit caec626

update azure throttling README
1 parent 6cfa9c3 commit caec626

1 file changed

+23
-3
lines changed
cluster-autoscaler/cloudprovider/azure/README.md

@@ -128,6 +128,26 @@ To run a cluster autoscaler pod on a master node, the deployment should tolerate
128128

129129
To run a cluster autoscaler pod with Azure managed service identity (MSI), use [cluster-autoscaler-vmss-msi.yaml](examples/cluster-autoscaler-vmss-msi.yaml) instead.
130130

131+
#### Azure API Throttling
132+
Azure enforces hard limits on the number of read and write requests against Azure APIs *per subscription, per region*. Running many clusters in a single subscription, or running a single large, dynamic cluster in a subscription, can generate more calls than are permitted within a given time window for a particular category of requests. See the following documents for more detail on Azure API throttling in general:
133+
134+
- https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling
135+
- https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/troubleshooting-throttling-errors
136+
137+
Given the dynamic nature of cluster autoscaler, it can cause a subscription to hit those rate limits. This in turn can affect other components running in the cluster that depend on Azure APIs, such as kube-controller-manager.
138+
139+
When using K8s versions older than v1.18, we recommend using at least **v1.17.5, v1.16.9, v1.15.12**, which include various cloud-provider improvements that reduce the number of API calls made during scale down operations.
140+
141+
As for CA versions older than 1.18, we recommend using at least **v1.17.2, v1.16.5, v1.15.6**.
142+
143+
In addition, cluster-autoscaler exposes an `AZURE_VMSS_CACHE_TTL` environment variable which controls how often `GetVMScaleSet` calls are made. By default, this is 15 seconds, but setting it to a higher value such as 60 seconds can protect against API throttling. The caches used are proactively incremented and decremented with the scale up and down operations, so this higher value doesn't have any noticeable impact on performance. **Note that the value is in seconds.**
144+
145+
| Config Name | Default | Environment Variable | Cloud Config File |
146+
| ----------- | ------- | -------------------- | ----------------- |
147+
| VmssCacheTTL | 15 | AZURE_VMSS_CACHE_TTL | vmssCacheTTL |
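
As a minimal sketch of the environment variable route, the fragment below raises the cache TTL to 60 seconds on the cluster autoscaler container. It assumes the container is named `cluster-autoscaler` and omits every other field of the deployment; adapt it to the manifest you actually use.

```yaml
# Fragment of a pod spec; all other fields are omitted.
# The container name is an assumption, match it to your manifest.
containers:
  - name: cluster-autoscaler
    env:
      # Value is in seconds (default: 15). Kubernetes requires env values to be strings.
      - name: AZURE_VMSS_CACHE_TTL
        value: "60"
```

Per the table above, the same setting can instead be supplied as `vmssCacheTTL` in the cloud config file.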
148+
149+
When using K8s 1.18 or higher, it is also recommended to configure backoff and retries on the client as described in [Rate limit and back-off retries](#rate-limit-and-back-off-retries).
150+
131151
### Standard deployment
132152

133153
Prerequisites:
@@ -148,7 +168,7 @@ Make a copy of [cluster-autoscaler-standard-master.yaml](examples/cluster-autosc
148168

149169
In the `cluster-autoscaler` spec, find the `image:` field and replace `{{ ca_version }}` with a specific cluster autoscaler release.
150170

151-
Below that, in the `command:` section, update the `--nodes=` arguments to reference your node limits and node pool name (tips: node pool name is NOT availability set name, e.g., the corresponding node pool name of the availability set
171+
Below that, in the `command:` section, update the `--nodes=` arguments to reference your node limits and node pool name (tip: the node pool name is NOT the availability set name; e.g., the node pool name corresponding to the availability set
152172
`agentpool1-availabilitySet-xxxxxxxx` would be `agentpool1`). For example, if node pool "k8s-nodepool-1" should scale from 1 to 10 nodes:
153173

154174
```yaml
@@ -198,7 +218,7 @@ az aks create \
198218

199219
#### AKS + Availability Set
200220

201-
The CLI based deployment only support VMSS and manual deployment is needed if availability set is used.
221+
The CLI-based deployment only supports VMSS; manual deployment is needed if an availability set is used.
202222

203223
Prerequisites:
204224

@@ -210,7 +230,7 @@ Prerequisites:
210230
kubectl get nodes --show-labels
211231
```
212232

213-
Make a copy of [cluster-autoscaler-aks.yaml](examples/cluster-autoscaler-aks.yaml). Fill in the placeholder values for
233+
Make a copy of [cluster-autoscaler-aks.yaml](examples/cluster-autoscaler-aks.yaml). Fill in the placeholder values for
214234
the `cluster-autoscaler-azure` secret data by base64-encoding each of your Azure credential fields.
215235

216236
- ClientID: `<base64-encoded-client-id>`
