Commit ec2e477

Azure: keep refreshes spread over time
When `vmssVmsCacheJitter` is provided, API calls (after start) are randomly spread over the provided time range, then happen at regular intervals (for a given VMSS). This prevents API call spikes.

But we noticed that the refreshes of the various VMSS progressively converge and cluster over time (in particular after a few large throttling windows have affected the autoscaler), which defeats the purpose.

Re-randomizing the next refresh deadline on every refresh (rather than only at autoscaler start) keeps the calls properly spread. Configuring `vmssVmsCacheJitter` and `vmssVmsCacheTTL` lets users control the average and worst-case refresh interval (and the average API call rate). And we can count on VMSS size change detection to trigger early refreshes when needed.

That's a small behaviour change, but this is possibly still a good time for it, as `vmssVmsCacheJitter` was introduced recently and hasn't been part of any release yet.
1 parent 0e8e609 commit ec2e477

1 file changed: +3 -8 lines changed

cluster-autoscaler/cloudprovider/azure/azure_scale_set.go

Lines changed: 3 additions & 8 deletions
@@ -562,18 +562,13 @@ func (scaleSet *ScaleSet) Nodes() ([]cloudprovider.Instance, error) {
 	}
 
 	klog.V(4).Infof("Nodes: starts to get VMSS VMs")
-
-	lastRefresh := time.Now()
-	if scaleSet.lastInstanceRefresh.IsZero() && scaleSet.instancesRefreshJitter > 0 {
-		// new VMSS: spread future refreshs
-		splay := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(scaleSet.instancesRefreshJitter + 1)
-		lastRefresh = time.Now().Add(-time.Second * time.Duration(splay))
-	}
+	splay := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(scaleSet.instancesRefreshJitter + 1)
+	lastRefresh := time.Now().Add(-time.Second * time.Duration(splay))
 
 	vms, rerr := scaleSet.GetScaleSetVms()
 	if rerr != nil {
 		if isAzureRequestsThrottled(rerr) {
-			// Log a warning and update the instance refresh time so that it would retry after next scaleSet.instanceRefreshPeriod.
+			// Log a warning and update the instance refresh time so that it would retry after cache expiration
 			klog.Warningf("GetScaleSetVms() is throttled with message %v, would return the cached instances", rerr)
 			scaleSet.lastInstanceRefresh = lastRefresh
 			return scaleSet.instanceCache, nil
