
Commit 98c4ae5

SeanMooney authored and karelyatin committed
Send OVN heartbeat more often.
This change modifies the metadata agent heartbeat to use a random offset with a maximum delay of 10 seconds. The original reason for the current logic was to mitigate https://bugs.launchpad.net/neutron/+bug/1991817, so the logic that spreads out the heartbeats is kept, but the delay now has an upper bound.

Closes-Bug: #2020215
Change-Id: I4d382793255520b9c44ca2aaacebcbda9a432dde
(cherry picked from commit 5e0c102)
1 parent 6eaa2a0 commit 98c4ae5
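To make "an upper bound on the delay" concrete, here is a small sketch that compares the bound used before this change (`agent_down_time // 2`) with the clamped bound the patch introduces (`agent_down_time // 3`, limited to between 3 and 10 seconds). The helper names `old_max_delay` and `new_max_delay` are hypothetical. The `agent_down_time` values are illustrative and do not come from the commit.

```python
def old_max_delay(agent_down_time: int) -> int:
    """Upper bound of the random heartbeat delay before this change."""
    return agent_down_time // 2


def new_max_delay(agent_down_time: int) -> int:
    """Upper bound after this change: a third of agent_down_time,
    clamped to the range [3, 10] seconds."""
    return max(min(agent_down_time // 3, 10), 3)


if __name__ == "__main__":
    # Illustrative agent_down_time values in seconds (assumed, not from the commit).
    for down_time in (6, 15, 30, 75, 300):
        print(f"agent_down_time={down_time:>3}: "
              f"old max={old_max_delay(down_time):>3}s, "
              f"new max={new_max_delay(down_time):>2}s")
```

With an `agent_down_time` of 75 seconds, for example, the old bound allows delays of up to 37 seconds, while the new one caps them at 10. Because of the 3-second floor, the bound can exceed a third of the timeout when `agent_down_time` is very small (3 seconds for a 6-second timeout).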


1 file changed: +12, -5 lines


neutron/agent/ovn/metadata/agent.py

Lines changed: 12 additions & 5 deletions
```diff
@@ -234,14 +234,21 @@ def _update_chassis(self, row):
                     ovn_const.OVN_AGENT_METADATA_SB_CFG_KEY:
                         str(row.nb_cfg)})).execute()
 
+        delay = 0
         if self.first_run:
-            interval = 0
             self.first_run = False
         else:
-            interval = randint(0, cfg.CONF.agent_down_time // 2)
-
-        LOG.debug("Delaying updating chassis table for %s seconds", interval)
-        timer = threading.Timer(interval, _update_chassis, [self, row])
+            # We occasionally see port binding failed errors due to
+            # the ml2 driver refusing to bind the port to a dead agent.
+            # if all agents heartbeat at the same time, they will all
+            # cause a load spike on the server. To mitigate that we
+            # need to spread out the load by introducing a random delay.
+            # clamp the max delay between 3 and 10 seconds.
+            max_delay = max(min(cfg.CONF.agent_down_time // 3, 10), 3)
+            delay = randint(0, max_delay)
+
+        LOG.debug("Delaying updating chassis table for %s seconds", delay)
+        timer = threading.Timer(delay, _update_chassis, [self, row])
         timer.start()
 
 
```
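The scheduling pattern in the diff can be reproduced outside neutron with a small standalone sketch. The first event runs immediately. Each later event waits a random number of seconds drawn from the clamped range and then fires the update from a `threading.Timer`. `JitteredHeartbeat` and its methods are hypothetical names, not neutron code. The injectable `random.Random` instance is an added assumption that makes the behaviour testable.

```python
import random
import threading
from typing import Callable, Optional


class JitteredHeartbeat:
    """Hypothetical stand-in for the patched handler (not neutron code):
    the first update fires at once, later ones after a bounded random delay."""

    def __init__(self, agent_down_time: int,
                 rng: Optional[random.Random] = None) -> None:
        self.agent_down_time = agent_down_time
        self.first_run = True
        self.rng = rng or random.Random()

    def next_delay(self) -> int:
        delay = 0
        if self.first_run:
            # First event: report immediately, as in the patch.
            self.first_run = False
        else:
            # Same clamp as the patch: a third of agent_down_time,
            # kept within [3, 10] seconds.
            max_delay = max(min(self.agent_down_time // 3, 10), 3)
            # randint is inclusive on both ends, so delay is in [0, max_delay].
            delay = self.rng.randint(0, max_delay)
        return delay

    def schedule(self, callback: Callable[..., object], *args) -> threading.Timer:
        # Run the callback on a background timer thread after the delay.
        timer = threading.Timer(self.next_delay(), callback, args)
        timer.start()
        return timer


if __name__ == "__main__":
    hb = JitteredHeartbeat(agent_down_time=75, rng=random.Random(42))
    done = threading.Event()
    t = hb.schedule(done.set)  # first run, so the delay is 0
    t.join()
    print(done.is_set())  # True
    print([hb.next_delay() for _ in range(5)])  # each value is in [0, 10]
```

Keeping the delay calculation in its own method separates the policy (how long to wait) from the mechanism (the timer thread). This makes the clamping easy to check without waiting on real timers.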