Commit 1b72854

phlogistonjohn authored and mergify[bot] committed
sambacc: add a retry loop to ctdb.monitor_cluster_meta_changes
Add a loop that retries the `ctdb reloadnodes` command after an increasing delay. This is an attempt to fix a condition where ctdbd is apparently not ready to handle the `ctdb reloadnodes` command. In this case the command would run but fail, and the monitor_cluster_meta_changes function would raise an exception. That exception would be caught by the command-level retry loop. However, the command-level retry loop simply re-runs monitor_cluster_meta_changes, which no longer has the same initial clustermeta state and has effectively "forgotten" that it needs to run reloadnodes.

This new retry loop adds a level of error handling inside the monitor_cluster_meta_changes function so that we retry with a bounded number of attempts.

Signed-off-by: John Mulligan <[email protected]>
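The "forgotten" state problem described in the message can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `FlakyReload`, `monitor_once`, `run_with_outer_retry`, and the `state` dict are hypothetical stand-ins, not sambacc code, and the real function tracks cluster metadata rather than a single counter.

```python
class FlakyReload:
    """Hypothetical stand-in for `ctdb reloadnodes`: fails N times, then works."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("ctdbd not ready")


def monitor_once(state: dict, reload_nodes, inner_tries: int = 1) -> None:
    """Toy monitor pass: record the change first, then reload nodes."""
    if state["saved"] == state["expected"]:
        return  # nothing looks changed, so no reload is attempted
    state["saved"] = state["expected"]  # the change is persisted before reload
    for attempt in range(inner_tries):
        try:
            reload_nodes()
            state["reloaded"] = True
            return
        except RuntimeError:
            if attempt == inner_tries - 1:
                raise  # out of inner attempts, let the caller see the error


def run_with_outer_retry(state, reload_nodes, inner_tries=1, outer_tries=3):
    """Toy command-level retry: simply re-runs the whole monitor pass."""
    for _ in range(outer_tries):
        try:
            monitor_once(state, reload_nodes, inner_tries)
        except RuntimeError:
            continue
        return


# Outer retry only: the second pass sees no change and skips the reload.
outer_only = {"saved": 1, "expected": 2, "reloaded": False}
run_with_outer_retry(outer_only, FlakyReload(failures=1))

# Inner retry: the failed reload is retried within the same pass.
with_inner = {"saved": 1, "expected": 2, "reloaded": False}
run_with_outer_retry(with_inner, FlakyReload(failures=1), inner_tries=3)
```

In this model the outer-only run ends without ever reloading, while the run with an inner retry completes the reload, which is the behavior the commit is after.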
1 parent 406afac commit 1b72854

1 file changed: +18 -1 lines changed

sambacc/ctdb.py

Lines changed: 18 additions & 1 deletion
@@ -20,6 +20,7 @@
 import logging
 import os
 import subprocess
+import time
 import typing
 
 from sambacc import config
@@ -572,7 +573,23 @@ def monitor_cluster_meta_changes(
         if nodes_file_path:
             _logger.info("updating nodes file: %s", nodes_file_path)
             _save_nodes(nodes_file_path, expected_nodes)
-        _maybe_reload_nodes(leader_locator, reload_all=reload_all)
+        _maybe_reload_nodes_retry(leader_locator, reload_all=reload_all)
+
+
+def _maybe_reload_nodes_retry(
+    leader_locator: typing.Optional[leader.LeaderLocator] = None,
+    reload_all: bool = False,
+    *,
+    tries: int = 5,
+) -> None:
+    for idx in range(tries):
+        time.sleep(1 << idx)
+        try:
+            _maybe_reload_nodes(leader_locator, reload_all=reload_all)
+            return
+        except subprocess.CalledProcessError:
+            _logger.exception("failed to execute reload nodes command")
+    raise RuntimeError("exceeded retries running reload nodes command")
 
 
 def _maybe_reload_nodes(
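
The delay schedule produced by `time.sleep(1 << idx)` can be checked in isolation. The sketch below mirrors the shape of the loop in the diff, but `retry_with_backoff`, the injectable `sleep` parameter, and `make_action` are hypothetical additions for illustration so that no real waiting or `ctdb` call takes place.

```python
import subprocess
import time
import typing


def retry_with_backoff(
    action: typing.Callable[[], None],
    *,
    tries: int = 5,
    sleep: typing.Callable[[float], None] = time.sleep,
) -> None:
    """Run action up to `tries` times, sleeping 1, 2, 4, ... seconds first."""
    for idx in range(tries):
        sleep(1 << idx)  # the delay comes before every attempt, even the first
        try:
            action()
            return
        except subprocess.CalledProcessError:
            pass  # the real code logs the exception here
    raise RuntimeError("exceeded retries")


def make_action(failures: int) -> typing.Callable[[], None]:
    """Hypothetical command stand-in that fails `failures` times, then succeeds."""
    remaining = [failures]

    def action() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise subprocess.CalledProcessError(1, ["ctdb", "reloadnodes"])

    return action


# Two failures, then success: three attempts, three recorded delays.
recovered_delays: typing.List[float] = []
retry_with_backoff(make_action(2), sleep=recovered_delays.append)

# Every attempt fails: all five delays are used, then RuntimeError is raised.
exhausted_delays: typing.List[float] = []
gave_up = False
try:
    retry_with_backoff(make_action(10), sleep=exhausted_delays.append)
except RuntimeError:
    gave_up = True
```

With the default `tries=5` the delays are 1, 2, 4, 8, and 16 seconds, so a run that never succeeds waits 31 seconds in total before giving up. Only `subprocess.CalledProcessError` is retried; any other exception propagates immediately.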