
Commit 2953276

Failsafe mode wasn't always triggered in case of Etcd unavailability (patroni#3404)
During a heartbeat cycle Patroni makes two requests to Etcd:
1. get_cluster()
2. update_lock()

If a request to one Etcd node fails, Patroni switches to another node and retries. At the same time it sets a flag indicating that the Etcd topology must be rediscovered. Rediscovery happens either after the current request completes successfully or before the next request is executed. In the second case the etcd.EtcdException raised by the topology discovery functions wasn't handled, and as a result failsafe_mode wasn't triggered. The exception is now converted to etcd.EtcdConnectionFailed so that failsafe_mode works.

Close patroni#3403
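
The retry-and-rediscover behaviour described above can be sketched roughly as follows. This is a toy model, not Patroni's code: `SketchClient`, `TopologyError`, `rediscover_pending`, and the node names are all hypothetical stand-ins chosen for illustration. It only aims to show how a failure during deferred rediscovery surfaces as a different exception type than a failed request.

```python
class TopologyError(Exception):
    """Hypothetical stand-in for the generic etcd.EtcdException."""


class SketchClient:
    """Toy client: tries nodes in order and defers topology rediscovery."""

    def __init__(self, nodes, alive):
        self.nodes = list(nodes)
        self.alive = set(alive)          # nodes that currently respond
        self.rediscover_pending = False  # plays the role of the "must rediscover" flag

    def rediscover(self):
        # Rediscovery needs at least one reachable node.
        if not self.alive:
            raise TopologyError('cannot discover topology')
        self.nodes = sorted(self.alive)
        self.rediscover_pending = False

    def request(self, name):
        # Second case from the commit message: rediscovery runs before the next request.
        if self.rediscover_pending:
            self.rediscover()
        for node in list(self.nodes):
            if node in self.alive:
                # First case: rediscovery runs right after a successful request.
                if self.rediscover_pending:
                    self.rediscover()
                return f'{name} via {node}'
            # The node failed: move on to the next one and remember to rediscover.
            self.rediscover_pending = True
        raise ConnectionError('no node answered')
```

In this model, once every node is down, the first request fails with `ConnectionError` and leaves the flag set. The next request then fails inside `rediscover()` with `TopologyError`, which is the kind of mismatch the commit addresses.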
1 parent 5c2e0fd commit 2953276


2 files changed: +9 -1 lines changed


patroni/dcs/etcd.py

Lines changed: 6 additions & 1 deletion
@@ -317,7 +317,12 @@ def api_execute(self, path: str, method: str, params: Optional[Dict[str, Any]] =
 
         # Update machines_cache if previous attempt of update has failed
         if self._update_machines_cache:
-            self._load_machines_cache()
+            try:
+                self._load_machines_cache()
+            except etcd.EtcdException as e:
+                # If etcd cluster isn't accessible _load_machines_cache() -> _refresh_machines_cache() may raise
+                # etcd.EtcdException. We need to convert it to etcd.EtcdConnectionFailed for failsafe_mode to work.
+                raise etcd.EtcdConnectionFailed('No more machines in the cluster') from e
         elif not self._use_proxies and time.time() - self._machines_cache_updated > self._machines_cache_ttl:
             self._refresh_machines_cache()
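
The exception conversion in this hunk can be illustrated in isolation. The sketch below uses local stand-in classes instead of the real `etcd` module, and `load_machines_cache`, `refresh_before_request`, and `heartbeat` are hypothetical helpers. It assumes a caller that reacts only to the connection-failure exception, which is what the code comment in the diff implies but which is not shown in this commit.

```python
class EtcdException(Exception):
    """Local stand-in for etcd.EtcdException (not the real class)."""


class EtcdConnectionFailed(EtcdException):
    """Local stand-in for etcd.EtcdConnectionFailed (not the real class)."""


def load_machines_cache():
    # Hypothetical helper: simulates topology discovery failing
    # because no etcd node is reachable.
    raise EtcdException('Could not get the list of servers')


def refresh_before_request(update_pending):
    # Mirrors the shape of the patched branch: convert the generic
    # exception into the connection-failure type, keeping the cause.
    if update_pending:
        try:
            load_machines_cache()
        except EtcdException as e:
            raise EtcdConnectionFailed('No more machines in the cluster') from e


def heartbeat():
    # Assumed caller: only EtcdConnectionFailed leads to failsafe handling.
    try:
        refresh_before_request(True)
    except EtcdConnectionFailed as e:
        return 'failsafe', type(e.__cause__).__name__
    except EtcdException:
        return 'unhandled', None
    return 'ok', None
```

Because `raise ... from e` is used, the original exception stays available as `__cause__`, so the conversion does not hide the underlying discovery error from logs or tracebacks.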

tests/test_etcd.py

Lines changed: 3 additions & 0 deletions
@@ -211,6 +211,9 @@ def test_api_execute(self):
                 patch.object(EtcdClient, '_load_machines_cache', Mock(return_value=True)):
             self.assertRaises(etcd.EtcdException, rtry, self.client.api_execute, '/', 'GET', params={'retry': rtry})
 
+        with patch.object(EtcdClient, '_get_machines_list', Mock(side_effect=etcd.EtcdConnectionFailed)):
+            self.assertRaises(etcd.EtcdConnectionFailed, self.client.api_execute, '/', 'GET')
+
         with patch.object(EtcdClient, '_do_http_request', Mock(side_effect=etcd.EtcdException)):
             self.client._read_timeout = 0.01
             self.assertRaises(etcd.EtcdException, self.client.api_execute, '/', 'GET')
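
The testing pattern used in this hunk, patching a discovery method with a `Mock` whose `side_effect` is an exception class, can be tried without Patroni installed. The sketch below is an assumption-laden miniature: `SketchClient` and its methods are hypothetical stand-ins that only imitate the shape of the patched `api_execute` branch, and the exception classes are local copies rather than the real `etcd` ones.

```python
import unittest
from unittest.mock import Mock, patch


class EtcdException(Exception):
    """Local stand-in for etcd.EtcdException (not the real class)."""


class EtcdConnectionFailed(EtcdException):
    """Local stand-in for etcd.EtcdConnectionFailed (not the real class)."""


class SketchClient:
    """Hypothetical miniature client; not Patroni's EtcdClient."""

    _update_machines_cache = True  # a previous cache update is assumed to have failed

    def _get_machines_list(self):
        return ['http://localhost:2379']

    def _load_machines_cache(self):
        self._get_machines_list()
        self._update_machines_cache = False

    def api_execute(self, path, method):
        if self._update_machines_cache:
            try:
                self._load_machines_cache()
            except EtcdException as e:
                raise EtcdConnectionFailed('No more machines in the cluster') from e
        return f'{method} {path}'


class TestSketchClient(unittest.TestCase):
    def test_discovery_failure_is_converted(self):
        client = SketchClient()
        # While patched, discovery raises the generic exception.
        with patch.object(SketchClient, '_get_machines_list', Mock(side_effect=EtcdException)):
            self.assertRaises(EtcdConnectionFailed, client.api_execute, '/', 'GET')
        # Once the patch is removed, discovery succeeds and the request goes through.
        self.assertEqual(client.api_execute('/', 'GET'), 'GET /')
        self.assertFalse(client._update_machines_cache)
```

Saved as a module, this can be run with `python -m unittest`. Unlike the test in the diff, the sketch raises the generic exception from the mock so that the conversion itself is exercised.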
