Skip to content

Commit 39f5de2

Browse files
authored
Implement sync_priority tag (patroni#3223)
1 parent 46e20ed commit 39f5de2

File tree

15 files changed

+175
-81
lines changed

15 files changed

+175
-81
lines changed

docs/yaml_configuration.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -401,8 +401,9 @@ Tags
401401
- **noloadbalance**: ``true`` or ``false``. If set to ``true`` the node will return HTTP Status Code 503 for the ``GET /replica`` REST API health-check and therefore will be excluded from the load-balancing. Defaults to ``false``.
402402
- **replicatefrom**: The name of another replica to replicate from. Used to support cascading replication.
403403
- **nosync**: ``true`` or ``false``. If set to ``true`` the node will never be selected as a synchronous replica.
404+
- **sync_priority**: integer, controls the priority this node should have during synchronous replica selection when ``synchronous_mode`` is set to ``on``. Nodes with higher priority will be preferred over lower-priority nodes. If the ``sync_priority`` is 0 or negative - such node is not allowed to be written to ``synchronous_standby_names`` PostgreSQL parameter (similar to ``nosync: true``). Keep in mind, that this parameter has the opposite meaning to ``sync_priority`` value reported in ``pg_stat_replication`` view.
404405
- **nofailover**: ``true`` or ``false``, controls whether this node is allowed to participate in the leader race and become a leader. Defaults to ``false``, meaning this node _can_ participate in leader races.
405-
- **failover_priority**: integer, controls the priority that this node should have during failover. Nodes with higher priority will be preferred over lower priority nodes if they received/replayed the same amount of WAL. However, nodes with higher values of receive/replay LSN are preferred regardless of their priority. If the ``failover_priority`` is 0 or negative - such node is not allowed to participate in the leader race and to become a leader (similar to ``nofailover: true``).
406+
- **failover_priority**: integer, controls the priority this node should have during failover. Nodes with higher priority will be preferred over lower-priority nodes if they received/replayed the same amount of WAL. However, nodes with higher values of receive/replay LSN are preferred regardless of their priority. If the ``failover_priority`` is 0 or negative - such node is not allowed to participate in the leader race and to become a leader (similar to ``nofailover: true``).
406407
- **nostream**: ``true`` or ``false``. If set to ``true`` the node will not use replication protocol to stream WAL. It will rely instead on archive recovery (if ``restore_command`` is configured) and ``pg_wal``/``pg_xlog`` polling. It also disables copying and synchronization of permanent logical replication slots on the node itself and all its cascading replicas. Setting this tag on primary node has no effect.
407408

408409
.. warning::

features/priority_sync.feature

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
Feature: synchronous replicas priority
2+
We should check that we can give nodes priority for becoming synchronous replicas
3+
4+
Scenario: check replica with sync_priority=0 does not become a synchronous replica
5+
Given I start postgres-0
6+
When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "synchronous_mode": true}
7+
Then I receive a response code 200
8+
When I configure and start postgres-1 with a tag sync_priority 0
9+
Then sync key in DCS has leader=postgres-0 after 20 seconds
10+
And sync key in DCS has sync_standby=None after 5 seconds
11+
12+
Scenario: check higher synchronous replicas priority is respected
13+
Given I configure and start postgres-2 with a tag sync_priority 1
14+
And I configure and start postgres-3 with a tag sync_priority 2
15+
Then replication works from postgres-0 to postgres-2 after 20 seconds
16+
And replication works from postgres-0 to postgres-3 after 20 seconds
17+
When I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 1}
18+
Then I receive a response code 200
19+
And sync key in DCS has sync_standby=postgres-3 after 10 seconds
20+
21+
22+
Scenario: check conflicting configuration handling
23+
When I set nosync tag in postgres-3 config
24+
And I issue an empty POST request to http://127.0.0.1:8011/reload
25+
Then I receive a response code 202
26+
And there is one of ["Conflicting configuration between nosync: True and sync_priority: 2. Defaulting to nosync: True"] WARNING in the postgres-3 patroni log after 5 seconds
27+
And "members/postgres-3" key in DCS has tags={'nosync': True, 'sync_priority': '2'} after 10 seconds
28+
And "sync" key in DCS has sync_standby=postgres-2 after 10 seconds
29+
When I reset nosync tag in postgres-1 config
30+
And I issue an empty POST request to http://127.0.0.1:8009/reload
31+
Then I receive a response code 202
32+
And there is one of ["Conflicting configuration between nosync: False and sync_priority: 0. Defaulting to nosync: False"] WARNING in the postgres-1 patroni log after 5 seconds
33+
And "members/postgres-1" key in DCS has tags={'nosync': False, 'sync_priority': '0'} after 10 seconds
34+
When I shut down postgres-2
35+
And "sync" key in DCS has sync_standby=postgres-1 after 3 seconds
36+

patroni/config.py

Lines changed: 25 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -147,7 +147,7 @@ def __init__(self, configfile: str,
147147
self._cache_file = os.path.join(self._data_dir, self.__CACHE_FILENAME)
148148
if validator: # patronictl uses validator=None
149149
self._load_cache() # we don't want to load anything from local cache for ctl
150-
self._validate_failover_tags() # irrelevant for ctl
150+
self._validate_contradictory_tags() # irrelevant for ctl
151151
self._cache_needs_saving = False
152152

153153
@property
@@ -359,7 +359,7 @@ def reload_local_configuration(self) -> Optional[bool]:
359359
new_configuration = self._build_effective_configuration(self._dynamic_configuration, configuration)
360360
self._local_configuration = configuration
361361
self.__effective_configuration = new_configuration
362-
self._validate_failover_tags()
362+
self._validate_contradictory_tags()
363363
return True
364364
else:
365365
logger.info('No local configuration items changed.')
@@ -798,25 +798,31 @@ def copy(self) -> Dict[str, Any]:
798798
"""
799799
return deepcopy(self.__effective_configuration)
800800

801-
def _validate_failover_tags(self) -> None:
802-
"""Check ``nofailover``/``failover_priority`` config and warn user if it's contradictory.
801+
def _validate_contradictory_tags(self) -> None:
802+
"""Check boolean/priority tags' config and warn user if it's contradictory.
803803
804804
.. note::
805-
To preserve sanity (and backwards compatibility) the ``nofailover`` tag will still exist. A contradictory
806-
configuration is one where ``nofailover`` is ``True`` but ``failover_priority > 0``, or where
807-
``nofailover`` is ``False``, but ``failover_priority <= 0``. Essentially, ``nofailover`` and
808-
``failover_priority`` are communicating different things.
805+
To preserve sanity (and backwards compatibility) the ``nofailover``/``nosync`` tag will still exist.
806+
A contradictory configuration is one where ``nofailover``/``nosync`` is ``True`` but
807+
``failover_priority > 0``/``sync_priority > 0``, or where ``nofailover``/``nosync`` is ``False``,
808+
but ``failover_priority <= 0``/``sync_priority <= 0``. Essentially, ``nofailover``/``nosync`` and
809+
``failover_priority``/``sync_priority`` are communicating different things.
809810
This checks for this edge case (which is a misconfiguration on the part of the user) and warns them.
810-
The behaviour is as if ``failover_priority`` were not provided (i.e ``nofailover`` is the
811-
bedrock source of truth)
811+
The behaviour is as if ``failover_priority``/``sync_priority`` were not provided
812+
(i.e ``nofailover``/``nosync`` is the bedrock source of truth).
812813
"""
813814
tags = self.get('tags', {})
814-
if 'nofailover' not in tags:
815-
return
816-
nofailover_tag = tags.get('nofailover')
817-
failover_priority_tag = parse_int(tags.get('failover_priority'))
818-
if failover_priority_tag is not None \
819-
and (bool(nofailover_tag) is True and failover_priority_tag > 0
820-
or bool(nofailover_tag) is False and failover_priority_tag <= 0):
821-
logger.warning('Conflicting configuration between nofailover: %s and failover_priority: %s. '
822-
'Defaulting to nofailover: %s', nofailover_tag, failover_priority_tag, nofailover_tag)
815+
816+
def validate_tag(bool_name: str, priority_name: str) -> None:
817+
if bool_name not in tags:
818+
return
819+
bool_tag = tags.get(bool_name)
820+
priority_tag = parse_int(tags.get(priority_name))
821+
if priority_tag is not None \
822+
and (bool(bool_tag) is True and priority_tag > 0
823+
or bool(bool_tag) is False and priority_tag <= 0):
824+
logger.warning('Conflicting configuration between %s: %s and %s: %s. Defaulting to %s: %s',
825+
bool_name, bool_tag, priority_name, priority_tag, bool_name, bool_tag)
826+
827+
validate_tag('nofailover', 'failover_priority')
828+
validate_tag('nosync', 'sync_priority')

patroni/config_generator.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -126,6 +126,7 @@ def get_template_config(cls) -> Dict[str, Any]:
126126
},
127127
'tags': {
128128
'failover_priority': 1,
129+
'sync_priority': 1,
129130
'noloadbalance': False,
130131
'clonefrom': True,
131132
'nosync': False,

patroni/ha.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -410,6 +410,7 @@ def get_effective_tags(self) -> Dict[str, Any]:
410410
# _disable_sync could be modified concurrently, but we don't care as attribute get and set are atomic.
411411
if self._disable_sync > 0:
412412
tags['nosync'] = True
413+
tags['sync_priority'] = 0
413414
return tags
414415

415416
def notify_mpp_coordinator(self, event: str) -> None:

patroni/postgresql/sync.py

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -196,6 +196,7 @@ class _Replica(NamedTuple):
196196
sync_state: str
197197
lsn: int
198198
nofailover: bool
199+
sync_priority: int
199200

200201

201202
class _ReplicaList(List[_Replica]):
@@ -238,10 +239,12 @@ def __init__(self, postgresql: 'Postgresql', cluster: Cluster) -> None:
238239
# b. PostgreSQL on the member is known to be running and accepting client connections.
239240
if member and row[sort_col] is not None and member.is_running and not member.nosync:
240241
self.append(_Replica(row['pid'], row['application_name'],
241-
row['sync_state'], row[sort_col], bool(member.nofailover)))
242+
row['sync_state'], row[sort_col],
243+
bool(member.nofailover), member.sync_priority))
242244

243-
# Prefer replicas that are in state ``sync`` and with higher values of ``write``/``flush``/``replay`` LSN.
244-
self.sort(key=lambda r: (r.sync_state, r.lsn), reverse=True)
245+
# Prefer replicas with higher ``sync_priority`` value, in state ``sync``,
246+
# and higher values of ``write``/``flush``/``replay`` LSN.
247+
self.sort(key=lambda r: (r.sync_priority, r.sync_state, r.lsn), reverse=True)
245248

246249
# When checking ``maximum_lag_on_syncnode`` we want to compare with the most
247250
# up-to-date replica otherwise with cluster LSN if there is only one replica.

patroni/tags.py

Lines changed: 45 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -29,14 +29,16 @@ def _filter_tags(tags: Dict[str, Any]) -> Dict[str, Any]:
2929
they all are boolean values that default to disabled.
3030
However ``nofailover`` tag is always returned if ``failover_priority`` tag is defined. In this case, we need
3131
both values to see if they are contradictory and the ``nofailover`` value should be used.
32+
The same rule applies for ``nosync`` and ``sync_priority`` tags.
3233
3334
:returns: a dictionary of tags set for this node. The key is the tag name, and the value is the corresponding
3435
tag value.
3536
"""
3637
return {tag: value for tag, value in tags.items()
3738
if any((tag not in ('clonefrom', 'nofailover', 'noloadbalance', 'nosync', 'nostream'),
3839
value,
39-
tag == 'nofailover' and 'failover_priority' in tags))}
40+
tag == 'nofailover' and 'failover_priority' in tags,
41+
tag == 'nosync' and 'sync_priority' in tags))}
4042

4143
@property
4244
@abc.abstractmethod
@@ -52,31 +54,49 @@ def clonefrom(self) -> bool:
5254
"""``True`` if ``clonefrom`` tag is ``True``, else ``False``."""
5355
return self.tags.get('clonefrom', False)
5456

55-
@property
56-
def nofailover(self) -> bool:
57-
"""Common logic for obtaining the value of ``nofailover`` from ``tags`` if defined.
57+
def _priority_tag(self, bool_name: str, priority_name: str) -> int:
58+
"""Common logic for obtaining the value of a priority tag from ``tags`` if defined.
59+
60+
If boolean tag is defined as ``True``, this will return ``0``. Otherwise, it will return the value of
61+
the respective priority tag, defaulting to ``1`` if it's not defined or invalid.
62+
63+
:param bool_name: name of the boolean tag (``nofailover``. ``nosync``).
64+
:param priority_name: name of the priority tag (``failover_priority``, ``sync_priority``).
5865
59-
If ``nofailover`` is not defined, this methods returns ``True`` if ``failover_priority`` is non-positive,
66+
:returns: integer value based on the defined tags.
67+
"""
68+
from_tags = self.tags.get(bool_name)
69+
priority = parse_int(self.tags.get(priority_name))
70+
priority = 1 if priority is None else priority
71+
return 0 if from_tags else priority
72+
73+
def _bool_tag(self, bool_name: str, priority_name: str) -> bool:
74+
"""Common logic for obtaining the value of a boolean tag from ``tags`` if defined.
75+
76+
If boolean tag is not defined, this methods returns ``True`` if priority tag is non-positive,
6077
``False`` otherwise.
78+
79+
:param bool_name: name of the boolean tag (``nofailover``. ``nosync``).
80+
:param priority_name: name of the priority tag (``failover_priority``, ``sync_priority``).
81+
82+
:returns: boolean value based on the defined tags.
6183
"""
62-
from_tags = self.tags.get('nofailover')
84+
from_tags = self.tags.get(bool_name)
6385
if from_tags is not None:
64-
# Value of `nofailover` takes precedence over `failover_priority`
86+
# Value of bool tag takes precedence over priority tag
6587
return bool(from_tags)
66-
failover_priority = parse_int(self.tags.get('failover_priority'))
67-
return failover_priority is not None and failover_priority <= 0
88+
priority = parse_int(self.tags.get(priority_name))
89+
return priority is not None and priority <= 0
6890

6991
@property
70-
def failover_priority(self) -> int:
71-
"""Common logic for obtaining the value of ``failover_priority`` from ``tags`` if defined.
92+
def nofailover(self) -> bool:
93+
"""``True`` if node configuration doesn't allow it to become primary, ``False`` otherwise."""
94+
return self._bool_tag('nofailover', 'failover_priority')
7295

73-
If ``nofailover`` is defined as ``True``, this will return ``0``. Otherwise, it will return the value of
74-
``failover_priority``, defaulting to ``1`` if it's not defined or invalid.
75-
"""
76-
from_tags = self.tags.get('nofailover')
77-
failover_priority = parse_int(self.tags.get('failover_priority'))
78-
failover_priority = 1 if failover_priority is None else failover_priority
79-
return 0 if from_tags else failover_priority
96+
@property
97+
def failover_priority(self) -> int:
98+
"""Value of ``failover_priority`` from ``tags`` if defined, otherwise derived from ``nofailover``."""
99+
return self._priority_tag('nofailover', 'failover_priority')
80100

81101
@property
82102
def noloadbalance(self) -> bool:
@@ -85,8 +105,13 @@ def noloadbalance(self) -> bool:
85105

86106
@property
87107
def nosync(self) -> bool:
88-
"""``True`` if ``nosync`` is ``True``, else ``False``."""
89-
return bool(self.tags.get('nosync', False))
108+
"""``True`` if node configuration doesn't allow it to become synchronous, ``False`` otherwise."""
109+
return self._bool_tag('nosync', 'sync_priority')
110+
111+
@property
112+
def sync_priority(self) -> int:
113+
"""Value of ``sync_priority`` from ``tags`` if defined, otherwise derived from ``nosync``."""
114+
return self._priority_tag('nosync', 'sync_priority')
90115

91116
@property
92117
def replicatefrom(self) -> Optional[str]:

patroni/validator.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1182,7 +1182,10 @@ def validate_watchdog_mode(value: Any) -> None:
11821182
Optional("clonefrom"): bool,
11831183
Optional("noloadbalance"): bool,
11841184
Optional("replicatefrom"): str,
1185-
Optional("nosync"): bool,
1185+
AtMostOne("nosync", "sync_priority"): Case({
1186+
"nosync": bool,
1187+
"sync_priority": IntValidator(min=0, expected_type=int, raise_assert=True),
1188+
}),
11861189
Optional("nostream"): bool
11871190
}
11881191
})

postgres0.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -133,7 +133,7 @@ postgresql:
133133

134134
tags:
135135
# failover_priority: 1
136+
# sync_priority: 1
136137
noloadbalance: false
137138
clonefrom: false
138-
nosync: false
139139
nostream: false

postgres1.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -125,5 +125,6 @@ postgresql:
125125

126126
tags:
127127
# failover_priority: 1
128+
# sync_priority: 1
128129
noloadbalance: false
129130
clonefrom: false

0 commit comments

Comments
 (0)