Commit d7e172c

Don't retains member slots on nodes with nofailover tag (patroni#3169)
Followup on patroni#3142
1 parent 87cb748 commit d7e172c

3 files changed: 36 additions, 18 deletions

features/permanent_slots.feature

Lines changed: 6 additions & 9 deletions
@@ -7,12 +7,13 @@ Feature: permanent slots
 Then I receive a response code 200
 And Response on GET http://127.0.0.1:8008/config contains slots after 10 seconds
 When I start postgres-1
-And I start postgres-2
+And I configure and start postgres-2 with a tag nofailover true
 And I configure and start postgres-3 with a tag replicatefrom postgres-2
 Then postgres-0 has a physical replication slot named test_physical after 10 seconds
 And postgres-0 has a physical replication slot named postgres_1 after 10 seconds
 And postgres-0 has a physical replication slot named postgres_2 after 10 seconds
 And postgres-2 has a physical replication slot named postgres_3 after 10 seconds
+And postgres-2 does not have a replication slot named test_physical
 
 @slot-advance
 Scenario: check that logical permanent slots are created
Scenario: check that logical permanent slots are created
@@ -24,10 +25,9 @@ Feature: permanent slots
 Scenario: check that permanent slots are created on replicas
 Given postgres-1 has a logical replication slot named test_logical with the test_decoding plugin after 10 seconds
 Then Logical slot test_logical is in sync between postgres-0 and postgres-1 after 10 seconds
-And Logical slot test_logical is in sync between postgres-0 and postgres-2 after 10 seconds
 And Logical slot test_logical is in sync between postgres-0 and postgres-3 after 10 seconds
 And postgres-1 has a physical replication slot named test_physical after 2 seconds
-And postgres-2 has a physical replication slot named test_physical after 2 seconds
+And postgres-2 does not have a replication slot named test_logical
 And postgres-3 has a physical replication slot named test_physical after 2 seconds
 
 @slot-advance
@@ -36,9 +36,9 @@ Feature: permanent slots
 And postgres-1 has a physical replication slot named postgres_0 after 2 seconds
 And postgres-1 has a physical replication slot named postgres_2 after 2 seconds
 And postgres-1 has a physical replication slot named postgres_3 after 2 seconds
-And postgres-2 has a physical replication slot named postgres_0 after 2 seconds
+And postgres-2 does not have a replication slot named postgres_0
+And postgres-2 does not have a replication slot named postgres_1
 And postgres-2 has a physical replication slot named postgres_3 after 2 seconds
-And postgres-2 has a physical replication slot named postgres_1 after 2 seconds
 And postgres-3 has a physical replication slot named postgres_0 after 2 seconds
 And postgres-3 has a physical replication slot named postgres_1 after 2 seconds
 And postgres-3 has a physical replication slot named postgres_2 after 2 seconds
@@ -50,11 +50,8 @@ Feature: permanent slots
 And I get all changes from physical slot test_physical on postgres-0
 Then Logical slot test_logical is in sync between postgres-0 and postgres-1 after 10 seconds
 And Physical slot test_physical is in sync between postgres-0 and postgres-1 after 10 seconds
-And Logical slot test_logical is in sync between postgres-0 and postgres-2 after 10 seconds
-And Physical slot test_physical is in sync between postgres-0 and postgres-2 after 10 seconds
 And Logical slot test_logical is in sync between postgres-0 and postgres-3 after 10 seconds
 And Physical slot test_physical is in sync between postgres-0 and postgres-3 after 10 seconds
-And Physical slot postgres_1 is in sync between postgres-0 and postgres-2 after 10 seconds
 And Physical slot postgres_1 is in sync between postgres-0 and postgres-3 after 10 seconds
 And Physical slot postgres_3 is in sync between postgres-2 and postgres-0 after 20 seconds
 And Physical slot postgres_3 is in sync between postgres-2 and postgres-1 after 10 seconds
@@ -69,7 +66,7 @@ Feature: permanent slots
 
 @slot-advance
 Scenario: check that only non-permanent member slots are written to the retain_slots in /status key
-And "status" key in DCS has postgres_0 in retain_slots
+Given "status" key in DCS has postgres_0 in retain_slots
 And "status" key in DCS has postgres_1 in retain_slots
 And "status" key in DCS has postgres_2 in retain_slots
 And "status" key in DCS does not have postgres_3 in retain_slots

patroni/dcs/__init__.py

Lines changed: 16 additions & 9 deletions
@@ -1116,16 +1116,23 @@ def _get_permanent_slots(self, postgresql: 'Postgresql', tags: Tags, role: str)
 
 def _get_members_slots(self, name: str, role: str, nofailover: bool,
 can_advance_slots: bool) -> Dict[str, Dict[str, Any]]:
-"""Get physical replication slots configuration for members that sourcing from this node.
+"""Get physical replication slots configuration for a given member.
 
-If the ``replicatefrom`` tag is set on the member - we should not create the replication slot for it on
-the current primary, because that member would replicate from elsewhere. We still create the slot if
-the ``replicatefrom`` destination member is currently not a member of the cluster (fallback to the
-primary), or if ``replicatefrom`` destination member happens to be the current primary.
+There are following situations possible:
 
-If the ``nostream`` tag is set on the member - we should not create the replication slot for it on
-the current primary or any other member even if ``replicatefrom`` is set, because ``nostream`` disables
-WAL streaming.
+* If the ``nostream`` tag is set on the member - we should not have the replication slot for it
+on the current primary or any other member even if ``replicatefrom`` is set, because
+``nostream`` disables WAL streaming.
+
+* PostgreSQL is 11 and newer and configuration allows retention of member replication slots. In this case
+we want to have replication slots for every member except the case when we have ``nofailover`` tag set.
+
+* PostgreSQL is older than 11 or configuration doesn't allow member slots retention. In this case we want:
+
+* On primary have replication slots for all members that don't have ``replicatefrom`` tag pointing
+to the existing member.
+
+* On replica node have replication slots only for members which ``replicatefrom`` tag pointing to us.
 
 Will log an error if:
 
@@ -1195,7 +1202,7 @@ def replica_filter(member: Member) -> bool:
 ret[slot_name] = {'type': 'physical', 'lsn': lsn, 'expected_active': expected_active(member)}
 slot_name = slot_name_from_member_name(name)
 ret.update({slot: {'type': 'physical'} for slot in self.status.retain_slots
-if slot not in ret and slot != slot_name})
+if not nofailover and slot not in ret and slot != slot_name})
 
 if len(ret) < len(members):
 # Find which names are conflicting for a nicer error message
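
The effect of the added `not nofailover` condition can be read in isolation. The following is a minimal sketch, not Patroni's actual method: `merge_retained_slots` and its parameters are hypothetical names standing in for the `ret.update(...)` call, with `retain_slots` playing the role of `self.status.retain_slots`, and the sample slot names are borrowed from the feature file.

```python
from typing import Any, Dict, Iterable


def merge_retained_slots(ret: Dict[str, Dict[str, Any]], retain_slots: Iterable[str],
                         own_slot_name: str, nofailover: bool) -> Dict[str, Dict[str, Any]]:
    """Hypothetical stand-in for the ``ret.update(...)`` call changed in this commit."""
    merged = dict(ret)  # copy so the caller's dict is left untouched
    merged.update({slot: {'type': 'physical'} for slot in retain_slots
                   # retained slots are skipped entirely when the node has nofailover set
                   if not nofailover and slot not in ret and slot != own_slot_name})
    return merged


# Slots the node already needs (assumed sample data).
active = {'postgres_3': {'type': 'physical', 'lsn': None}}
retained = ['postgres_0', 'postgres_1', 'postgres_2', 'postgres_3']

# A regular node keeps slots for the other retained members, but not for itself.
regular = merge_retained_slots(active, retained, 'postgres_1', nofailover=False)
# A nofailover node keeps only the slots it already needs.
tagged = merge_retained_slots(active, retained, 'postgres_2', nofailover=True)
```

Under these assumptions `tagged` contains only `postgres_3`, which matches what the feature file expects of postgres-2: it keeps the slot for its cascading replica but has no `postgres_0` or `postgres_1` slot.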

tests/test_slots.py

Lines changed: 14 additions & 0 deletions
@@ -328,3 +328,17 @@ def test_advance_physical_slots(self):
 None, None, None)], Exception])), \
 patch.object(SlotsHandler, 'drop_replication_slot', Mock(return_value=(False, False))):
 self.s.sync_replication_slots(cluster, self.tags)
+
+@patch.object(Postgresql, 'is_primary', Mock(return_value=False))
+@patch.object(Postgresql, 'role', PropertyMock(return_value='replica'))
+@patch.object(TestTags, 'tags', PropertyMock(return_value={'nofailover': True}))
+def test_slots_nofailover_tag(self):
+self.p.name = self.leadermem.name
+cluster = Cluster(True, ClusterConfig(1, {}, 1), self.leader,
+Status(0, {}, [self.leadermem.name, self.other.name, self.me.name]),
+[self.me, self.other, self.leadermem], None, SyncState.empty(), None, None)
+global_config.update(cluster)
+with patch.object(SlotsHandler, '_query', Mock(side_effect=[[('test_1', 'physical', 1, 12345, None, None,
+None, None, None)], Exception])) as mock_query:
+self.s.sync_replication_slots(cluster, self.tags)
+self.assertTrue(mock_query.call_args[0][0].startswith('SELECT slot_name, slot_type, xmin, '))
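
The new test leans on two `unittest.mock` behaviours: a `side_effect` list yields one item per call and raises any item that is an exception class, and `call_args` always describes the most recent call. The snippet below is a standalone illustration of those mechanics only. The `query` mock and the SQL strings are made up for the example and are not Patroni's `_query`.

```python
from unittest.mock import Mock

# First call returns a list of rows, second call raises Exception.
query = Mock(side_effect=[[('test_1', 'physical')], Exception])

first = query('SELECT slot_name, slot_type FROM pg_replication_slots')

try:
    query('SELECT 1')
    raised = False
except Exception:
    raised = True

# call_args holds the last call; [0] is the positional-args tuple, [0][0] its first item.
last_sql = query.call_args[0][0]
```

Because `call_args` tracks only the last call, an assertion on `call_args[0][0]` effectively checks which statement was issued most recently.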
