
Commit 83458b3

storage: introduce on_master_enable service
Before this patch, the `rebalancer` and `recovery` services could start immediately after a master switch (via `auto` master detection or manual reconfiguration), before the new master had time to sync its vclock with the other replicas in the replicaset. This could lead to doubled buckets, as described in the "Doubled buckets RFC".

To fix this, introduce a new storage service, `on_master_enable`. When the master changes in a replicaset, this service is triggered and waits until the newly elected master syncs its vclock with the other replicas. The other storage services, `rebalancer` and `recovery`, can't start until `on_master_enable` sets `M.buckets_are_in_sync`.

Also change `storage/storage.test`, `storage/recovery.test`, `storage-luatest/log_verbosity_2_2_test` and `router/router.test` so that they don't fail: the `rebalancer` and `recovery` services no longer start immediately after a master switch, which can destabilize some tests.

Part of #214

NO_TEST=bugfix
NO_DOC=bugfix
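The gating described above can be sketched in plain Lua. This is an illustration only, not the code from this patch: the only name taken from the commit message is `M.buckets_are_in_sync`, while `vclock_covers`, `on_master_enable_step` and `service_may_run` are hypothetical helpers. The sketch also assumes that "in sync" means the master's vclock is at least as large as every replica's vclock in each component, which the commit message does not spell out.

```lua
-- Hypothetical sketch: only M.buckets_are_in_sync comes from the commit message.
local M = { buckets_are_in_sync = false }

-- A vclock is modelled as a table mapping instance id -> LSN.
-- Returns true when `mine` is not behind `other` in any component
-- (assumed meaning of "synced"; a missing component counts as LSN 0).
local function vclock_covers(mine, other)
    for id, lsn in pairs(other) do
        if (mine[id] or 0) < lsn then
            return false
        end
    end
    return true
end

-- One check performed by the hypothetical on_master_enable service:
-- raise the flag only once the new master has caught up with every replica.
local function on_master_enable_step(master_vclock, replica_vclocks)
    for _, vclock in ipairs(replica_vclocks) do
        if not vclock_covers(master_vclock, vclock) then
            return false
        end
    end
    M.buckets_are_in_sync = true
    return true
end

-- Dependent services (rebalancer, recovery) would consult the flag
-- before doing any work.
local function service_may_run()
    return M.buckets_are_in_sync
end
```

In a real storage the step would presumably be retried in a background fiber until it succeeds, and the flag would be cleared again on the next master change; both are left out here to keep the sketch self-contained.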
1 parent 05b71b0 commit 83458b3

12 files changed: +572 -5 lines changed

test/router/router.result

Lines changed: 16 additions & 0 deletions
@@ -435,9 +435,25 @@ vshard.router.buckets_info(0, 3)
 _ = test_run:cmd('start server storage_2_a')
 ---
 ...
+_ = test_run:switch('storage_2_a')
+---
+...
+test_run:wait_log('storage_2_a', 'New master has synchronized with other replicas')
+---
+- New master has synchronized with other replicas
+...
+vshard.storage.rebalancer_wakeup()
+---
+...
+vshard.storage.recovery_wakeup()
+---
+...
 --
 -- gh-26: API to get netbox by bucket identifier.
 --
+_ = test_run:switch('router_1')
+---
+...
 vshard.router.route(vshard.consts.DEFAULT_BUCKET_COUNT + 100)
 ---
 - null

test/router/router.test.lua

Lines changed: 5 additions & 1 deletion
@@ -162,10 +162,14 @@ end)
 util.check_error(vshard.router.call, 1, 'read', 'echo', {123})
 vshard.router.buckets_info(0, 3)
 _ = test_run:cmd('start server storage_2_a')
-
+_ = test_run:switch('storage_2_a')
+test_run:wait_log('storage_2_a', 'New master has synchronized with other replicas')
+vshard.storage.rebalancer_wakeup()
+vshard.storage.recovery_wakeup()
 --
 -- gh-26: API to get netbox by bucket identifier.
 --
+_ = test_run:switch('router_1')
 vshard.router.route(vshard.consts.DEFAULT_BUCKET_COUNT + 100)
 util.check_error(vshard.router.route, 'asdfg')
 util.check_error(vshard.router.route)

test/storage-luatest/log_verbosity_2_2_test.lua

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ test_group.test_rebalancer_do_not_spam_same_errors = function(g)
 end
 end)
 local msg = "Error during downloading rebalancer states"
-g.replica_1_a:wait_log_exactly_once(msg, {timeout = 0.1,
+g.replica_1_a:wait_log_exactly_once(msg, {timeout = 1,
 on_yield = function() ivshard.storage.rebalancer_wakeup() end})
 g.replica_2_a:exec(function()
 ivshard.storage.rebalancer_request_state = _G.old_call
