storage: introduce on_master_enable service#646

Open
mrForza wants to merge 3 commits intotarantool:masterfrom
mrForza:gh-214-stray-tcp-doubled-buckets

Conversation

Contributor

@mrForza mrForza commented Mar 20, 2026

Before this patch the rebalancer and recovery services could start right after a master switch (caused by auto master detection or manual reconfiguration), before the master had time to sync its vclock with the other replicas in the replicaset. This could lead to doubled buckets, as described in the "Doubled buckets RFC".

To fix it we introduce a new storage service, on_master_enable. When the master changes in a replicaset, this service is triggered and waits until the newly elected master syncs its vclock with the other replicas. The other storage services, rebalancer and recovery, can't start until on_master_enable sets M.buckets_are_in_sync.

Closes #214

NO_TEST=bugfix
NO_DOC=bugfix
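The gating described above could be sketched roughly like this (only on_master_enable and M.buckets_are_in_sync come from the patch; the helper names and loop bodies are assumptions, not the actual implementation):

```lua
-- Sketch only: how the new service could gate rebalancer/recovery.
local fiber = require('fiber')

-- Runs after this instance becomes master. `vclock_is_synced` is a
-- hypothetical helper standing in for the real vclock wait.
local function on_master_enable_f()
    while not vclock_is_synced() do
        fiber.sleep(0.1)
    end
    -- Only now may services that mutate _bucket proceed.
    M.buckets_are_in_sync = true
end

-- Services such as recovery check the flag before acting.
local function recovery_f()
    while true do
        if M.buckets_are_in_sync then
            recovery_step() -- hypothetical
        end
        fiber.sleep(1)
    end
end
```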

@mrForza mrForza force-pushed the gh-214-stray-tcp-doubled-buckets branch 2 times, most recently from be90d04 to 2f96b14 Compare March 21, 2026 11:57
if not M.on_master_enable_fiber or
M.on_master_enable_fiber:status() == 'dead' then
M.on_master_enable_fiber =
util.reloadable_fiber_new('vshard.on_master_enable',
Collaborator

We don't need a reloadable fiber here. Please take a look at the util.reloadable_fiber_new implementation and at M.module_version; it doesn't make sense when you don't have:

    while M.module_version == module_version do

Contributor Author

@mrForza mrForza Mar 25, 2026

Let's leave it as is. If we create a service fiber without using reloadable_fiber_new, we can face a reload_evolution/storage.test failure.

Also, all router and storage services use reloadable_fiber_new; maybe it is good to make the on_master_enable fiber creation consistent with the other services.

Collaborator

You won't need that crutchy cancel of the self fiber if the fiber is not reloadable.

> Let's leave it as is. If we create a service fiber without using reloadable_fiber_new, we can face a reload_evolution/storage.test failure.

Why does it fail?

> Also, all router and storage services use reloadable_fiber_new; maybe it is good to make the on_master_enable fiber creation consistent with the other services.

Those are services constantly working in a loop; our new service is not.

@mrForza mrForza force-pushed the gh-214-stray-tcp-doubled-buckets branch 2 times, most recently from 83458b3 to 396cc20 Compare March 25, 2026 13:08
@mrForza mrForza requested a review from Serpentian March 25, 2026 14:43
@mrForza mrForza assigned Serpentian and unassigned mrForza Mar 25, 2026
vardir = vardir,
clear_test_cfg_options = clear_test_cfg_options,
info_assert_alert = info_assert_alert,
bucket_move = bucket_move,
Collaborator

Please prefix the functions with storage_. All storage-related functions are named that way.

Contributor Author

fixed


local function bucket_move(src_storage, dest_storage, bucket_id)
src_storage:exec(function(bucket_id, replicaset_id)
t.helpers.retrying({timeout = 60}, function()
Collaborator

wait_timeout is the default for such functions; no need to hardcode 60. Same in the bucket_wait_transfer function.

Contributor Author

fixed

local function bucket_wait_transfer(src_storage, dest_storage, bucket_id)
src_storage:exec(function(bucket_id)
t.helpers.retrying({timeout = 10}, function()
t.assert_equals(box.space._bucket:select(bucket_id), {})
Collaborator

Nit: get would be better; you don't need to select over a unique primary key.

Contributor Author

fixed

@@ -846,6 +846,31 @@ local function info_assert_alert(alerts, alert_name)
    t.fail(('There is no %s in alerts'):format(alert_name))
Collaborator

Nit: the first and second commits are not refactoring (the reason given for NO_TEST and NO_DOC). Refactoring refers to vshard code refactoring; these commits are just test changes.

Contributor Author

fixed

info_assert_alert = info_assert_alert,
bucket_move = bucket_move,
bucket_wait_transfer = bucket_wait_transfer,
storage_wait_pairsync = storage_wait_pairsync,
Collaborator

Can't you use vtest.cluster_wait_fullsync which is already exported?

Contributor Author

Yes, agree

if not down or (down.status == 'stopped' or
not vclock_lesseq(vclock, down.vclock)) then
if not down or down.status == 'stopped' or
not util.vclock_compare(vclock, down.vclock, comparator) then
Collaborator

We're calling a function which is not defined in the current commit.

Contributor Author

fixed
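For context, a comparator-parameterized vclock comparison like the one used above could look roughly like this (a sketch only; the actual util.vclock_compare in the patch may differ):

```lua
-- Sketch: compare two vclocks component-wise with `comparator`.
-- Returns true only if comparator(a[id] or 0, b[id] or 0) holds for
-- every replica id present in either vclock.
local function vclock_compare(a, b, comparator)
    local ids = {}
    for id in pairs(a) do ids[id] = true end
    for id in pairs(b) do ids[id] = true end
    for id in pairs(ids) do
        if not comparator(a[id] or 0, b[id] or 0) then
            return false
        end
    end
    return true
end

-- The old vclock_lesseq(a, b) would then be expressible as:
-- vclock_compare(a, b, function(x, y) return x <= y end)
```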

for _, replica in ipairs(box.info.replication) do
-- The current vclock may be changed between iterations. We need to
-- track the most recent one.
local vclock = box.info.vclock
Collaborator

We cannot use such a function in vshard.storage.sync. That function is supposed to wait until all changes from the current node are on all other instances; if some instances are lagging and the current node constantly writes, the sync will never exit, since we constantly update the vclock.

However, this approach can be used in the newly created service, since we expect the service to be started on the master while all other nodes cannot write new transactions, so sooner or later it will end.

I don't see any good approaches to reuse the wait_lsn function in all places:

  1. The vclock cannot be updated on every iteration; when an instance becomes leader, it must synchronously wait for the old service to die before starting the new one. I don't like the synchronous waiting part here.

  2. The vclock becomes an argument of wait_lsn. The service constantly retries the wait_lsn part until success, passing the current vclock each time. In that solution there's no sense in the wait_ part, since we would have to call wait_lsn with a really small timeout.

Instead, I propose to move the loop iteration from wait_lsn into a separate function, pass the comparator and vclock there, and use it in both wait_lsn and your newly created function. In wait_lsn we'll pass the same vclock on every iteration; in the new service, box.info.vclock (updated on every iteration).
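The proposed split might look roughly like this (a sketch with assumed names, not the actual patch):

```lua
-- Sketch: one shared iteration, parameterized by the vclock source.
-- Returns true when `vclock` is replicated to every live downstream
-- according to `comparator`.
local function vclock_is_replicated(vclock, comparator)
    for _, replica in ipairs(box.info.replication) do
        local down = replica.downstream
        if replica.id ~= box.info.id then
            if not down or down.status == 'stopped' or
               not util.vclock_compare(vclock, down.vclock, comparator) then
                return false
            end
        end
    end
    return true
end

-- wait_lsn: retry with a fixed vclock snapshot taken once.
-- on_master_enable: re-read box.info.vclock on every retry instead.
```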

Contributor Author

I fixed this issue with minimal changes. Now we can pass a comparable vclock to storage_wait_vclock_template. If a vclock is passed, we use it in the comparison with downstream.vclock; otherwise we use box.info.vclock of the current storage on every loop iteration.


M.recovery_fiber =
util.reloadable_fiber_new('vshard.recovery', M, 'recovery_f')
end
else
Collaborator

It's not guaranteed that on_master_enable_fiber will wake up sooner than all the other rebalancer-related fibers, so it may happen that when they start, the variable buckets_are_in_sync is still true due to the old check. I'd expect the variable to be set to false if the instance becomes non-master.

You can easily test it with manual wakeups of the fibers if you want to.

Contributor Author

done

@Serpentian Serpentian assigned mrForza and unassigned Serpentian Mar 26, 2026
mrForza added 3 commits March 31, 2026 17:48
Before this patch the `bucket_move` and `bucket_wait_transfer` helper
functions were used only in `storage_1_1_1_test`. However, these helpers
can also be useful in future patches (e.g. in gh-214).

This patch moves `bucket_move` and `bucket_wait_transfer` into the `vtest`
module so that we can use them in other tests.

Needed for tarantool#214

NO_TEST=test
NO_DOC=test
Before this patch we compared vclocks only in the `wait_lsn` function in
the storage module. However, in future patches (e.g. gh-214) we will need
to do this in tests as well. Also, in gh-214 we will use very similar
vclock-waiting logic but with the opposite sign: all vclock components of
the current storage should be "greater than or equal to" the components of
the replicas' vclocks instead of "less than or equal to".

To avoid code duplication we unify the vclock comparison process and
transform `vclock_lesseq` into a more general `vclock_compare` function,
which allows us to perform different vclock comparisons via a comparator.
We move this function into the `util` vshard module.

We also transform `wait_lsn` into `storage_wait_vclock_replicated`. This
function does a similar thing to `wait_lsn`, but the main logic has
migrated into `storage_wait_vclock_template`, which is responsible for
waiting until the passed vclock satisfies the comparator condition.

Needed for tarantool#214

NO_TEST=refactoring
NO_DOC=refactoring
Before this patch the `rebalancer` and `recovery` services could start
right after a master switch (caused by `auto` master detection or manual
reconfiguration), before the master had time to sync its vclock with the
other replicas in the replicaset. This could lead to doubled buckets, as
described in the "Doubled buckets RFC".

To fix it we introduce a new storage service, `on_master_enable`. When the
master changes in a replicaset, this service is triggered and waits until
the newly elected master syncs its vclock with the other replicas. The
other storage services, `rebalancer` and `recovery`, can't start until
`on_master_enable` sets `M.buckets_are_in_sync`.

We also change `storage/storage.test`, `storage/recovery.test`,
`storage-luatest/log_verbosity_2_2_test` and `router/router.test` so
that they don't fail. Now the `rebalancer` and `recovery` services
don't start immediately after a master switch, which can shake some tests.

Part of tarantool#214

NO_TEST=bugfix
NO_DOC=bugfix
@mrForza mrForza force-pushed the gh-214-stray-tcp-doubled-buckets branch from 396cc20 to 78cf3e9 Compare March 31, 2026 14:56
@mrForza mrForza requested a review from Serpentian April 1, 2026 10:14
@mrForza mrForza assigned Serpentian and unassigned mrForza Apr 1, 2026


Development

Successfully merging this pull request may close these issues.

Stray TCP message with big delay may duplicate a bucket
