
replicaset: router's connection to replica can't be recreated by rs:callro #642

@mrForza

Description

  1. Environment:

    • Tarantool version: 3.7.0-entrypoint-17-g0ccd7fe770
    • Vshard version: 0.1.39
    • OS: linux, x86-64
  2. Bug description: If the router's netbox connection to the master is closed for any reason (e.g. it was closed manually, or the initial handshake failed with a retryable or non-retryable error), it should be possible to recreate the connection through replicaset functions such as replicaset:call, replicaset:callrw, replicaset:callro and so on. However, when replicaset:callro is invoked, the connection to the master is not recreated.

  3. Reproducer:

    -- Wait until the router reports the given status for each named replica
    -- of the replicaset rs_id.
    local function check_router_connections_to_replicas(router, rs_id, statuses)
        router:exec(function(rs_id, statuses)
            t.helpers.retrying({}, function()
                local rs_info = ivshard.router.info().replicasets[rs_id]
                for replica, status in pairs(statuses) do
                    t.assert_equals(rs_info[replica].status, status)
                end
            end)
        end, {rs_id, statuses})
    end
    
    g.test_conn_restores_to_master_during_replicaset_callro = function(g)
        local rs2_id = g.replica_2_a:replicaset_uuid()
        -- Initially both connections are alive.
        check_router_connections_to_replicas(g.router, rs2_id, {
            master = 'available', replica = 'available'})
        -- Drop the router's connection to the master.
        g.router:exec(function(rs_id)
            local rs = ivshard.router.internal.static_router.replicasets[rs_id]
            rs.master:detach_conn()
        end, {rs2_id})
        check_router_connections_to_replicas(g.router, rs2_id, {
            master = 'unreachable', replica = 'available'})
        -- The read-only call itself succeeds.
        g.router:exec(function(rs_id)
            local rs = ivshard.router.internal.static_router.replicasets[rs_id]
            local res, err = rs:callro('echo', {123}, {timeout = iwait_timeout})
            t.assert_not(err)
            t.assert(res)
        end, {rs2_id})
        -- The connection to master should be restored by replicaset:callro!
        check_router_connections_to_replicas(g.router, rs2_id, {
            master = 'available', replica = 'available'})
    end
  4. Actual result: The router's netbox connection to the master is not recreated. The master keeps its old status, "unreachable":

    [001] not ok 5  router.test_conn_restores_to_master_during_replicaset_callro
    [001] #   ...t/Desktop/vshard/test/router-luatest/router_2_2_test.lua:1370: expected: "available"
    [001] #   actual: "unreachable"
    [001] #   diff:
    [001] #   -"available"
    [001] #   +"unreachable"
    [001] #   stack traceback:
    [001] #         ...t/Desktop/vshard/test/router-luatest/router_2_2_test.lua:1370: in function 'retrying'
    [001] #         ...t/Desktop/vshard/test/router-luatest/router_2_2_test.lua:1367: in function <...t/Desktop/vshard/test/router-luatest/router_2_2_test.lua:1366>
    [001] #         ...t/Desktop/vshard/test/router-luatest/router_2_2_test.lua:1366: in function 'check_router_connections_to_replicas'
    [001] #         ...t/Desktop/vshard/test/router-luatest/router_2_2_test.lua:1393: in function 'router.test_conn_restores_to_master_during_replicaset_callro'
    [001] #   artifacts:
    [001] #         replica_2_a -> /tmp/t/001_router-luatest/artifacts/01EeS9KjuBHF
    
  5. Expected behavior: A new netbox connection from the router to the master is created, and the master's status becomes "available".
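
To make the expected behavior concrete, here is a minimal toy model in plain Lua. It is only a sketch and not vshard's code: the tables stand in for replica objects, `detach_conn` merely mirrors the name used in the reproducer, and `ensure_conn`, `status`, `new_replica` and `new_replicaset` are hypothetical helpers invented for illustration. The assumption it illustrates is that a read-only call served by a replica should still trigger recreation of a missing master connection.

```lua
-- Toy model only: plain Lua tables, no Tarantool or vshard required.

local function new_replica(name)
    local replica = {name = name, conn = {state = 'active'}}
    -- Mirrors the reproducer's detach_conn(): drop the connection object.
    function replica:detach_conn()
        self.conn = nil
    end
    -- Hypothetical helper: create a connection only if none exists.
    function replica:ensure_conn()
        if self.conn == nil then
            self.conn = {state = 'active'}
        end
        return self.conn
    end
    -- Hypothetical helper: report the status the way the test checks it.
    function replica:status()
        return self.conn ~= nil and 'available' or 'unreachable'
    end
    return replica
end

local function new_replicaset()
    local rs = {
        master = new_replica('master'),
        replica = new_replica('replica'),
    }
    -- Read-only call: served by the replica, but it also recreates the
    -- master connection, which is the behavior the issue asks for.
    function rs:callro(func, args)
        self.master:ensure_conn()
        self.replica:ensure_conn()
        -- A real call would go over the network; the toy echoes the args.
        return args
    end
    return rs
end

local rs = new_replicaset()
rs.master:detach_conn()
assert(rs.master:status() == 'unreachable')
local res = rs:callro('echo', {123})
assert(res[1] == 123)
assert(rs.master:status() == 'available')
```

In this model the bug corresponds to `callro` skipping the `self.master:ensure_conn()` line: the call still succeeds through the replica, but the master stays "unreachable".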

Labels

bug, router
