Commit ed8856b

Fix cluster slot migration flaky test (#2756)
The original test code only checks:

1. wait_for_cluster_size 4, which calls cluster_size_consistent for every node. For each node, cluster_size_consistent queries cluster_known_nodes, which is calculated as (unsigned long long)dictSize(server.cluster->nodes). However, when a new node is added to the cluster, it is first created in the HANDSHAKE state, and clusterAddNode adds it to the nodes hash table. A new node can therefore still be in HANDSHAKE status (the handshake is processed asynchronously) even though every node already appears to “know” that the cluster has 4 nodes.

2. cluster_state for every node, but when a new node is added, server.cluster->state remains FAIL.

Some handshake processes may not have completed yet, which likely causes the flakiness. To address this, add a --cluster check to ensure that the config state is consistent.

Fixes #2693.

Signed-off-by: Hanxi Zhang <[email protected]>
Co-authored-by: Binbin <[email protected]>
1 parent e19ceb7 commit ed8856b

1 file changed: +8 -1 lines changed
tests/unit/cluster/cli.tcl

Lines changed: 8 additions & 1 deletion
@@ -275,7 +275,14 @@ test {Migrate the last slot away from a node using valkey-cli} {
         # First we wait for new node to be recognized by entire cluster
         wait_for_cluster_size 4
 
+        # Cluster check just verifies the config state is self-consistent,
+        # waiting for cluster_state to be okay is an independent check that all the
+        # nodes actually believe each other are healthy, prevent cluster down error.
         wait_for_condition 1000 50 {
+            [catch {exec src/valkey-cli --cluster check 127.0.0.1:[srv 0 port]}] == 0 &&
+            [catch {exec src/valkey-cli --cluster check 127.0.0.1:[srv -1 port]}] == 0 &&
+            [catch {exec src/valkey-cli --cluster check 127.0.0.1:[srv -2 port]}] == 0 &&
+            [catch {exec src/valkey-cli --cluster check 127.0.0.1:[srv -3 port]}] == 0 &&
             [CI 0 cluster_state] eq {ok} &&
             [CI 1 cluster_state] eq {ok} &&
             [CI 2 cluster_state] eq {ok} &&
@@ -530,4 +537,4 @@ start_multiple_servers 3 [list overrides $base_conf] {
     }
 }
 
-}
+}
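
The four added `catch {exec ...}` lines repeat one pattern: run `valkey-cli --cluster check` against a node and treat a clean exit as success. As a rough illustration only (not part of this commit), the same idea could be folded into a loop. The proc names `all_nodes_pass_cluster_check` and `wait_until` below are hypothetical, and the `cli` and `host` parameters are assumptions added so the sketch can run without a live cluster.

```tcl
# Hypothetical helper (not in the Valkey test suite): returns 1 when
# `<cli> --cluster check <host>:<port>` succeeds for every port, else 0.
# The defaults mirror the path and address used in the diff above.
proc all_nodes_pass_cluster_check {ports {cli src/valkey-cli} {host 127.0.0.1}} {
    foreach port $ports {
        # exec raises an error on a non-zero exit status (or stderr output);
        # catch turns that error into a non-zero return value.
        if {[catch {exec $cli --cluster check ${host}:${port}}]} {
            return 0
        }
    }
    return 1
}

# Hypothetical polling wrapper: evaluates `script` in the caller's scope up
# to `maxtries` times, sleeping `delay_ms` milliseconds between attempts.
# Returns 1 as soon as the script yields a true value, 0 if it never does.
proc wait_until {maxtries delay_ms script} {
    for {set i 0} {$i < $maxtries} {incr i} {
        if {[uplevel 1 $script]} {
            return 1
        }
        after $delay_ms
    }
    return 0
}
```

Inside the test, the port list would presumably be built from `[srv 0 port]` through `[srv -3 port]`, and the `cluster_state` comparisons would stay as separate conditions, since the commit's comment treats them as an independent check. Spelling the four calls out inline, as the commit does, keeps the condition readable at a glance; the loop form mainly helps if the number of nodes changes.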
