Skip to content

Conversation

@Bouncheck
Copy link

@Bouncheck Bouncheck commented Feb 28, 2025

RemovedNodeIt#should_signal_and_destroy_pool_when_node_gets_removed test starts 4 node cluster. Probably because of nodes starting in parallel it sometimes leads to the situation where two nodes own the same token. This throws the test off. Not only the driver sees one node as invalid, but also the assertion that expects 4 different token ranges fails, because there are only 3.

To remediate that this change modifies the test to start the setup with 1 node and then sequentially add remaining 3 nodes with their initial_token explicitly set.

Fixes #393

@Bouncheck Bouncheck self-assigned this Feb 28, 2025
@Bouncheck Bouncheck requested review from dkropachev and removed request for dkropachev February 28, 2025 17:55
@Bouncheck
Copy link
Author

Test failed on CI unexpectedly when run with other tests, converting to draft.

@Bouncheck Bouncheck marked this pull request as draft February 28, 2025 18:44
@roydahan
Copy link
Collaborator

roydahan commented Mar 3, 2025

Seems like a workaround, but let's first vaerify if we need that workaround or is it a real issue on the server side / ccm.

`RemovedNodeIt#should_signal_and_destroy_pool_when_node_gets_removed` test
starts 4 node cluster. Probably because of nodes starting in parallel
it sometimes leads to the situation where two nodes own the same token. This
throws the test off. Not only the driver sees one node as invalid, but also
the assertion that expects 4 different token ranges fails,
because there are only 3.

To remediate that this change modifies the test to start the setup with 1 node
and then sequentially add remaining 3 nodes with their `initial_token`
explicitly set.

Due to issues with ccm choosing conflicting jmx_ports when starting nodes
sequentially they are also set explicitly.
@Bouncheck Bouncheck force-pushed the fix-RemovedNodeIT-should_signal__removed-flaky branch from 686e996 to 0df288b Compare March 17, 2025 17:53
@Bouncheck Bouncheck marked this pull request as ready for review March 17, 2025 18:32
@Bouncheck
Copy link
Author

Fixed (non-flaky) failures. It's ready for review in case it ends up being needed.

@Bouncheck Bouncheck requested a review from dkropachev March 17, 2025 18:34
@dkropachev
Copy link

Let's not guess, let's find where this issue comes from.
Do we run ccm start with --wait-other-notice ?

@dkropachev
Copy link

Proper fix at scylladb/scylla-ccm#643

@dkropachev dkropachev closed this Mar 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4.x: RemovedNodeIT.should_signal_and_destroy_pool_when_node_gets_removed can fail

3 participants