ywangd
Member

@ywangd ywangd commented Oct 7, 2025

In DesiredBalanceComputer, shard movements should be simulated by starting any shards that are initializing. Previously, explicitly moved shards were not covered. This PR fixes that.

Resolves: ES-12943
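
The idea in the description can be illustrated with a toy model. This is only a sketch under stated assumptions: `MoveSimulationSketch`, `Shard`, `State`, `move`, and `startInitializingShards` are hypothetical names invented for illustration, not Elasticsearch classes or the code changed in this PR, and the model tracks a single copy per shard id.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not an Elasticsearch class.
class MoveSimulationSketch {
    enum State { STARTED, RELOCATING, INITIALIZING }

    static final class Shard {
        final String id;
        final String node;
        State state;

        Shard(String id, String node, State state) {
            this.id = id;
            this.node = node;
            this.state = state;
        }
    }

    // Applies an explicit move: the source copy becomes RELOCATING and a
    // target copy is added on the destination node in INITIALIZING state.
    static Shard move(List<Shard> shards, Shard source, String toNode) {
        source.state = State.RELOCATING;
        Shard target = new Shard(source.id, toNode, State.INITIALIZING);
        shards.add(target);
        return target;
    }

    // Simulates recovery finishing: every INITIALIZING copy becomes STARTED
    // and the RELOCATING source with the same shard id is dropped.
    static void startInitializingShards(List<Shard> shards) {
        List<Shard> finishedSources = new ArrayList<>();
        for (Shard shard : shards) {
            if (shard.state == State.INITIALIZING) {
                shard.state = State.STARTED;
                for (Shard other : shards) {
                    if (other != shard && other.id.equals(shard.id) && other.state == State.RELOCATING) {
                        finishedSources.add(other);
                    }
                }
            }
        }
        shards.removeAll(finishedSources);
    }

    // Counts the copies that are still initializing.
    static long countInitializing(List<Shard> shards) {
        return shards.stream().filter(s -> s.state == State.INITIALIZING).count();
    }
}
```

In this model, calling `startInitializingShards` after `move` leaves no initializing copies, which is the state the thread below expects before the balancing step runs.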

@ywangd ywangd added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v9.3.0 labels Oct 7, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Collaborator

Hi @ywangd, I've created a changelog YAML for you.

Contributor

@DiannaHohensee DiannaHohensee left a comment

Left some comments.

Contributor

@henningandersen henningandersen left a comment

LGTM.

final ShardRouting[] initializingShards = routingNodes.node(
    routingAllocation.nodes().resolveNode(command.toNode()).getId()
).initializing();
assert initializingShards.length == 1
Contributor

Can we assert no initializing shards after the while loop in general? That helps understand this assertion without reading the prior code.

Member Author

This is asserted inside BalancedShardsAllocator to ensure each simulation call starts without any initializing shards.

    routingAllocation.nodes().resolveNode(command.toNode()).getId()
).initializing();
assert initializingShards.length == 1
    && routingAllocation.nodes()
Contributor

nit: let us split into two assertions, one for only 1 initializing shard and one for the right node holding the initializing shard.

Member Author

Yep see b9dc835
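
The suggestion to split the compound assertion can be sketched as follows. This is an illustration, not the change made in b9dc835: `SplitAssertionSketch` and `checkMoveTarget` are hypothetical names, shards are reduced to the node id they are assigned to, and the checks throw `AssertionError` directly so the sketch behaves the same whether or not JVM assertions are enabled.

```java
import java.util.List;

// Hypothetical helper: each element of the list is the node id of one initializing shard.
class SplitAssertionSketch {
    static void checkMoveTarget(List<String> initializingShardNodes, String expectedNodeId) {
        // First check: exactly one shard is initializing after the move.
        if (initializingShardNodes.size() != 1) {
            throw new AssertionError(
                "expected exactly one initializing shard but found " + initializingShardNodes.size()
            );
        }
        // Second check: that shard is initializing on the node the move targeted.
        String actualNodeId = initializingShardNodes.get(0);
        if (!actualNodeId.equals(expectedNodeId)) {
            throw new AssertionError(
                "expected the initializing shard on [" + expectedNodeId + "] but it is on [" + actualNodeId + "]"
            );
        }
    }
}
```

With two separate checks, a failure message says which condition was violated, whereas a single `a && b` assertion only reports that the conjunction failed.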

@Override
public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    // The move command works with every decision except NO
    return randomFrom(Decision.YES, Decision.THROTTLE, Decision.THROTTLE);
Contributor

Did you mean not-preferred rather than 2xthrottle?

Member Author

Should have been not-preferred, updated in 5fbd0f9
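
The point of this exchange is that the test decider should pick among every decision other than NO, with each listed once. A self-contained sketch of that selection is below. The `Decision` enum, its `NOT_PREFERRED` constant name, and this `randomFrom` helper are assumptions made for illustration; they stand in for the Elasticsearch types and test utilities and are not copied from 5fbd0f9.

```java
import java.util.Random;

// Hypothetical stand-ins for the decision type and the test helper.
class RandomDecisionSketch {
    enum Decision { YES, THROTTLE, NOT_PREFERRED, NO }

    // Picks one of the given values uniformly at random.
    @SafeVarargs
    static <T> T randomFrom(Random random, T... values) {
        return values[random.nextInt(values.length)];
    }

    // Returns any decision except NO; listing each candidate once keeps the
    // choice uniform, unlike repeating THROTTLE twice.
    static Decision randomNonNoDecision(Random random) {
        return randomFrom(random, Decision.YES, Decision.THROTTLE, Decision.NOT_PREFERRED);
    }
}
```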

var desiredBalanceComputer = createDesiredBalanceComputer(new ShardsAllocator() {
    @Override
    public void allocate(RoutingAllocation allocation) {
        assertThat(
Contributor

Perhaps add a comment that this runs right after the moves are applied.

Contributor

Hmm, perhaps this is actually better as an assertion? Which I sort of proposed above. Happy to keep it though for the test to have verification, but we'll probably hit the assertion first.

Member Author

Added comment in fcc6449

In BalancedShardsAllocator, we do assert that the initial routingAllocation has no initializing shards. Is that similar to what you have in mind? allocate does not necessarily run "right" after the move commands, e.g. we can have additional code after the move commands. The important thing is that all moving shards are started by the time the computer is ready to call allocate.

@ywangd ywangd requested a review from DiannaHohensee October 8, 2025 01:59
Contributor

@DiannaHohensee DiannaHohensee left a comment

LGTM!

@ywangd
Member Author

ywangd commented Oct 8, 2025

@elasticmachine update branch

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 8, 2025
@elasticsearchmachine elasticsearchmachine merged commit c9dc3e8 into elastic:main Oct 8, 2025
34 checks passed
@ywangd ywangd deleted the ES-12943-simulate-moved-shards branch October 8, 2025 23:34