Skip to content

Conversation

@DiannaHohensee
Copy link
Contributor

@DiannaHohensee DiannaHohensee commented Feb 6, 2025

This is a high-level overview of the main rebalancing components and
how they interact to move shards around the cluster, and decide where
shards should go.

Relates ES-10423


I haven't sold myself on the section titles, if anything better comes to mind.

@DaveCTurner not urgent to review this. Assigning to you because I'm not aware of anyone else already familiar with how this all works.

@DiannaHohensee DiannaHohensee added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team labels Feb 6, 2025
@DiannaHohensee DiannaHohensee self-assigned this Feb 6, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few small comments.

The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
`DesiredBalance` instances for the cluster based on the latest cluster changes (add/remove nodes, create/remove indices, load, etc). Then
the `DesiredBalanceReconciler` is invoked to choose the next steps to take to move the cluster from the current shard allocation to the
latest computed `DesiredBalance` shard allocation. The `Reconciler` will apply changes to a copy of the `RoutingNodes`, which is then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DesiredBalanceReconciler not just Reconciler (here and below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Asynchronous completion of a new `DesiredBalance` will also invoke a reconciliation action, as will cluster state updates completing shard
moves/recoveries (unthrottling the next shard move/recovery).

The `ContinuousComputation` maintains a queue of desired balance computation requests, each of which holds the latest cluster information at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a queue but really it's only tracking the latest request.

Copy link
Contributor Author

@DiannaHohensee DiannaHohensee Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for catching that.

I guess the enqueued terminology got stuck in my head :)

moves/recoveries (unthrottling the next shard move/recovery).

The `ContinuousComputation` maintains a queue of desired balance computation requests, each of which holds the latest cluster information at
the time of the request, and a thread that runs the `DesiredBalanceComputer`. The ContinuousComputation thread grabs the latest request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
the time of the request, and a thread that runs the `DesiredBalanceComputer`. The ContinuousComputation thread grabs the latest request,
the time of the request, and a thread that runs the `DesiredBalanceComputer`. The `ContinuousComputation` thread grabs the latest request,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


There are different priorities in shard allocation, reflected in which moves the `DesiredBalancerReconciler` selects to do first given that
it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being
more important than replicas. The second is to move shards that violate node resource limits or hard limits. The `AllocationDeciders` holds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

violate node resource limits or hard limits

Really, violating any rule as defined by an AllocationDecider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased 👍

There are different priorities in shard allocation, reflected in which moves the `DesiredBalancerReconciler` selects to do first given that
it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being
more important than replicas. The second is to move shards that violate node resource limits or hard limits. The `AllocationDeciders` holds
a group of `AllocationDecider` types that place hard constraints on shard allocation. There is a decider that manages disk memory usage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a decider

Maybe name these deciders here so folks can go and look them up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the comments in 2a951eb

The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
`DesiredBalance` instances for the cluster based on the latest cluster changes (add/remove nodes, create/remove indices, load, etc). Then
the `DesiredBalanceReconciler` is invoked to choose the next steps to take to move the cluster from the current shard allocation to the
latest computed `DesiredBalance` shard allocation. The `Reconciler` will apply changes to a copy of the `RoutingNodes`, which is then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Asynchronous completion of a new `DesiredBalance` will also invoke a reconciliation action, as will cluster state updates completing shard
moves/recoveries (unthrottling the next shard move/recovery).

The `ContinuousComputation` maintains a queue of desired balance computation requests, each of which holds the latest cluster information at
Copy link
Contributor Author

@DiannaHohensee DiannaHohensee Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for catching that.

I guess the enqueued terminology got stuck in my head :)

moves/recoveries (unthrottling the next shard move/recovery).

The `ContinuousComputation` maintains a queue of desired balance computation requests, each of which holds the latest cluster information at
the time of the request, and a thread that runs the `DesiredBalanceComputer`. The ContinuousComputation thread grabs the latest request,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

There are different priorities in shard allocation, reflected in which moves the `DesiredBalancerReconciler` selects to do first given that
it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being
more important than replicas. The second is to move shards that violate node resource limits or hard limits. The `AllocationDeciders` holds
a group of `AllocationDecider` types that place hard constraints on shard allocation. There is a decider that manages disk memory usage
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


There are different priorities in shard allocation, reflected in which moves the `DesiredBalancerReconciler` selects to do first given that
it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being
more important than replicas. The second is to move shards that violate node resource limits or hard limits. The `AllocationDeciders` holds
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased 👍

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DiannaHohensee DiannaHohensee merged commit befc6a0 into elastic:main Feb 18, 2025
3 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants