Start the allocation architecture guide section #121940

Merged

DiannaHohensee merged 3 commits into elastic:main from DiannaHohensee:2025/02/06/ES-10423-allocation-arch-guide

Feb 18, 2025

Contributor

DiannaHohensee commented Feb 6, 2025

This is a high-level overview of the main rebalancing components and
how they interact to decide where shards should go and to move them
around the cluster.

Relates ES-10423

I'm not sold on the section titles; suggestions are welcome if anything better comes to mind.

@DaveCTurner not urgent to review this. Assigning to you because I'm not aware of anyone else already familiar with how this all works.


          Start the allocation architecture guide section

a6bc4d9

DiannaHohensee added >non-issue :Distributed Coordination/Allocation Team:Distributed Coordination labels

DiannaHohensee self-assigned this

elasticsearchmachine added the v9.1.0 label

Collaborator

elasticsearchmachine commented Feb 6, 2025

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

DiannaHohensee requested a review from DaveCTurner

February 6, 2025 19:23

DaveCTurner reviewed

Contributor

DaveCTurner left a comment

Looks good, just a few small comments.

docs/internal/DistributedArchitectureGuide.md Outdated

    
              The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce

              `DesiredBalance` instances for the cluster based on the latest cluster changes (add/remove nodes, create/remove indices, load, etc). Then

              the `DesiredBalanceReconciler` is invoked to choose the next steps to take to move the cluster from the current shard allocation to the

              latest computed `DesiredBalance` shard allocation. The `Reconciler` will apply changes to a copy of the `RoutingNodes`, which is then

Contributor

DaveCTurner Feb 18, 2025

nit: DesiredBalanceReconciler not just Reconciler (here and below)

Contributor Author

DiannaHohensee Feb 18, 2025

Done
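
For readers following this thread, here is a minimal Python sketch of the compute-then-reconcile flow the quoted paragraph describes. It is not the Elasticsearch implementation. The names `compute_desired_balance` and `reconcile`, the round-robin placement, and the `max_moves` limit are all hypothetical simplifications. Only the overall shape comes from the text: a desired balance is computed, then the reconciler applies a limited number of changes to a copy of the current allocation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredBalance:
    # Hypothetical simplification: shard id -> node id the shard should live on.
    assignments: dict


def compute_desired_balance(shards, nodes):
    """Toy stand-in for the computer: spread shards round-robin over nodes."""
    ordered_nodes = sorted(nodes)
    return DesiredBalance(
        {shard: ordered_nodes[i % len(ordered_nodes)]
         for i, shard in enumerate(sorted(shards))}
    )


def reconcile(current, desired, max_moves):
    """Return a new allocation at most `max_moves` moves closer to `desired`."""
    updated = dict(current)  # apply changes to a copy, never to the input
    moves = 0
    for shard, target in sorted(desired.assignments.items()):
        if moves >= max_moves:
            break
        if updated.get(shard) != target:
            updated[shard] = target
            moves += 1
    return updated
```

Calling `reconcile` repeatedly with the same `DesiredBalance` converges on the desired allocation a few moves at a time, which is the behaviour the paragraph attributes to the `DesiredBalanceReconciler`.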

docs/internal/DistributedArchitectureGuide.md Outdated

    
              Asynchronous completion of a new `DesiredBalance` will also invoke a reconciliation action, as will cluster state updates completing shard

              moves/recoveries (unthrottling the next shard move/recovery).

              The `ContinuousComputation` maintains a queue of desired balance computation requests, each of which holds the latest cluster information at

Contributor

DaveCTurner Feb 18, 2025

Kind of a queue but really it's only tracking the latest request.

Contributor Author

DiannaHohensee Feb 18, 2025

Fixed, thanks for catching that.

I guess the enqueued terminology got stuck in my head :)
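
The "only tracking the latest request" point can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the actual `ContinuousComputation` code: the class name `LatestRequestSlot` and its methods are hypothetical. It shows a single slot that newer requests overwrite, so a worker that wakes up only ever sees the most recent input.

```python
import threading


class LatestRequestSlot:
    """Hypothetical single-slot holder: a new request replaces any pending one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None
        self._has_request = False

    def submit(self, request):
        # Overwrite instead of appending: stale requests are simply dropped.
        with self._lock:
            self._latest = request
            self._has_request = True

    def take_latest(self):
        # Return the pending request and clear the slot, or None if it is empty.
        with self._lock:
            if not self._has_request:
                return None
            request, self._latest, self._has_request = self._latest, None, False
            return request
```

If three cluster states are submitted before the worker runs, the worker computes a balance for the third only; the first two are never processed.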

docs/internal/DistributedArchitectureGuide.md Outdated

    
              moves/recoveries (unthrottling the next shard move/recovery).

              The `ContinuousComputation` maintains a queue of desired balance computation requests, each of which holds the latest cluster information at

              the time of the request, and a thread that runs the `DesiredBalanceComputer`. The ContinuousComputation thread grabs the latest request,

Contributor

DaveCTurner Feb 18, 2025

nit:

Suggested change

- the time of the request, and a thread that runs the `DesiredBalanceComputer`. The ContinuousComputation thread grabs the latest request,
+ the time of the request, and a thread that runs the `DesiredBalanceComputer`. The `ContinuousComputation` thread grabs the latest request,

Contributor Author

DiannaHohensee Feb 18, 2025

Done

docs/internal/DistributedArchitectureGuide.md Outdated

    
              There are different priorities in shard allocation, reflected in which moves the `DesiredBalanceReconciler` selects to do first given that

              it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being

              more important than replicas. The second is to move shards that violate node resource limits or hard limits. The `AllocationDeciders` holds

Contributor

DaveCTurner Feb 18, 2025

violate node resource limits or hard limits

Really, violating any rule as defined by an AllocationDecider.

Contributor Author

DiannaHohensee Feb 18, 2025

Rephrased 👍
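
As a rough illustration of the ordering discussed in this thread, the Python sketch below sorts pending shard actions by priority. Everything here is hypothetical: the dictionary shape for a shard, the callable form of a decider, and the function names. The first two priority levels follow the quoted text and the review comment (unassigned shards first, primaries before replicas, then shards violating any decider rule). Placing the remaining moves last is an assumption of this sketch.

```python
def decider_allows(shard, node, deciders):
    """A shard may stay on a node only if every decider says yes."""
    return all(decider(shard, node) for decider in deciders)


def priority(shard, deciders):
    # Priority 0: unassigned shards, primaries ahead of replicas.
    if shard["node"] is None:
        return (0, 0 if shard["primary"] else 1)
    # Priority 1: shards whose current node violates some decider rule.
    if not decider_allows(shard, shard["node"], deciders):
        return (1, 0)
    # Priority 2 (assumption of this sketch): all other pending moves.
    return (2, 0)


def order_pending(shards, deciders):
    """Return ids of shards not yet on their desired node, most urgent first."""
    pending = [s for s in shards if s["node"] != s["desired"]]
    return [s["id"] for s in sorted(pending, key=lambda s: priority(s, deciders))]
```

A decider here is just a function `(shard, node) -> bool`, so a toy disk-based rule could be `lambda shard, node: node not in full_nodes`.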

docs/internal/DistributedArchitectureGuide.md Outdated

    
              There are different priorities in shard allocation, reflected in which moves the `DesiredBalanceReconciler` selects to do first given that

              it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being

              more important than replicas. The second is to move shards that violate node resource limits or hard limits. The `AllocationDeciders` holds

              a group of `AllocationDecider` types that place hard constraints on shard allocation. There is a decider that manages disk memory usage

Contributor

DaveCTurner Feb 18, 2025

There is a decider

Maybe name these deciders here so folks can go and look them up?

Contributor Author

DiannaHohensee Feb 18, 2025

Done

DiannaHohensee added 2 commits

February 18, 2025 13:16


          review fixes and some self-review

2a951eb


          Merge branch 'main' into 2025/02/06/ES-10423-allocation-arch-guide

d534a34

DiannaHohensee commented

Contributor Author

DiannaHohensee left a comment

Addressed the comments in 2a951eb

DiannaHohensee requested a review from DaveCTurner

February 18, 2025 18:19

DaveCTurner approved these changes

Contributor

DaveCTurner left a comment

LGTM

DiannaHohensee merged commit befc6a0 into elastic:main

3 of 7 checks passed

Labels

:Distributed Coordination/Allocation >non-issue Team:Distributed Coordination v9.1.0