### Core Components

The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
`DesiredBalance` instances for the cluster based on the latest cluster changes (add/remove nodes, create/remove indices, load, etc.). Then
the `DesiredBalanceReconciler` is invoked to choose the next steps to take to move the cluster from the current shard allocation to the
latest computed `DesiredBalance` shard allocation. The `DesiredBalanceReconciler` will apply changes to a copy of the `RoutingNodes`, which
is then published in a cluster state update that will reach the data nodes to start the individual shard recovery/deletion/move work.
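
The following is a deliberately simplified sketch of that compute-then-reconcile flow. The class and method shapes here are hypothetical
stand-ins for illustration, not the real Elasticsearch APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins for the real components.
public class ComputeThenReconcileSketch {

    /** Target assignment of shard id -> node id, standing in for DesiredBalance. */
    record DesiredBalance(Map<String, Integer> shardToNode) {}

    /** Stand-in for the DesiredBalanceComputer: spread shards round-robin across nodes. */
    static DesiredBalance compute(int shardCount, int nodeCount) {
        Map<String, Integer> target = new HashMap<>();
        for (int s = 0; s < shardCount; s++) {
            target.put("shard-" + s, s % nodeCount);
        }
        return new DesiredBalance(target);
    }

    /** Stand-in for the DesiredBalanceReconciler: move a copy of the current routing
     *  table toward the target; the real reconciler publishes the result as a cluster
     *  state update that the data nodes act on. */
    static Map<String, Integer> reconcile(Map<String, Integer> current, DesiredBalance target) {
        Map<String, Integer> next = new HashMap<>(current); // copy, as with RoutingNodes
        next.putAll(target.shardToNode());
        return next;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = Map.of("shard-0", 0, "shard-1", 0, "shard-2", 0);
        DesiredBalance desired = compute(3, 3);
        System.out.println(reconcile(current, desired)); // shard-1 and shard-2 move off node 0
    }
}
```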

The `DesiredBalanceReconciler` is throttled by cluster settings, like the max number of concurrent shard moves and recoveries per cluster
and node: this is why the `DesiredBalanceReconciler` will make, and publish via cluster state updates, incremental changes to the cluster
shard allocation. The `DesiredBalanceShardsAllocator` is the endpoint for reroute requests, which may trigger immediate requests to the
`DesiredBalanceReconciler`, but asynchronous requests to the `DesiredBalanceComputer` via the `ContinuousComputation` component. Cluster
state changes that affect shard balancing (for example index deletion) all call a reroute method that reaches the
`DesiredBalanceShardsAllocator` to run reconciliation and queue a request for the `DesiredBalanceComputer`, leading to desired balance
computation and reconciliation actions. Asynchronous completion of a new `DesiredBalance` will also invoke a reconciliation action, as will
cluster state updates completing shard moves/recoveries (unthrottling the next shard move/recovery).
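
The incremental, budgeted nature of reconciliation can be illustrated as below. The queue, batch size, and move type are all invented for
the example; the real throttles are cluster settings enforced by the allocation machinery.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only: a per-pass budget stands in for the concurrent-recovery limits.
public class ThrottledReconcileSketch {

    record Move(String shard, int fromNode, int toNode) {}

    /** Take at most maxConcurrent moves per pass; the rest wait for a later
     *  reconciliation, which runs when earlier recoveries complete. */
    static List<Move> nextBatch(Queue<Move> pending, int maxConcurrent) {
        List<Move> batch = new ArrayList<>();
        while (batch.size() < maxConcurrent && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        Queue<Move> pending = new ArrayDeque<>(List.of(
            new Move("a", 0, 1), new Move("b", 0, 2), new Move("c", 1, 2)));
        while (!pending.isEmpty()) {
            // Each iteration stands in for one published cluster state update.
            System.out.println("publish: " + nextBatch(pending, 2));
        }
    }
}
```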

The `ContinuousComputation` component saves the latest desired balance computation request, which holds the cluster information at the time
of that request, and manages a thread that runs the `DesiredBalanceComputer`. The `ContinuousComputation` thread takes the latest request,
with the associated cluster information, feeds it into the `DesiredBalanceComputer` and publishes a `DesiredBalance` back to the
`DesiredBalanceShardsAllocator` to use for reconciliation actions. Sometimes the `ContinuousComputation` thread's desired balance
computation will be signalled to exit early and publish the initial `DesiredBalance` improvements it has made, when newer rebalancing
requests (due to cluster state changes) have arrived, or in order to begin recovery of unassigned shards as quickly as possible.
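
The latest-input pattern at the heart of `ContinuousComputation` can be sketched as follows. This is an illustration of the idea under
simplifying assumptions (the class below is invented), not the actual Elasticsearch implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Keep only the freshest input; a single worker drains it, so stale requests
// submitted while a computation is running are silently superseded.
public class LatestInputWorkerSketch<T> {

    private final AtomicReference<T> latest = new AtomicReference<>();
    private final AtomicBoolean running = new AtomicBoolean();

    /** Submit newer cluster information; it replaces any not-yet-processed input. */
    public void submit(T input) {
        latest.set(input);
        maybeStartWorker();
    }

    private void maybeStartWorker() {
        if (running.compareAndSet(false, true)) {
            new Thread(() -> {
                try {
                    T input;
                    while ((input = latest.getAndSet(null)) != null) {
                        process(input); // e.g. run the DesiredBalanceComputer and publish
                    }
                } finally {
                    running.set(false);
                    if (latest.get() != null) {
                        maybeStartWorker(); // an input raced in as the worker was exiting
                    }
                }
            }).start();
        }
    }

    protected void process(T input) {
        System.out.println("computing desired balance for " + input);
    }

    public static void main(String[] args) throws InterruptedException {
        var worker = new LatestInputWorkerSketch<String>();
        worker.submit("cluster state v1");
        worker.submit("cluster state v2"); // may supersede v1 if v1 has not started yet
        Thread.sleep(100); // crude wait so the demo thread can finish
    }
}
```

The early-exit behaviour described above would amount to the running computation periodically checking whether a fresh input has arrived
(here, whether `latest` is non-null) and returning its best balance so far.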

### Rebalancing Process

There are different priorities in shard allocation, reflected in which moves the `DesiredBalanceReconciler` selects to do first given that
it can only move, recover, or remove a limited number of shards at once. The first priority is assigning unassigned shards, primaries being
more important than replicas. The second is to move shards that violate any rule (such as node resource limits) as defined by an
`AllocationDecider`. The `AllocationDeciders` class holds a group of `AllocationDecider` implementations that place hard constraints on
shard allocation. There is a decider, `DiskThresholdDecider`, that manages disk usage thresholds, such that a node may be refused further
shards, or shards may be required to move elsewhere when they grow to exceed the allowed disk space; another, `FilterAllocationDecider`,
excludes a configurable list of indices from certain nodes; and `MaxRetryAllocationDecider` will not attempt to recover a shard on a
certain node after too many failed retries. The third priority is to rebalance shards to even out the relative weight of shards on each
node: the intention is to avoid, or ease, future hot-spotting on data nodes due to too many shards being placed on the same data node. Node
shard weight is based on a sum of factors: disk usage, projected shard write load, total number of shards, and an incentive to distribute
shards within the same index across different nodes. See the `WeightFunction` and `NodeAllocationStatsAndWeightsCalculator` classes for
more details on the weight calculations that support the `DesiredBalanceComputer` decisions.
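
Both mechanisms can be sketched together: deciders give hard yes/no answers, while the weight is a soft score used to rank nodes. Every
name, threshold, and multiplier below is invented for illustration; the real factors and their tuning live in `WeightFunction`.

```java
// Illustrative sketch of hard constraints (deciders) versus the soft weight.
public class AllocationSketch {

    enum Decision { YES, NO }

    record Node(String id, double diskUsedFraction, int shardCount, double writeLoad) {}

    /** Hard constraint, in the spirit of an AllocationDecider. */
    interface Decider {
        Decision canAllocate(String shard, Node node);
    }

    /** Refuse allocation above a disk watermark, loosely like DiskThresholdDecider. */
    static final Decider DISK_WATERMARK = (shard, node) ->
        node.diskUsedFraction() < 0.85 ? Decision.YES : Decision.NO;

    /** Soft score: a weighted sum of the balancing factors named above; a lower
     *  score marks a more attractive node. The multipliers here are arbitrary. */
    static double weight(Node node, int shardsOfSameIndexOnNode) {
        return 0.45 * node.shardCount()
            + 0.25 * node.writeLoad()
            + 0.20 * node.diskUsedFraction()
            + 0.10 * shardsOfSameIndexOnNode;
    }

    public static void main(String[] args) {
        Node full = new Node("node-1", 0.92, 40, 3.0);
        Node empty = new Node("node-2", 0.30, 10, 1.0);
        System.out.println(DISK_WATERMARK.canAllocate("shard-7", full)); // NO
        System.out.println(weight(empty, 2) < weight(full, 5));          // true: node-2 preferred
    }
}
```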

# Autoscaling
