@@ -229,6 +229,23 @@ works in parallel with the storage engine.)
229229
230230# Allocation
231231
232+ ### Indexes and Shards
233+
234+ Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each
235+ primary shard can have zero or more replicas, which provide data redundancy. The number of replicas per shard can be changed at runtime. Each
236+ shard copy (primary or replica) can be in one of four states:
237+ - UNASSIGNED: Not present on / assigned to any data node.
238+ - STARTED: Assigned to a specific data node and ready for indexing and search requests.
239+ - INITIALIZING: Running recovery on an assigned node. Recovery is fast for a shard of a new, empty index, but possibly lengthy when restoring
240+ from a snapshot or moving from another node.
241+ - RELOCATING: The state of a shard on a source node while the shard is being moved away to a target node; the target node is running
242+ recovery.
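
The four states above can be pictured as a small state machine. The following is a minimal sketch only: the `ShardCopyState` enum and its transition table are hypothetical names invented for illustration, and the allowed transitions are a simplifying assumption (for example, that a failed copy returns to UNASSIGNED and that a cancelled relocation returns the source copy to STARTED). It is not the representation used by the classes in the routing package.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, simplified model of the four shard copy states; not Elasticsearch code. */
enum ShardCopyState {
    UNASSIGNED, INITIALIZING, STARTED, RELOCATING;

    /** Assumed transitions, chosen for illustration only. */
    Set<ShardCopyState> next() {
        switch (this) {
            case UNASSIGNED:
                // A copy must be assigned to a node and recover before it can serve requests.
                return EnumSet.of(INITIALIZING);
            case INITIALIZING:
                // Recovery either completes or fails (assumption: failure unassigns the copy).
                return EnumSet.of(STARTED, UNASSIGNED);
            case STARTED:
                // A started copy can begin moving to another node, or fail.
                return EnumSet.of(RELOCATING, UNASSIGNED);
            case RELOCATING:
                // Assumption: a cancelled relocation leaves the source copy started.
                return EnumSet.of(STARTED, UNASSIGNED);
            default:
                throw new AssertionError(this);
        }
    }

    boolean canMoveTo(ShardCopyState target) {
        return next().contains(target);
    }
}
```

In this sketch, `ShardCopyState.UNASSIGNED.canMoveTo(ShardCopyState.STARTED)` is false: a copy has to pass through INITIALIZING (recovery) first.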
243+
244+ The `RoutingTable` and `RoutingNodes` classes are responsible for tracking to which data nodes each shard in the cluster is allocated: see
245+ the [routing package javadoc][] for more details about these structures.
246+
247+ [routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java
248+
232249### Core Components
233250
234251The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
@@ -269,6 +286,22 @@ of shards, and an incentive to distribute shards within the same index across di
269286`NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer`
270287decisions.
271288
289+ ### Inter-Node Communication
290+
291+ The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard
292+ movements towards the target allocation plan with the `DesiredBalanceReconciler`. The result of the `DesiredBalanceReconciler` is an
293+ updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new
294+ (incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node
295+ observes any change in shard allocation related to that node and takes action to achieve the new shard allocation by creating a
296+ new empty shard, starting recovery (copying) of an existing shard from another data node, or removing a shard. When the data node finishes
297+ a shard change, it sends a request to the master node to mark the shard as having finished recovery/removal in the cluster state. The
298+ cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and
299+ report back when done.
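
The work-queue pattern described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Elasticsearch implementation: `WorkQueueSketch`, `Master`, `DataNode`, `Routing`, and every method name are hypothetical, the "cluster state" is reduced to a map from shard id to node and state, and recovery is a no-op.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "cluster state as a work queue"; none of these types exist in Elasticsearch. */
class WorkQueueSketch {
    enum State { INITIALIZING, STARTED }

    /** One routing entry: which node holds the shard copy, and in what state. */
    static final class Routing {
        final String node;
        final State state;

        Routing(String node, State state) {
            this.node = node;
            this.state = state;
        }
    }

    /** Stand-in for the elected master, which owns the authoritative state. */
    static class Master {
        private final Map<String, Routing> routing = new HashMap<>();

        /** New work: assign a shard copy to a node; it must recover before it is started. */
        void assign(String shard, String node) {
            routing.put(shard, new Routing(node, State.INITIALIZING));
        }

        /** Publishing hands each data node a snapshot of the current state. */
        Map<String, Routing> publish() {
            return new HashMap<>(routing);
        }

        /** Completion report from a data node; ignored if the assignment has changed since. */
        void shardStarted(String shard, String node) {
            Routing current = routing.get(shard);
            if (current != null && current.node.equals(node)) {
                routing.put(shard, new Routing(node, State.STARTED));
            }
        }

        State stateOf(String shard) {
            return routing.get(shard).state;
        }
    }

    /** Stand-in for a data node that reacts to published state. */
    static class DataNode {
        private final String name;

        DataNode(String name) {
            this.name = name;
        }

        /** Pick up work addressed to this node, then report back to the master. */
        void applyClusterState(Map<String, Routing> published, Master master) {
            for (Map.Entry<String, Routing> e : published.entrySet()) {
                Routing r = e.getValue();
                if (r.node.equals(name) && r.state == State.INITIALIZING) {
                    // Shard creation or recovery would run here; omitted in this sketch.
                    master.shardStarted(e.getKey(), name);
                }
            }
        }
    }
}
```

In this toy model, calling `master.assign("index-0", "node-a")` and then `new WorkQueueSketch.DataNode("node-a").applyClusterState(master.publish(), master)` leaves the shard in `STARTED`, while a shard assigned to a different node stays `INITIALIZING` until that node applies the state.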
300+
301+ See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation.
302+ See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state.
303+ See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change.
304+
272305# Autoscaling
273306
274307The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation