Skip to content

Commit 72b4ed2

Browse files
Add to allocation architecture guide (#125328)
How master and data nodes communicate about shard allocation
1 parent 26254e3 commit 72b4ed2

File tree

1 file changed

+28
-0
lines changed

1 file changed

+28
-0
lines changed

docs/internal/DistributedArchitectureGuide.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -195,6 +195,18 @@ works in parallel with the storage engine.)
195195

196196
# Allocation
197197

198+
### Indexes and Shards
199+
200+
Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each
201+
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed dynamically.
202+
203+
The allocation assignment status of each shard copy is tracked by its [ShardRoutingState][]. The `RoutingTable` and `RoutingNodes` objects
204+
are responsible for tracking the data nodes to which each shard in the cluster is allocated: see the [routing package javadoc][] for more
205+
details about these structures.
206+
207+
[routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java
208+
[ShardRoutingState]: https://github.com/elastic/elasticsearch/blob/4c9c82418ed98613edcd91e4d8f818eeec73ce92/server/src/main/java/org/elasticsearch/cluster/routing/ShardRoutingState.java#L12-L46
209+
198210
### Core Components
199211

200212
The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
@@ -235,6 +247,22 @@ of shards, and an incentive to distribute shards within the same index across di
235247
`NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer`
236248
decisions.
237249

250+
### Inter-Node Communicaton
251+
252+
The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard
253+
movements towards the target allocation plan with the `DesiredBalanceReconciler`. The results of the `DesiredBalanceReconciler` is an
254+
updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new
255+
(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will
256+
observe any change in shard allocation related to itself and take action to achieve the new shard allocation by: initiating creation of a
257+
new empty shard; starting recovery (copying) of an existing shard from another data node; or removing a shard. When the data node finishes
258+
a shard change, a request is sent to the master node to update the shard as having finished recovery/removal in the cluster state. The
259+
cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and
260+
report back when done.
261+
262+
- See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation.
263+
- See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state.
264+
- See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change.
265+
238266
# Autoscaling
239267

240268
The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation

0 commit comments

Comments
 (0)