Skip to content

Commit a35dfa1

Browse files
Add to allocation architecture guide
1 parent e210ea8 commit a35dfa1

File tree

1 file changed

+33
-0
lines changed

1 file changed

+33
-0
lines changed

docs/internal/DistributedArchitectureGuide.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -229,6 +229,23 @@ works in parallel with the storage engine.)
229229

230230
# Allocation
231231

232+
### Indexes and Shards
233+
234+
Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each
235+
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed in runtime. Each
236+
shard copy (primary or replica) can be in one of four states:
237+
- UNASSIGNED: Not present on / assigned to any data node.
238+
- STARTED: Assigned to a specific data node and ready for indexing and search requests.
239+
- INITIALIZING: Running recovery on an assigned node. Fast for an empty new index shard. Possibly lengthy when restoring from a snapshot or
240+
moving from another node
241+
- RELOCATING - The state of a shard on a source node while the shard is being moved away to a target node: the target node is running
242+
recovery.
243+
244+
The `RoutingTable` and `RoutingNodes` classes are responsible for tracking to which data nodes each shard in the cluster is allocated: see
245+
the [routing package javadoc][] for more details about these structures.
246+
247+
[routing package javadoc]: https://github.com/elastic/elasticsearch/blob/v9.0.0-beta1/server/src/main/java/org/elasticsearch/cluster/routing/package-info.java
248+
232249
### Core Components
233250

234251
The `DesiredBalanceShardsAllocator` is what runs shard allocation decisions. It leverages the `DesiredBalanceComputer` to produce
@@ -269,6 +286,22 @@ of shards, and an incentive to distribute shards within the same index across di
269286
`NodeAllocationStatsAndWeightsCalculator` classes for more details on the weight calculations that support the `DesiredBalanceComputer`
270287
decisions.
271288

289+
### Inter-Node Communicaton
290+
291+
The elected master node creates a shard allocation plan with the `DesiredBalanceShardsAllocator` and then selects incremental shard
292+
movements towards the target allocation plan with the `DesiredBalanceReconciler`. The results of the `DesiredBalanceReconciler` is an
293+
updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new
294+
(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will
295+
observe any change in shard allocation related to that node and take action to achieve the new shard allocation by initiating creation of a
296+
new empty shard, starting recovery (copying) of an existing shard from another data node, or remove a shard. When the data node finishes
297+
a shard change, a request is sent to the master node to update the shard as having finished recovery/removal in the cluster state. The
298+
cluster state is used by allocation as a fancy work queue: the master node conveys new work to the data nodes, which pick up the work and
299+
report back when done.
300+
301+
See `DesiredBalanceShardsAllocator#submitReconcileTask` for the master node's cluster state update post-reconciliation.
302+
See `IndicesClusterStateService#doApplyClusterState` for the data node hook to observe shard changes in the cluster state.
303+
See `ShardStateAction#sendShardAction` for the data node request to the master node on completion of a shard state change.
304+
272305
# Autoscaling
273306

274307
The Autoscaling API in ES (Elasticsearch) uses cluster and node level statistics to provide a recommendation

0 commit comments

Comments
 (0)