Commit 2578aa2

Clarify replica shard allocation step in rolling restart procedure
Updates the restart documentation to specify that setting `cluster.routing.allocation.enable` to `primaries` disables **replica** shard allocation, not all shard allocation. The phrase “disable shard allocation” seems to be colloquial shorthand among developers and experienced users: those familiar with Elasticsearch may implicitly understand that it refers to replica shards during restarts. This PR makes that behavior explicit for newer users, reducing the risk of misconfiguration.
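
For reference, a minimal sketch of the requests this change describes. The diff hunks below cut off before the request bodies, so the `persistent` scope shown here is an assumption rather than a quote from the changed file; the setting name and the `primaries` value come from the description above.

```console
# Before shutting down data nodes: allow only primary shards to be allocated
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# After the nodes have rejoined: null removes the override and restores the default ("all")
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
```

With `primaries` in effect, primary shards can still be allocated, which is why the step is more accurately described as disabling replica shard allocation.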
1 parent 0483056 commit 2578aa2

1 file changed (+9 -9 lines changed)

deploy-manage/maintenance/start-stop-services/full-cluster-restart-rolling-restart-procedures.md

Lines changed: 9 additions & 9 deletions
@@ -17,8 +17,8 @@ Nodes exceeding the low watermark threshold will be slow to restart. Reduce the

 ## Full-cluster restart [restart-cluster-full]

-1. **Disable shard allocation.**
-When you shut down a data node, the allocation process waits for `index.unassigned.node_left.delayed_timeout` (by default, one minute) before starting to replicate the shards on that node to other nodes in the cluster, which can involve a lot of I/O. Since the node is shortly going to be restarted, this I/O is unnecessary. You can avoid racing the clock by [disabling allocation](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-routing-allocation-enable) of replicas before shutting down [data nodes](../../distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role):
+1. **Disable replica shard allocation.**
+When you shut down a data node, the allocation process waits for `index.unassigned.node_left.delayed_timeout` (by default, one minute) before starting to replicate the shards on that node to other nodes in the cluster, which can involve a lot of I/O. Since the node is shortly going to be restarted, this I/O is unnecessary. You can avoid racing the clock by [disabling allocation of replicas](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-routing-allocation-enable) before shutting down [data nodes](../../distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role):

 ```console
 PUT _cluster/settings
@@ -91,8 +91,8 @@ Nodes exceeding the low watermark threshold will be slow to restart. Reduce the
 When a node joins the cluster, it begins to recover any primary shards that are stored locally. The [`_cat/health`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-health) API initially reports a `status` of `red`, indicating that not all primary shards have been allocated.
 Once a node recovers its local shards, the cluster `status` switches to `yellow`, indicating that all primary shards have been recovered, but not all replica shards are allocated. This is to be expected because you have not yet re-enabled allocation. Delaying the allocation of replicas until all nodes are `yellow` allows the master to allocate replicas to nodes that already have local shard copies.

-8. **Re-enable allocation.**
-When all nodes have joined the cluster and recovered their primary shards, re-enable allocation by restoring `cluster.routing.allocation.enable` to its default:
+8. **Re-enable replica shard allocation.**
+When all nodes have joined the cluster and recovered their primary shards, re-enable replica allocation by restoring `cluster.routing.allocation.enable` to its default:

 ```console
 PUT _cluster/settings
@@ -103,7 +103,7 @@ Nodes exceeding the low watermark threshold will be slow to restart. Reduce the
 }
 ```

-Once allocation is re-enabled, the cluster starts allocating replica shards to the data nodes. At this point it is safe to resume indexing and searching, but your cluster will recover more quickly if you can wait until all primary and replica shards have been successfully allocated and the status of all nodes is `green`.
+Once replica allocation is re-enabled, the cluster starts allocating replica shards to the data nodes. At this point it is safe to resume indexing and searching, but your cluster will recover more quickly if you can wait until all primary and replica shards have been successfully allocated and the status of all nodes is `green`.
 You can monitor progress with the [`_cat/health`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-health) and [`_cat/recovery`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-recovery) APIs:

 ```console
@@ -123,8 +123,8 @@ Nodes exceeding the low watermark threshold will be slow to restart. Reduce the

 ## Rolling restart [restart-cluster-rolling]

-1. **Disable shard allocation.**
-When you shut down a data node, the allocation process waits for `index.unassigned.node_left.delayed_timeout` (by default, one minute) before starting to replicate the shards on that node to other nodes in the cluster, which can involve a lot of I/O. Since the node is shortly going to be restarted, this I/O is unnecessary. You can avoid racing the clock by [disabling allocation](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-routing-allocation-enable) of replicas before shutting down [data nodes](../../distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role):
+1. **Disable replica shard allocation.**
+When you shut down a data node, the allocation process waits for `index.unassigned.node_left.delayed_timeout` (by default, one minute) before starting to replicate the shards on that node to other nodes in the cluster, which can involve a lot of I/O. Since the node is shortly going to be restarted, this I/O is unnecessary. You can avoid racing the clock by [disabling allocation of replicas](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-routing-allocation-enable) before shutting down [data nodes](../../distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role):

 ```console
 PUT _cluster/settings
@@ -187,8 +187,8 @@ Nodes exceeding the low watermark threshold will be slow to restart. Reduce the
 GET _cat/nodes
 ```

-7. **Reenable shard allocation.**
-For data nodes, once the node has joined the cluster, remove the `cluster.routing.allocation.enable` setting to enable shard allocation and start using the node:
+7. **Re-enable replica shard allocation.**
+For data nodes, once the node has joined the cluster, remove the `cluster.routing.allocation.enable` setting to enable replica shard allocation and start using the node:

 ```console
 PUT _cluster/settings
