Commit 8c7ba0a

Merge pull request #49146 from windsonsea/noshut
Clean up cluster-administration/node-shutdown.md
2 parents 659d841 + 664a30e commit 8c7ba0a

1 file changed: +59 -50 lines changed

content/en/docs/concepts/cluster-administration/node-shutdown.md

Lines changed: 59 additions & 50 deletions
@@ -5,25 +5,27 @@ weight: 10
 ---

 <!-- overview -->
+
 In a Kubernetes cluster, a {{< glossary_tooltip text="node" term_id="node" >}}
-can be shutdown in a planned graceful way or unexpectedly because of reasons such
+can be shut down in a planned graceful way or unexpectedly because of reasons such
 as a power outage or something else external. A node shutdown could lead to workload
 failure if the node is not drained before the shutdown. A node shutdown can be
 either **graceful** or **non-graceful**.

 <!-- body -->
+
 ## Graceful node shutdown {#graceful-node-shutdown}

 {{< feature-state feature_gate_name="GracefulNodeShutdown" >}}

 The kubelet attempts to detect node system shutdown and terminates pods running on the node.

-Kubelet ensures that pods follow the normal
+kubelet ensures that pods follow the normal
 [pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
 during the node shutdown. During node shutdown, the kubelet does not accept new
 Pods (even if those Pods are already bound to the node).

-The Graceful node shutdown feature depends on systemd since it takes advantage of
+The graceful node shutdown feature depends on systemd since it takes advantage of
 [systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
 delay the node shutdown with a given duration.

@@ -32,12 +34,12 @@ Graceful node shutdown is controlled with the `GracefulNodeShutdown`
 enabled by default in 1.21.

 Note that by default, both configuration options described below,
-`shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` are set to zero,
+`shutdownGracePeriod` and `shutdownGracePeriodCriticalPods`, are set to zero,
 thus not activating the graceful node shutdown functionality.
-To activate the feature, the two kubelet config settings should be configured appropriately and
+To activate the feature, both options should be configured appropriately and
 set to non-zero values.

-Once systemd detects or notifies node shutdown, the kubelet sets a `NotReady` condition on
+Once systemd detects or is notified of a node shutdown, the kubelet sets a `NotReady` condition on
 the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
 and does not schedule any Pods onto the affected node; other third-party schedulers are
 expected to follow the same logic. This means that new Pods won't be scheduled onto that node
@@ -48,26 +50,29 @@ node shutdown has been detected, so that even Pods with a
 {{< glossary_tooltip text="toleration" term_id="toleration" >}} for
 `node.kubernetes.io/not-ready:NoSchedule` do not start there.

-At the same time when kubelet is setting that condition on its Node via the API,
+When kubelet is setting that condition on its Node via the API,
 the kubelet also begins terminating any Pods that are running locally.

 During a graceful shutdown, kubelet terminates pods in two phases:

 1. Terminate regular pods running on the node.
-2. Terminate [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
+1. Terminate [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
 running on the node.

-Graceful node shutdown feature is configured with two
+The graceful node shutdown feature is configured with two
 [`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/) options:

-* `shutdownGracePeriod`:
-* Specifies the total duration that the node should delay the shutdown by. This is the total
-grace period for pod termination for both regular and
-[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).
-* `shutdownGracePeriodCriticalPods`:
-* Specifies the duration used to terminate
-[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
-during a node shutdown. This value should be less than `shutdownGracePeriod`.
+- `shutdownGracePeriod`:
+
+Specifies the total duration that the node should delay the shutdown by. This is the total
+grace period for pod termination for both regular and
+[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).
+
+- `shutdownGracePeriodCriticalPods`:
+
+Specifies the duration used to terminate
+[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
+during a node shutdown. This value should be less than `shutdownGracePeriod`.

 {{< note >}}
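
The relationship between the two options in this hunk can be sketched in a few lines. `shutdownGracePeriod` is the total delay and `shutdownGracePeriodCriticalPods` is the part of it reserved for critical pods, so regular pods get the difference. This is illustrative Python, not kubelet code: the function name `split_grace_period`, the 30s/10s values, and the choice to treat a zero total as "inactive" and an oversized critical window as an error are assumptions made for the example.

```python
from datetime import timedelta
from typing import Optional, Tuple


def split_grace_period(
    total: timedelta, critical: timedelta
) -> Optional[Tuple[timedelta, timedelta]]:
    """Split the total shutdown delay into (regular, critical) windows.

    Returns None when the total delay is zero, since a zero delay leaves
    no time for graceful termination (assumption of this sketch).
    """
    zero = timedelta(0)
    if total < zero or critical < zero:
        raise ValueError("durations must not be negative")
    if total == zero:
        return None
    if critical > total:
        # The text says the critical window should be less than the total.
        raise ValueError(
            "shutdownGracePeriodCriticalPods exceeds shutdownGracePeriod"
        )
    # Regular pods are terminated first and get whatever is left over.
    return total - critical, critical


# Hypothetical values: a 30s total delay with 10s reserved for critical pods.
regular, critical = split_grace_period(
    timedelta(seconds=30), timedelta(seconds=10)
)
print(regular.total_seconds(), critical.total_seconds())
```

With these example values, regular pods would have 20 seconds and critical pods the final 10 seconds.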

@@ -122,22 +127,22 @@ Assuming the following custom pod
 [priority classes](/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass)
 in a cluster,

-|Pod priority class name|Pod priority class value|
-|-------------------------|------------------------|
-|`custom-class-a` | 100000 |
-|`custom-class-b` | 10000 |
-|`custom-class-c` | 1000 |
-|`regular/unset` | 0 |
+| Pod priority class name | Pod priority class value |
+| ----------------------- | ------------------------ |
+| `custom-class-a` | 100000 |
+| `custom-class-b` | 10000 |
+| `custom-class-c` | 1000 |
+| `regular/unset` | 0 |

 Within the [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/)
 the settings for `shutdownGracePeriodByPodPriority` could look like:

-|Pod priority class value|Shutdown period|
-|------------------------|---------------|
-| 100000 |10 seconds |
-| 10000 |180 seconds |
-| 1000 |120 seconds |
-| 0 |60 seconds |
+| Pod priority class value | Shutdown period |
+| ------------------------ | --------------- |
+| 100000 | 10 seconds |
+| 10000 | 180 seconds |
+| 1000 | 120 seconds |
+| 0 | 60 seconds |

 The corresponding kubelet config YAML configuration would be:

@@ -154,18 +159,18 @@ shutdownGracePeriodByPodPriority:
 ```

 The above table implies that any pod with `priority` value >= 100000 will get
-just 10 seconds to stop, any pod with value >= 10000 and < 100000 will get 180
-seconds to stop, any pod with value >= 1000 and < 10000 will get 120 seconds to stop.
-Finally, all other pods will get 60 seconds to stop.
+just 10 seconds to shut down, any pod with value >= 10000 and < 100000 will get 180
+seconds to shut down, any pod with value >= 1000 and < 10000 will get 120 seconds to shut down.
+Finally, all other pods will get 60 seconds to shut down.

 One doesn't have to specify values corresponding to all of the classes. For
 example, you could instead use these settings:

-|Pod priority class value|Shutdown period|
-|------------------------|---------------|
-| 100000 |300 seconds |
-| 1000 |120 seconds |
-| 0 |60 seconds |
+| Pod priority class value | Shutdown period |
+| ------------------------ | --------------- |
+| 100000 | 300 seconds |
+| 1000 | 120 seconds |
+| 0 | 60 seconds |

 In the above case, the pods with `custom-class-b` will go into the same bucket
 as `custom-class-c` for shutdown.
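
The bucket rule described in this hunk (a pod uses the entry with the highest configured priority that does not exceed its own) can be sketched as a small lookup. This is illustrative Python rather than the kubelet's implementation; the names `grace_period_for`, `FULL`, and `SPARSE` are hypothetical, and sending pods below the lowest configured priority to the lowest bucket is an assumption of the sketch.

```python
from bisect import bisect_right
from typing import Dict


def grace_period_for(priority: int, buckets: Dict[int, int]) -> int:
    """Return the shutdown period (seconds) for a pod priority.

    `buckets` maps a configured priority value to its shutdown period.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    thresholds = sorted(buckets)
    # Index of the highest threshold that is <= priority.
    idx = bisect_right(thresholds, priority) - 1
    # Assumption: priorities below every threshold use the lowest bucket.
    idx = max(idx, 0)
    return buckets[thresholds[idx]]


# The two example tables from the text, as priority -> seconds.
FULL = {100000: 10, 10000: 180, 1000: 120, 0: 60}
SPARSE = {100000: 300, 1000: 120, 0: 60}

# custom-class-b (priority 10000) under each configuration.
print(grace_period_for(10000, FULL), grace_period_for(10000, SPARSE))
```

Under the sparse configuration, `custom-class-b` (10000) resolves to the same 120-second entry as `custom-class-c` (1000), which matches the last sentence of the hunk.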
@@ -225,14 +230,16 @@ on a different node.
 During a non-graceful shutdown, Pods are terminated in the two phases:

 1. Force delete the Pods that do not have matching `out-of-service` tolerations.
-2. Immediately perform detach volume operation for such pods.
+1. Immediately perform detach volume operation for such pods.

 {{< note >}}
+
 - Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
 that the node is already in shutdown or power off state (not in the middle of restarting).
 - The user is required to manually remove the out-of-service taint after the pods are
 moved to a new node and the user has checked that the shutdown node has been
 recovered since the user was the one who originally added the taint.
+
 {{< /note >}}

 ### Forced storage detach on timeout {#storage-force-detach-on-timeout}
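
For the first non-graceful phase in this hunk, the selection of Pods "that do not have matching `out-of-service` tolerations" can be sketched with the general Kubernetes toleration matching rules. This is illustrative Python only: the simplified pod dictionaries, the helper names, and the taint value `nodeshutdown` with effect `NoExecute` are assumptions for the example, not taken from the diff.

```python
from typing import Dict, List

OUT_OF_SERVICE_KEY = "node.kubernetes.io/out-of-service"


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Check one toleration against one taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    if not key:
        return operator == "Exists"  # empty key + Exists matches any taint
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True  # value is ignored
    return toleration.get("value", "") == taint.get("value", "")


def pods_to_force_delete(pods: List[dict], taint: Dict[str, str]) -> List[str]:
    """Names of pods with no toleration matching the taint."""
    return [
        pod["name"]
        for pod in pods
        if not any(tolerates(t, taint) for t in pod.get("tolerations", []))
    ]


# Hypothetical taint and pods, for illustration only.
taint = {"key": OUT_OF_SERVICE_KEY, "value": "nodeshutdown", "effect": "NoExecute"}
pods = [
    {"name": "web", "tolerations": []},
    {"name": "agent", "tolerations": [{"key": OUT_OF_SERVICE_KEY, "operator": "Exists"}]},
    {"name": "db", "tolerations": [{"key": OUT_OF_SERVICE_KEY, "value": "other"}]},
]
print(pods_to_force_delete(pods, taint))
```

Here `agent` tolerates the taint and is left alone, while `web` and `db` would be selected for force deletion.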
@@ -256,39 +263,41 @@ its associated
 [VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
 deleted.

-After this setting has been applied, unhealthy pods still attached to a volumes must be recovered
+After this setting has been applied, unhealthy pods still attached to volumes must be recovered
 via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.

 {{< note >}}
+
 - Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
 - Deviation from the steps documented above can result in data corruption.
-{{< /note >}}

+{{< /note >}}

 ## Windows Graceful node shutdown {#windows-graceful-node-shutdown}

 {{< feature-state feature_gate_name="WindowsGracefulNodeShutdown" >}}

-The Windows graceful node shutdown feature depends on kubelet running as a Windows service,
-it will then have a registered [service control handler](https://learn.microsoft.com/en-us/windows/win32/services/service-control-handler-function)
-to delay the presshutdown event with a given duration.
+The Windows graceful node shutdown feature depends on kubelet running as a Windows service,
+it will then have a registered [service control handler](https://learn.microsoft.com/en-us/windows/win32/services/service-control-handler-function)
+to delay the preshutdown event with a given duration.

-Windows graceful node shutdown is controlled with the `WindowsGracefulNodeShutdown`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+Windows graceful node shutdown is controlled with the `WindowsGracefulNodeShutdown`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 which is introduced in 1.32 as an alpha feature.

 Windows graceful node shutdown can not be cancelled.

-If Kubelet is not running as a Windows service, it will not be able to set and monitor
+If kubelet is not running as a Windows service, it will not be able to set and monitor
 the [Preshutdown](https://learn.microsoft.com/en-us/windows/win32/api/winsvc/ns-winsvc-service_preshutdown_info) event,
 the node will have to go through the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.

-In the case where the Windows graceful node shutdown feature is enabled, but the kubelet is not
-running as a Windows service, the kubelet will continue running instead of failing. However,
+In the case where the Windows graceful node shutdown feature is enabled, but the kubelet is not
+running as a Windows service, the kubelet will continue running instead of failing. However,
 it will log an error indicating that it needs to be run as a Windows service.

 ## {{% heading "whatsnext" %}}

 Learn more about the following:
-* Blog: [Non-Graceful Node Shutdown](/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/).
-* Cluster Architecture: [Nodes](/docs/concepts/architecture/nodes/).
+
+- Blog: [Non-Graceful Node Shutdown](/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/).
+- Cluster Architecture: [Nodes](/docs/concepts/architecture/nodes/).
