
Commit d22f3b9

Clarify pod scheduling during node graceful termination (#41061)
* clarify the pods scheduling during graceful termination:

* Update content/en/docs/concepts/architecture/nodes.md

Co-authored-by: Qiming Teng <[email protected]>

---------

Co-authored-by: Qiming Teng <[email protected]>
1 parent b9c88e7 commit d22f3b9

1 file changed: 26 additions, 1 deletion

content/en/docs/concepts/architecture/nodes.md

Lines changed: 26 additions & 1 deletion
@@ -396,7 +396,8 @@ The kubelet attempts to detect node system shutdown and terminates pods running
 
 Kubelet ensures that pods follow the normal
 [pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
-during the node shutdown.
+during the node shutdown. During node shutdown, the kubelet does not accept new
+Pods (even if those Pods are already bound to the node).
 
 The Graceful node shutdown feature depends on systemd since it takes advantage of
 [systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
@@ -412,6 +413,20 @@ thus not activating the graceful node shutdown functionality.
 To activate the feature, the two kubelet config settings should be configured appropriately and
 set to non-zero values.
 
+Once systemd detects or notifies node shutdown, the kubelet sets a `NotReady` condition on
+the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
+and does not schedule any Pods onto the affected node; other third-party schedulers are
+expected to follow the same logic. This means that new Pods won't be scheduled onto that node
+and therefore none will start.
+
+The kubelet **also** rejects Pods during the `PodAdmission` phase if an ongoing
+node shutdown has been detected, so that even Pods with a
+{{< glossary_tooltip text="toleration" term_id="toleration" >}} for
+`node.kubernetes.io/not-ready:NoSchedule` do not start there.
+
+At the same time when kubelet is setting that condition on its Node via the API, the kubelet also begins
+terminating any Pods that are running locally.
+
 During a graceful shutdown, kubelet terminates pods in two phases:
 
 1. Terminate regular pods running on the node.
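The added text in the hunk above notes that even a toleration for `node.kubernetes.io/not-ready:NoSchedule` does not let a Pod start while the node is shutting down, because the kubelet rejects it at the `PodAdmission` phase. As a rough illustration of what such a toleration looks like in a Pod spec (the Pod name and image below are placeholders, not part of this commit):

```yaml
# Illustration only: a Pod that tolerates the not-ready:NoSchedule taint.
# Per the text above, the kubelet still rejects this Pod at admission
# while a graceful node shutdown is in progress.
apiVersion: v1
kind: Pod
metadata:
  name: tolerating-pod              # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
  tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoSchedule"
```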
@@ -430,6 +445,16 @@ Graceful node shutdown feature is configured with two
 [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
 during a node shutdown. This value should be less than `shutdownGracePeriod`.
 
+{{< note >}}
+
+There are cases when Node termination was cancelled by the system (or perhaps manually
+by an administrator). In either of those situations the
+Node will return to the `Ready` state. However Pods which already started the process
+of termination
+will not be restored by kubelet and will need to be re-scheduled.
+
+{{< /note >}}
+
 For example, if `shutdownGracePeriod=30s`, and
 `shutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by
 30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved
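The worked example in the hunk above uses `shutdownGracePeriod=30s` and `shutdownGracePeriodCriticalPods=10s`. A minimal sketch of how those two settings appear in a kubelet configuration file, assuming the standard `KubeletConfiguration` format and using the example's values rather than recommended ones:

```yaml
# Sketch: graceful node shutdown settings from the example above.
# 30s total shutdown delay, of which the last 10s are reserved for critical pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```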
