@@ -396,7 +396,8 @@ The kubelet attempts to detect node system shutdown and terminates pods running
Kubelet ensures that pods follow the normal
[pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
- during the node shutdown.
+ during the node shutdown. During node shutdown, the kubelet does not accept new
+ Pods (even if those Pods are already bound to the node).

The Graceful node shutdown feature depends on systemd since it takes advantage of
[systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
@@ -412,6 +413,20 @@ thus not activating the graceful node shutdown functionality.
To activate the feature, the two kubelet config settings should be configured appropriately and
set to non-zero values.

+ Once systemd detects or is notified of a node shutdown, the kubelet sets a `NotReady` condition on
+ the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
+ and does not schedule any Pods onto the affected node; third-party schedulers are
+ expected to follow the same logic. This means that new Pods won't be scheduled onto that node
+ and therefore none will start.
+
+ The kubelet **also** rejects Pods during the `PodAdmission` phase if an ongoing
+ node shutdown has been detected, so that even Pods with a
+ {{< glossary_tooltip text="toleration" term_id="toleration" >}} for
+ `node.kubernetes.io/not-ready:NoSchedule` do not start there.
+
+ At the same time as the kubelet sets that condition on its Node via the API, it also begins
+ terminating any Pods that are running locally.
+

During a graceful shutdown, kubelet terminates pods in two phases:

1. Terminate regular pods running on the node.
@@ -430,6 +445,16 @@ Graceful node shutdown feature is configured with two
[critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
during a node shutdown. This value should be less than `shutdownGracePeriod`.

+ {{< note >}}
+
+ There are cases when Node termination was cancelled by the system (or perhaps manually
+ by an administrator). In either of those situations the
+ Node will return to the `Ready` state. However, Pods which already started the
+ termination process will not be restored by the kubelet and will need to be re-scheduled.
+
+ {{< /note >}}
+

For example, if `shutdownGracePeriod=30s`, and
`shutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by
30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved
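For reference, the two settings described above correspond to fields in the kubelet's `KubeletConfiguration` file. A minimal sketch matching the example values (assuming the `v1beta1` config API; other fields omitted):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node delays shutdown to let pods terminate;
# must be non-zero for graceful node shutdown to be active.
shutdownGracePeriod: "30s"
# Portion of shutdownGracePeriod reserved for critical pods;
# must be less than shutdownGracePeriod.
shutdownGracePeriodCriticalPods: "10s"
```

With these values, regular pods get the first 20 seconds (30-10) of the shutdown window, and critical pods get the remaining 10 seconds.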