
Commit 47103b9

Merge pull request #38444 from SaumyaBhushan/issue#38426
Resolved: node concept mixes up graceful and non-graceful node shutdown
2 parents ce78973 + 67bbfbc commit 47103b9

File tree

1 file changed: +44 -44 lines changed
  • content/en/docs/concepts/architecture


content/en/docs/concepts/architecture/nodes.md

Lines changed: 44 additions & 44 deletions
@@ -454,50 +454,6 @@ Message: Pod was terminated in response to imminent node shutdown.
 
 {{< /note >}}
 
-## Non Graceful node shutdown {#non-graceful-node-shutdown}
-
-{{< feature-state state="beta" for_k8s_version="v1.26" >}}
-
-A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
-either because the command does not trigger the inhibitor locks mechanism used by
-kubelet or because of a user error, i.e., the ShutdownGracePeriod and
-ShutdownGracePeriodCriticalPods are not configured properly. Please refer to above
-section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
-
-When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
-that are part of a StatefulSet will be stuck in terminating status on
-the shutdown node and cannot move to a new running node. This is because kubelet on
-the shutdown node is not available to delete the pods so the StatefulSet cannot
-create a new pod with the same name. If there are volumes used by the pods, the
-VolumeAttachments will not be deleted from the original shutdown node so the volumes
-used by these pods cannot be attached to a new running node. As a result, the
-application running on the StatefulSet cannot function properly. If the original
-shutdown node comes up, the pods will be deleted by kubelet and new pods will be
-created on a different running node. If the original shutdown node does not come up,
-these pods will be stuck in terminating status on the shutdown node forever.
-
-To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
-or `NoSchedule` effect to a Node marking it out-of-service.
-If the `NodeOutOfServiceVolumeDetach`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
-pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
-detach operations for the pods terminating on the node will happen immediately. This allows the
-Pods on the out-of-service node to recover quickly on a different node.
-
-During a non-graceful shutdown, Pods are terminated in the two phases:
-
-1. Force delete the Pods that do not have matching `out-of-service` tolerations.
-2. Immediately perform detach volume operation for such pods.
-
-{{< note >}}
-- Before adding the taint `node.kubernetes.io/out-of-service` , it should be verified
-  that the node is already in shutdown or power off state (not in the middle of
-  restarting).
-- The user is required to manually remove the out-of-service taint after the pods are
-  moved to a new node and the user has checked that the shutdown node has been
-  recovered since the user was the one who originally added the taint.
-{{< /note >}}
-
 ### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}
 
 {{< feature-state state="alpha" for_k8s_version="v1.23" >}}
@@ -596,6 +552,50 @@ the feature is Beta and is enabled by default.
 Metrics `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds`
 are emitted under the kubelet subsystem to monitor node shutdowns.
 
+## Non Graceful node shutdown {#non-graceful-node-shutdown}
+
+{{< feature-state state="beta" for_k8s_version="v1.26" >}}
+
+A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
+either because the command does not trigger the inhibitor locks mechanism used by
+kubelet or because of a user error, i.e., the ShutdownGracePeriod and
+ShutdownGracePeriodCriticalPods are not configured properly. Please refer to above
+section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
+
+When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
+that are part of a StatefulSet will be stuck in terminating status on
+the shutdown node and cannot move to a new running node. This is because kubelet on
+the shutdown node is not available to delete the pods so the StatefulSet cannot
+create a new pod with the same name. If there are volumes used by the pods, the
+VolumeAttachments will not be deleted from the original shutdown node so the volumes
+used by these pods cannot be attached to a new running node. As a result, the
+application running on the StatefulSet cannot function properly. If the original
+shutdown node comes up, the pods will be deleted by kubelet and new pods will be
+created on a different running node. If the original shutdown node does not come up,
+these pods will be stuck in terminating status on the shutdown node forever.
+
+To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
+or `NoSchedule` effect to a Node marking it out-of-service.
+If the `NodeOutOfServiceVolumeDetach`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
+pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
+detach operations for the pods terminating on the node will happen immediately. This allows the
+Pods on the out-of-service node to recover quickly on a different node.
+
+During a non-graceful shutdown, Pods are terminated in the two phases:
+
+1. Force delete the Pods that do not have matching `out-of-service` tolerations.
+2. Immediately perform detach volume operation for such pods.
+
+{{< note >}}
+- Before adding the taint `node.kubernetes.io/out-of-service` , it should be verified
+  that the node is already in shutdown or power off state (not in the middle of
+  restarting).
+- The user is required to manually remove the out-of-service taint after the pods are
+  moved to a new node and the user has checked that the shutdown node has been
+  recovered since the user was the one who originally added the taint.
+{{< /note >}}
+
 ## Swap memory management {#swap-memory}
 
 {{< feature-state state="alpha" for_k8s_version="v1.22" >}}
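The section moved in this diff describes the out-of-service taint only in prose. As a rough sketch of the workflow it implies, the commands below apply and later remove the taint on a hypothetical shut-down node named `node-1`; the taint key and the `NoExecute` effect come from the text above, while the node name and the `nodeshutdown` taint value are illustrative placeholders.

```shell
# Confirm the node really is down (NotReady) and not merely restarting.
kubectl get node node-1

# Inspect the Pods stuck in Terminating on the shut-down node.
kubectl get pods --all-namespaces --field-selector spec.nodeName=node-1

# Mark the node out-of-service so its Pods are force deleted and their
# volumes are detached immediately (requires the NodeOutOfServiceVolumeDetach
# feature gate on kube-controller-manager, beta and on by default in v1.26).
kubectl taint nodes node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# Once the Pods are running on another node and the failed node has been
# recovered, remove the taint manually (a trailing "-" deletes a taint).
kubectl taint nodes node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```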
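The two termination phases listed in the section can also be observed from the API. One hedged way to watch the second phase (immediate volume detach) is to list the cluster's `VolumeAttachment` objects, again assuming the placeholder node name `node-1` and CSI-backed volumes.

```shell
# VolumeAttachment objects that reference the shut-down node should be
# removed shortly after the out-of-service taint is applied, which lets the
# volumes attach to the node where the replacement Pods are scheduled.
kubectl get volumeattachments | grep node-1
```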
