Commit 3342cbf

Merge pull request #41297 from tengqm/clarify-prestop-hook
Clarify prestop hook invocation condition
2 parents 6b1f65f + 06852a5 commit 3342cbf

File tree

1 file changed: +84 additions, -76 deletions

content/en/docs/concepts/workloads/pods/pod-lifecycle.md

@@ -38,8 +38,8 @@ If a {{< glossary_tooltip term_id="node" >}} dies, the Pods scheduled to that no
 are [scheduled for deletion](#pod-garbage-collection) after a timeout period.
 
 Pods do not, by themselves, self-heal. If a Pod is scheduled to a
-{{< glossary_tooltip text="node" term_id="node" >}} that then fails, the Pod is deleted; likewise, a Pod won't
-survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a
+{{< glossary_tooltip text="node" term_id="node" >}} that then fails, the Pod is deleted; likewise,
+a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a
 higher-level abstraction, called a
 {{< glossary_tooltip term_id="controller" text="controller" >}}, that handles the work of
 managing the relatively disposable Pod instances.
@@ -57,8 +57,8 @@ created anew.
 
 {{< figure src="/images/docs/pod.svg" title="Pod diagram" class="diagram-medium" >}}
 
-*A multi-container Pod that contains a file puller and a
-web server that uses a persistent volume for shared storage between the containers.*
+A multi-container Pod that contains a file puller and a
+web server that uses a persistent volume for shared storage between the containers.
 
 ## Pod phase
 
@@ -91,9 +91,9 @@ A Pod is granted a term to terminate gracefully, which defaults to 30 seconds.
 You can use the flag `--force` to [terminate a Pod by force](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced).
 {{< /note >}}
 
-Since Kubernetes 1.27, the kubelet transitions deleted pods, except for
-[static pods](/docs/tasks/configure-pod-container/static-pod/) and
-[force-deleted pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced)
+Since Kubernetes 1.27, the kubelet transitions deleted Pods, except for
+[static Pods](/docs/tasks/configure-pod-container/static-pod/) and
+[force-deleted Pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced)
 without a finalizer, to a terminal phase (`Failed` or `Succeeded` depending on
 the exit statuses of the pod containers) before their deletion from the API server.
 
@@ -219,13 +219,13 @@ status:
 ...
 ```
 
-The Pod conditions you add must have names that meet the Kubernetes [label key format](/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
-
+The Pod conditions you add must have names that meet the Kubernetes
+[label key format](/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set).
 
 ### Status for Pod readiness {#pod-readiness-status}
 
 The `kubectl patch` command does not support patching object status.
-To set these `status.conditions` for the pod, applications and
+To set these `status.conditions` for the Pod, applications and
 {{< glossary_tooltip term_id="operator-pattern" text="operators">}} should use
 the `PATCH` action.
 You can use a [Kubernetes client library](/docs/reference/using-api/client-libraries/) to
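For illustration only (this block is not part of the commit), a custom condition set by an operator through such a `PATCH` to the Pod's `status` might look like the sketch below; the condition type `www.example.com/feature-1` and the timestamp are assumptions:

```yaml
status:
  conditions:
  - type: "www.example.com/feature-1"            # custom condition; the type must follow the label key format
    status: "False"
    lastProbeTime: null
    lastTransitionTime: "2023-01-01T00:00:00Z"   # illustrative timestamp
```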
@@ -247,20 +247,22 @@ When a Pod's containers are Ready but at least one custom condition is missing o
 After a Pod gets scheduled on a node, it needs to be admitted by the Kubelet and
 have any volumes mounted. Once these phases are complete, the Kubelet works with
 a container runtime (using {{< glossary_tooltip term_id="cri" >}}) to set up a
-runtime sandbox and configure networking for the Pod. If the
-`PodHasNetworkCondition` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
-Kubelet reports whether a pod has reached this initialization milestone through
+runtime sandbox and configure networking for the Pod. If the `PodHasNetworkCondition`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
+Kubelet reports whether a Pod has reached this initialization milestone through
 the `PodHasNetwork` condition in the `status.conditions` field of a Pod.
 
 The `PodHasNetwork` condition is set to `False` by the Kubelet when it detects a
 Pod does not have a runtime sandbox with networking configured. This occurs in
 the following scenarios:
-* Early in the lifecycle of the Pod, when the kubelet has not yet begun to set up a sandbox for the Pod using the container runtime.
-* Later in the lifecycle of the Pod, when the Pod sandbox has been destroyed due
-to either:
-  * the node rebooting, without the Pod getting evicted
-  * for container runtimes that use virtual machines for isolation, the Pod
-    sandbox virtual machine rebooting, which then requires creating a new sandbox and fresh container network configuration.
+
+- Early in the lifecycle of the Pod, when the kubelet has not yet begun to set up a sandbox for
+  the Pod using the container runtime.
+- Later in the lifecycle of the Pod, when the Pod sandbox has been destroyed due to either:
+  - the node rebooting, without the Pod getting evicted
+  - for container runtimes that use virtual machines for isolation, the Pod
+    sandbox virtual machine rebooting, which then requires creating a new sandbox and
+    fresh container network configuration.
 
 The `PodHasNetwork` condition is set to `True` by the kubelet after the
 successful completion of sandbox creation and network configuration for the Pod
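As an aside (not part of this commit), the condition described in this hunk would surface in the Pod's status roughly as sketched here; the timestamp is illustrative:

```yaml
status:
  conditions:
  - type: PodHasNetwork        # reported by the kubelet when the feature gate is enabled
    status: "True"             # sandbox created and networking configured
    lastTransitionTime: "2023-01-01T00:00:00Z"
```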
@@ -277,16 +279,14 @@ condition to `True` before sandbox creation and network configuration starts.
 
 {{< feature-state for_k8s_version="v1.26" state="alpha" >}}
 
-See [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) for more information.
+See [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
+for more information.
 
 ## Container probes
 
-A _probe_ is a diagnostic
-performed periodically by the
-[kubelet](/docs/reference/command-line-tools-reference/kubelet/)
-on a container. To perform a diagnostic,
-the kubelet either executes code within the container, or makes
-a network request.
+A _probe_ is a diagnostic performed periodically by the [kubelet](/docs/reference/command-line-tools-reference/kubelet/)
+on a container. To perform a diagnostic, the kubelet either executes code within the container,
+or makes a network request.
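To illustrate the paragraph above (this sketch is not part of the commit), a probe using the network-request mechanism could be declared as follows; the Pod name, image, path, and timings are all assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-example               # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0 # illustrative image
    livenessProbe:
      httpGet:                      # the kubelet makes an HTTP request to the container
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 5              # the kubelet probes every 5 seconds
```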
 
 ### Check mechanisms {#probe-check-methods}
 
@@ -364,8 +364,6 @@ see [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod
 
 #### When should you use a liveness probe?
 
-{{< feature-state for_k8s_version="v1.0" state="stable" >}}
-
 If the process in your container is able to crash on its own whenever it
 encounters an issue or becomes unhealthy, you do not necessarily need a liveness
 probe; the kubelet will automatically perform the correct action in accordance
@@ -376,8 +374,6 @@ specify a liveness probe, and specify a `restartPolicy` of Always or OnFailure.
 
 #### When should you use a readiness probe?
 
-{{< feature-state for_k8s_version="v1.0" state="stable" >}}
-
 If you'd like to start sending traffic to a Pod only when a probe succeeds,
 specify a readiness probe. In this case, the readiness probe might be the same
 as the liveness probe, but the existence of the readiness probe in the spec means
@@ -410,8 +406,6 @@ to stop.
 
 #### When should you use a startup probe?
 
-{{< feature-state for_k8s_version="v1.20" state="stable" >}}
-
 Startup probes are useful for Pods that have containers that take a long time to
 come into service. Rather than set a long liveness interval, you can configure
 a separate configuration for probing the container as it starts up, allowing
@@ -440,68 +434,77 @@ shutdown.
 Typically, the container runtime sends a TERM signal to the main process in each
 container. Many container runtimes respect the `STOPSIGNAL` value defined in the container
 image and send this instead of TERM.
-Once the grace period has expired, the KILL signal is sent to any remaining
-processes, and the Pod is then deleted from the
-{{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}. If the kubelet or the
-container runtime's management service is restarted while waiting for processes to terminate, the
-cluster retries from the start including the full original grace period.
+Once the grace period has expired, the KILL signal is sent to any remaining processes, and the Pod
+is then deleted from the {{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}}.
+If the kubelet or the container runtime's management service is restarted while waiting for
+processes to terminate, the cluster retries from the start including the full original grace period.
 
 An example flow:
 
 1. You use the `kubectl` tool to manually delete a specific Pod, with the default grace period
    (30 seconds).
+
 1. The Pod in the API server is updated with the time beyond which the Pod is considered "dead"
    along with the grace period.
-   If you use `kubectl describe` to check on the Pod you're deleting, that Pod shows up as
-   "Terminating".
+   If you use `kubectl describe` to check the Pod you're deleting, that Pod shows up as "Terminating".
    On the node where the Pod is running: as soon as the kubelet sees that a Pod has been marked
    as terminating (a graceful shutdown duration has been set), the kubelet begins the local Pod
    shutdown process.
+
 1. If one of the Pod's containers has defined a `preStop`
-   [hook](/docs/concepts/containers/container-lifecycle-hooks), the kubelet
-   runs that hook inside of the container. If the `preStop` hook is still running after the
-   grace period expires, the kubelet requests a small, one-off grace period extension of 2
-   seconds.
+   [hook](/docs/concepts/containers/container-lifecycle-hooks) and the `terminationGracePeriodSeconds`
+   in the Pod spec is not set to 0, the kubelet runs that hook inside of the container.
+   The default `terminationGracePeriodSeconds` setting is 30 seconds.
+
+   If the `preStop` hook is still running after the grace period expires, the kubelet requests
+   a small, one-off grace period extension of 2 seconds.
+
    {{< note >}}
    If the `preStop` hook needs longer to complete than the default grace period allows,
    you must modify `terminationGracePeriodSeconds` to suit this.
    {{< /note >}}
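A minimal sketch of the interaction this commit clarifies (not part of the commit itself; the Pod name, image, and sleep duration are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-example     # hypothetical name
spec:
  terminationGracePeriodSeconds: 60   # must cover the preStop hook plus TERM handling
  containers:
  - name: app
    image: registry.example/app:1.0   # illustrative image
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]   # e.g. let load balancers stop sending traffic
```

Setting `terminationGracePeriodSeconds: 0` would skip the hook entirely, which is the invocation condition the merged change spells out.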
+
 1. The kubelet triggers the container runtime to send a TERM signal to process 1 inside each
    container.
    {{< note >}}
    The containers in the Pod receive the TERM signal at different times and in an arbitrary
    order. If the order of shutdowns matters, consider using a `preStop` hook to synchronize.
    {{< /note >}}
-1. At the same time as the kubelet is starting graceful shutdown of the Pod, the control plane evaluates whether to remove that shutting-down Pod from EndpointSlice (and Endpoints) objects, where those objects represent
-   a {{< glossary_tooltip term_id="service" text="Service" >}} with a configured
-   {{< glossary_tooltip text="selector" term_id="selector" >}}.
+
+1. At the same time as the kubelet is starting graceful shutdown of the Pod, the control plane
+   evaluates whether to remove that shutting-down Pod from EndpointSlice (and Endpoints) objects,
+   where those objects represent a {{< glossary_tooltip term_id="service" text="Service" >}}
+   with a configured {{< glossary_tooltip text="selector" term_id="selector" >}}.
    {{< glossary_tooltip text="ReplicaSets" term_id="replica-set" >}} and other workload resources
-   no longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly
-   should not continue to serve regular traffic and should start terminating and finish processing open connections.
-   Some applications need to go beyond finishing open connections and need more graceful termination -
-   for example: session draining and completion. Any endpoints that represent the terminating pods
-   are not immediately removed from EndpointSlices,
-   and a status indicating [terminating state](/docs/concepts/services-networking/endpoint-slices/#conditions)
-   is exposed from the EndpointSlice API (and the legacy Endpoints API). Terminating
-   endpoints always have their `ready` status
-   as `false` (for backward compatibility with versions before 1.26),
-   so load balancers will not use it for regular traffic.
-   If traffic draining on terminating pod is needed, the actual readiness can be checked as a condition `serving`.
-   You can find more details on how to implement connections draining
-   in the tutorial [Pods And Endpoints Termination Flow](/docs/tutorials/services/pods-and-endpoint-termination-flow/)
+   no longer treat the shutting-down Pod as a valid, in-service replica.
+
+   Pods that shut down slowly should not continue to serve regular traffic and should start
+   terminating and finish processing open connections. Some applications need to go beyond
+   finishing open connections and need more graceful termination, for example, session draining
+   and completion.
+
+   Any endpoints that represent the terminating Pods are not immediately removed from
+   EndpointSlices, and a status indicating [terminating state](/docs/concepts/services-networking/endpoint-slices/#conditions)
+   is exposed from the EndpointSlice API (and the legacy Endpoints API).
+   Terminating endpoints always have their `ready` status as `false` (for backward compatibility
+   with versions before 1.26), so load balancers will not use it for regular traffic.
+
+   If traffic draining on terminating Pod is needed, the actual readiness can be checked as a
+   condition `serving`. You can find more details on how to implement connections draining in the
+   tutorial [Pods And Endpoints Termination Flow](/docs/tutorials/services/pods-and-endpoint-termination-flow/)
 
 {{<note>}}
 If you don't have the `EndpointSliceTerminatingCondition` feature gate enabled
-in your cluster (the gate is on by default from Kubernetes 1.22, and locked to default in 1.26), then the Kubernetes control
-plane removes a Pod from any relevant EndpointSlices as soon as the Pod's
+in your cluster (the gate is on by default from Kubernetes 1.22, and locked to default in 1.26),
+then the Kubernetes control plane removes a Pod from any relevant EndpointSlices as soon as the Pod's
 termination grace period _begins_. The behavior above is described when the
 feature gate `EndpointSliceTerminatingCondition` is enabled.
 {{</note>}}
 
 1. When the grace period expires, the kubelet triggers forcible shutdown. The container runtime sends
    `SIGKILL` to any processes still running in any container in the Pod.
    The kubelet also cleans up a hidden `pause` container if that container runtime uses one.
-1. The kubelet transitions the pod into a terminal phase (`Failed` or `Succeeded` depending on
+1. The kubelet transitions the Pod into a terminal phase (`Failed` or `Succeeded` depending on
    the end state of its containers). This step is guaranteed since version 1.27.
 1. The kubelet triggers forcible removal of Pod object from the API server, by setting grace period
    to 0 (immediate deletion).
@@ -518,11 +521,12 @@ the `--grace-period=<seconds>` option which allows you to override the default a
 own value.
 
 Setting the grace period to `0` forcibly and immediately deletes the Pod from the API
-server. If the pod was still running on a node, that forcible deletion triggers the kubelet to
+server. If the Pod was still running on a node, that forcible deletion triggers the kubelet to
 begin immediate cleanup.
 
 {{< note >}}
-You must specify an additional flag `--force` along with `--grace-period=0` in order to perform force deletions.
+You must specify an additional flag `--force` along with `--grace-period=0`
+in order to perform force deletions.
 {{< /note >}}
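The two flags from the note combine as shown below (an illustration, not part of the commit; the Pod name is a placeholder):

```shell
kubectl delete pod <pod-name> --grace-period=0 --force
```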
 
 When a force deletion is performed, the API server does not wait for confirmation
@@ -532,7 +536,8 @@ name. On the node, Pods that are set to terminate immediately will still be give
 a small grace period before being force killed.
 
 {{< caution >}}
-Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
+Immediate deletion does not wait for confirmation that the running resource has been terminated.
+The resource may continue to run on the cluster indefinitely.
 {{< /caution >}}
 
 If you need to force-delete Pods that are part of a StatefulSet, refer to the task
@@ -545,21 +550,24 @@ For failed Pods, the API objects remain in the cluster's API until a human or
 {{< glossary_tooltip term_id="controller" text="controller" >}} process
 explicitly removes them.
 
-The Pod garbage collector (PodGC), which is a controller in the control plane, cleans up terminated Pods (with a phase of `Succeeded` or
-`Failed`), when the number of Pods exceeds the configured threshold
-(determined by `terminated-pod-gc-threshold` in the kube-controller-manager).
+The Pod garbage collector (PodGC), which is a controller in the control plane, cleans up
+terminated Pods (with a phase of `Succeeded` or `Failed`), when the number of Pods exceeds the
+configured threshold (determined by `terminated-pod-gc-threshold` in the kube-controller-manager).
 This avoids a resource leak as Pods are created and terminated over time.
 
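The threshold named above is a kube-controller-manager flag; as an illustration outside this commit, it is set on the controller manager's command line (the value shown is an assumption based on the commonly documented default):

```shell
kube-controller-manager --terminated-pod-gc-threshold=12500
```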

 Additionally, PodGC cleans up any Pods which satisfy any of the following conditions:
-1. are orphan pods - bound to a node which no longer exists,
-2. are unscheduled terminating pods,
-3. are terminating pods, bound to a non-ready node tainted with [`node.kubernetes.io/out-of-service`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service), when the `NodeOutOfServiceVolumeDetach` feature gate is enabled.
+
+1. are orphan Pods - bound to a node which no longer exists,
+1. are unscheduled terminating Pods,
+1. are terminating Pods, bound to a non-ready node tainted with
+   [`node.kubernetes.io/out-of-service`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service),
+   when the `NodeOutOfServiceVolumeDetach` feature gate is enabled.
 
 When the `PodDisruptionConditions` feature gate is enabled, along with
-cleaning up the pods, PodGC will also mark them as failed if they are in a non-terminal
-phase. Also, PodGC adds a pod disruption condition when cleaning up an orphan
-pod (see also:
-[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)).
+cleaning up the Pods, PodGC will also mark them as failed if they are in a non-terminal
+phase. Also, PodGC adds a Pod disruption condition when cleaning up an orphan Pod.
+See [Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)
+for more details.
 
 ## {{% heading "whatsnext" %}}

@@ -573,4 +581,4 @@ pod (see also:
 
 * For detailed information about Pod and container status in the API, see
   the API reference documentation covering
-  [`.status`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodStatus) for Pod.
+  [`status`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodStatus) for Pod.
