Document behavior of endpoints with the feature EndpointSliceTerminatingCondition (#36791)

SergeyKanzhelev · Tim Bannister · aojea · web-flow · commit ae17e46f7359 · 2023-03-15T08:32:17.000-07:00
* new behavior of endpoints with the feature gate EndpointSliceTerminatingCondition

* Update content/en/docs/concepts/workloads/pods/pod-lifecycle.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/concepts/workloads/pods/pod-lifecycle.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/concepts/workloads/pods/pod-lifecycle.md

Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>

* fixing feature gate versions

---------

Co-authored-by: Tim Bannister <tim@scalefactory.com>
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
diff --git a/content/en/docs/concepts/workloads/pods/pod-lifecycle.md b/content/en/docs/concepts/workloads/pods/pod-lifecycle.md
@@ -296,7 +296,7 @@ Each probe must define exactly one of these four mechanisms:
   The target should implement
   [gRPC health checks](https://grpc.io/grpc/core/md_doc_health-checking.html).
   The diagnostic is considered successful if the `status`
-  of the response is `SERVING`.  
+  of the response is `SERVING`.
   gRPC probes are an alpha feature and are only available if you
   enable the `GRPCContainerProbe`
   [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
@@ -465,14 +465,32 @@ An example flow:
       The containers in the Pod receive the TERM signal at different times and in an arbitrary
       order. If the order of shutdowns matters, consider using a `preStop` hook to synchronize.
       {{< /note >}}
-1. At the same time as the kubelet is starting graceful shutdown, the control plane removes that
-   shutting-down Pod from EndpointSlice (and Endpoints) objects where these represent
+1. At the same time as the kubelet is starting graceful shutdown of the Pod, the control plane evaluates whether to remove that shutting-down Pod from EndpointSlice (and Endpoints) objects, where those objects represent
    a {{< glossary_tooltip term_id="service" text="Service" >}} with a configured
    {{< glossary_tooltip text="selector" term_id="selector" >}}.
    {{< glossary_tooltip text="ReplicaSets" term_id="replica-set" >}} and other workload resources
    no longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly
-   cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from
-   the list of endpoints as soon as the termination grace period _begins_.
+   should not continue to serve regular traffic; they should start terminating and finish processing open connections.
+   Some applications need to go beyond finishing open connections and need more graceful termination,
+   for example, session draining and completion. Any endpoints that represent the terminating Pods
+   are not immediately removed from EndpointSlices,
+   and a status indicating [terminating state](/docs/concepts/services-networking/endpoint-slices/#conditions)
+   is exposed from the EndpointSlice API (and the legacy Endpoints API). Terminating
+   endpoints always have their `ready` status
+   set to `false` (for backward compatibility with versions before 1.26),
+   so load balancers will not use them for regular traffic.
+   If traffic draining on a terminating Pod is needed, the actual readiness can be checked through the `serving` condition.
+   You can find more details on how to implement connection draining
+   in the tutorial [Pods And Endpoints Termination Flow](/docs/tutorials/services/pods-and-endpoint-termination-flow/).
+
+{{<note>}}
+If you don't have the `EndpointSliceTerminatingCondition` feature gate enabled
+in your cluster (the gate is on by default from Kubernetes 1.22, and locked to default in 1.26), then the Kubernetes control
+plane removes a Pod from any relevant EndpointSlices as soon as the Pod's
+termination grace period _begins_. The behavior described above applies when the
+feature gate `EndpointSliceTerminatingCondition` is enabled.
+{{</note>}}
+
 1. When the grace period expires, the kubelet triggers forcible shutdown. The container runtime sends
    `SIGKILL` to any processes still running in any container in the Pod.
    The kubelet also cleans up a hidden `pause` container if that container runtime uses one.
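The condition semantics described in the change above can be illustrated with a short Python sketch. It is not part of this patch and does not describe how any particular load balancer is implemented. The names `traffic_role` and `_or_default` and the return labels are hypothetical, and the defaults used for missing values are an assumption based on a reading of the EndpointSlice API reference.

```python
def _or_default(value, default):
    """Return default when a condition is absent (None), else the value."""
    return default if value is None else value


def traffic_role(conditions):
    """Classify one endpoint from its EndpointSlice conditions.

    Returns "regular", "drain-only", or "none" (hypothetical labels).
    """
    # Assumed defaults: unknown ready -> True, unknown terminating -> False,
    # unknown serving -> same as ready.
    ready = _or_default(conditions.get("ready"), True)
    terminating = _or_default(conditions.get("terminating"), False)
    serving = _or_default(conditions.get("serving"), ready)

    if ready:
        # Ready endpoints take regular traffic; terminating ones are never ready.
        return "regular"
    if terminating and serving:
        # Shutting down but still able to respond: candidate for draining only.
        return "drain-only"
    return "none"
```

For instance, an endpoint with `ready: false`, `serving: true`, and `terminating: true` would be classified as `"drain-only"` under these assumptions.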
diff --git a/content/en/docs/tutorials/services/pods-and-endpoint-termination-flow.md b/content/en/docs/tutorials/services/pods-and-endpoint-termination-flow.md
@@ -0,0 +1,221 @@
+---
+title: Explore Termination Behavior for Pods And Their Endpoints
+content_type: tutorial
+weight: 60
+---
+
+
+<!-- overview -->
+
+Once you have connected your application to a Service by following steps
+like those outlined in [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/),
+you have a continuously running, replicated application that is exposed on a network.
+This tutorial helps you look at the termination flow for Pods and explore ways to implement
+graceful connection draining.
+
+<!-- body -->
+
+## Termination process for Pods and their endpoints
+
+There are often cases when you need to terminate a Pod,
+whether for an upgrade or a scale-down.
+To improve application availability, it may be important to implement
+proper draining of active connections.
+
+This tutorial explains the flow of Pod termination in connection with the
+corresponding endpoint state and removal, using
+a simple nginx web server to demonstrate the concept.
+
+<!-- body -->
+
+## Example flow with endpoint termination
+
+The following is an example of the flow described in the
+[Termination of Pods](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
+document.
+
+Let's say you have a Deployment consisting of a single `nginx` replica
+(just for demonstration purposes) and a Service:
+
+{{< codenew file="service/pod-with-graceful-termination.yaml" >}}
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx-deployment
+  labels:
+    app: nginx
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      terminationGracePeriodSeconds: 120 # extra long grace period
+      containers:
+      - name: nginx
+        image: nginx:latest
+        ports:
+        - containerPort: 80
+        lifecycle:
+          preStop:
+            exec:
+              # Real-life termination may take any time up to terminationGracePeriodSeconds.
+              # This example just sleeps for longer than terminationGracePeriodSeconds,
+              # so the container is forcibly terminated at 120 seconds.
+              # Note that nginx keeps processing requests during all this time.
+              command: [
+                "/bin/sh", "-c", "sleep 180"
+              ]
+
+---
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: nginx-service
+spec:
+  selector:
+    app: nginx
+  ports:
+    - protocol: TCP
+      port: 80
+      targetPort: 80
+```
+
+Once the Pod and Service are running, you can get the name of any associated EndpointSlices:
+
+```shell
+kubectl get endpointslice
+```
+
+The output is similar to this:
+
+```none
+NAME                  ADDRESSTYPE   PORTS   ENDPOINTS                 AGE
+nginx-service-6tjbr   IPv4          80      10.12.1.199,10.12.1.201   22m
+```
+
+You can see its status, and validate that there is one endpoint registered:
+
+```shell
+kubectl get endpointslices -o json -l kubernetes.io/service-name=nginx-service
+```
+
+The output is similar to this:
+
+```none
+{
+    "addressType": "IPv4",
+    "apiVersion": "discovery.k8s.io/v1",
+    "endpoints": [
+        {
+            "addresses": [
+                "10.12.1.201"
+            ],
+            "conditions": {
+                "ready": true,
+                "serving": true,
+                "terminating": false
+```
+
+Now let's terminate the Pod and validate that the Pod is being terminated
+respecting the graceful termination period configuration:
+
+```shell
+kubectl delete pod nginx-deployment-7768647bf9-b4b9s
+```
+
+List all Pods:
+
+```shell
+kubectl get pods
+```
+
+The output is similar to this:
+
+```none
+NAME                                READY   STATUS        RESTARTS      AGE
+nginx-deployment-7768647bf9-b4b9s   1/1     Terminating   0             4m1s
+nginx-deployment-7768647bf9-rkxlw   1/1     Running       0             8s
+```
+
+You can see that a new Pod has been scheduled.
+
+While the new endpoint is being created for the new Pod, the old endpoint is
+still around in the terminating state:
+
+```shell
+kubectl get endpointslice -o json nginx-service-6tjbr
+```
+
+The output is similar to this:
+
+```none
+{
+    "addressType": "IPv4",
+    "apiVersion": "discovery.k8s.io/v1",
+    "endpoints": [
+        {
+            "addresses": [
+                "10.12.1.201"
+            ],
+            "conditions": {
+                "ready": false,
+                "serving": true,
+                "terminating": true
+            },
+            "nodeName": "gke-main-default-pool-dca1511c-d17b",
+            "targetRef": {
+                "kind": "Pod",
+                "name": "nginx-deployment-7768647bf9-b4b9s",
+                "namespace": "default",
+                "uid": "66fa831c-7eb2-407f-bd2c-f96dfe841478"
+            },
+            "zone": "us-central1-c"
+        },
+        {
+            "addresses": [
+                "10.12.1.202"
+            ],
+            "conditions": {
+                "ready": true,
+                "serving": true,
+                "terminating": false
+            },
+            "nodeName": "gke-main-default-pool-dca1511c-d17b",
+            "targetRef": {
+                "kind": "Pod",
+                "name": "nginx-deployment-7768647bf9-rkxlw",
+                "namespace": "default",
+                "uid": "722b1cbe-dcd7-4ed4-8928-4a4d0e2bbe35"
+            },
+            "zone": "us-central1-c"
+```
+
+This allows applications to communicate their state during termination
+and clients (such as load balancers) to implement connection draining.
+These clients may detect terminating endpoints and apply special logic to them.
+
+In Kubernetes, endpoints that are terminating always have their `ready` status set to `false`.
+This is needed for backward
+compatibility, so that existing load balancers will not use them for regular traffic.
+If traffic draining on a terminating Pod is needed, the actual readiness can be
+checked through the `serving` condition.
+
+When the Pod is deleted, the old endpoint is also deleted.
+
+
+## {{% heading "whatsnext" %}}
+
+
+* Learn how to [Connect Applications with Services](/docs/tutorials/services/connect-applications-service/)
+* Learn more about [Using a Service to Access an Application in a Cluster](/docs/tasks/access-application-cluster/service-access-application-cluster/)
+* Learn more about [Connecting a Front End to a Back End Using a Service](/docs/tasks/access-application-cluster/connecting-frontend-backend/)
+* Learn more about [Creating an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
+
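As a rough illustration of how a client might detect the terminating endpoint shown in the tutorial output, here is a minimal Python sketch that is not part of this patch. The `SAMPLE` document is a trimmed, hand-completed adaptation of the tutorial's truncated output, and `draining_endpoints` is a hypothetical helper name.

```python
import json

# Trimmed adaptation of the EndpointSlice output shown in the tutorial.
SAMPLE = """
{
  "addressType": "IPv4",
  "apiVersion": "discovery.k8s.io/v1",
  "endpoints": [
    {"addresses": ["10.12.1.201"],
     "conditions": {"ready": false, "serving": true, "terminating": true},
     "targetRef": {"kind": "Pod", "name": "nginx-deployment-7768647bf9-b4b9s"}},
    {"addresses": ["10.12.1.202"],
     "conditions": {"ready": true, "serving": true, "terminating": false},
     "targetRef": {"kind": "Pod", "name": "nginx-deployment-7768647bf9-rkxlw"}}
  ]
}
"""


def draining_endpoints(endpoint_slice):
    """Return (pod name, addresses) for endpoints that are terminating but still serving."""
    result = []
    for endpoint in endpoint_slice.get("endpoints", []):
        conditions = endpoint.get("conditions", {})
        if conditions.get("terminating") and conditions.get("serving"):
            name = endpoint.get("targetRef", {}).get("name", "<unknown>")
            result.append((name, endpoint.get("addresses", [])))
    return result


if __name__ == "__main__":
    for name, addresses in draining_endpoints(json.loads(SAMPLE)):
        print(name, addresses)
```

Against a live cluster, the same function could be fed the parsed output of the `kubectl get endpointslice -o json nginx-service-6tjbr` command used in the tutorial.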
diff --git a/content/en/examples/service/pod-with-graceful-termination.yaml b/content/en/examples/service/pod-with-graceful-termination.yaml
@@ -0,0 +1,32 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx-deployment
+  labels:
+    app: nginx
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      terminationGracePeriodSeconds: 120 # extra long grace period
+      containers:
+      - name: nginx
+        image: nginx:latest
+        ports:
+        - containerPort: 80
+        lifecycle:
+          preStop:
+            exec:
+              # Real-life termination may take any time up to terminationGracePeriodSeconds.
+              # This example just sleeps for longer than terminationGracePeriodSeconds,
+              # so the container is forcibly terminated at 120 seconds.
+              # Note that nginx keeps processing requests during all this time.
+              command: [
+                "/bin/sh", "-c", "sleep 180"
+              ]
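The interaction between the `preStop` sleep (180 seconds) and `terminationGracePeriodSeconds` (120 seconds) in the manifest above can be worked through with a small Python sketch. It is not part of this patch. It is a deliberately simplified model: `shutdown_timeline` is a hypothetical name, and the model assumes the container exits as soon as it receives TERM and ignores any small extension the kubelet may grant to a hook that is still running.

```python
def shutdown_timeline(grace_period_seconds, pre_stop_seconds):
    """Simplified model of when the container stops after deletion is requested."""
    if pre_stop_seconds < grace_period_seconds:
        # The hook finishes in time; TERM follows and the container exits.
        return {"stops_at": pre_stop_seconds, "forced": False}
    # The hook outlives the grace period, so the container is stopped forcibly.
    return {"stops_at": grace_period_seconds, "forced": True}


if __name__ == "__main__":
    # Values taken from the example manifest.
    print(shutdown_timeline(grace_period_seconds=120, pre_stop_seconds=180))
```

With the manifest's values, the model reports a forced stop at 120 seconds, matching the comment in the YAML.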