1 | | -# Non graceful node shutdown
2 | | -
3 | | -This includes the Summary and Motivation sections.
| 1 | +# KEP-2268: Non graceful node shutdown
4 | 2 |
5 | 3 | ## Table of Contents
6 | 4 |
@@ -41,20 +39,20 @@ This includes the Summary and Motivation sections.
41 | 39 | ## Release Signoff Checklist
42 | 40 |
43 | 41 | Items marked with (R) are required *prior to targeting to a milestone / release*.
44 | | -- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
45 | | -- [ ] (R) KEP approvers have approved the KEP status as `implementable`
46 | | -- [ ] (R) Design details are appropriately documented
47 | | -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
48 | | - - [ ] e2e Tests for all Beta API Operations (endpoints)
| 42 | +- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
| 43 | +- [X] (R) KEP approvers have approved the KEP status as `implementable`
| 44 | +- [X] (R) Design details are appropriately documented
| 45 | +- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
| 46 | + - [X] e2e Tests for all Beta API Operations (endpoints)
49 | 47 | - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
50 | 48 | - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
51 | | -- [ ] (R) Graduation criteria is in place
| 49 | +- [X] (R) Graduation criteria is in place
52 | 50 | - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
53 | | -- [ ] (R) Production readiness review completed
54 | | -- [ ] (R) Production readiness review approved
55 | | -- [ ] "Implementation History" section is up-to-date for milestone
56 | | -- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
57 | | -- [ ] Supporting documentation - e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
| 51 | +- [X] (R) Production readiness review completed
| 52 | +- [X] (R) Production readiness review approved
| 53 | +- [X] "Implementation History" section is up-to-date for milestone
| 54 | +- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
| 55 | +- [X] Supporting documentation - e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
58 | 56 |
59 | 57 | **Note:** Any PRs to move a KEP to `implementable` or significant changes once it is marked `implementable` should be approved by each of the KEP approvers. If any of those
60 | 58 | approvers is no longer appropriate, then changes to that list should be approved by the remaining approvers and/or the owning SIG (or SIG-arch for cross-cutting KEPs).
@@ -146,7 +144,7 @@ To mitigate this we plan to have a high test coverage and to introduce this enha
146 | 144 |
147 | 145 | ### Test Plan
148 | 146 |
149 | | -[x] I/we understand the owners of the involved components may require updates to
| 147 | +[X] I/we understand the owners of the involved components may require updates to
150 | 148 | existing tests to make this code solid enough prior to committing the changes necessary
151 | 149 | to implement this enhancement.
152 | 150 |
@@ -386,20 +384,38 @@ logs or events for this purpose.
386 | 384 | The usage of this feature requires the manual step of applying a taint
387 | 385 | so the operator should be the one applying it.
388 | 386 |
| 387 | +###### How can someone using this feature know that it is working for their instance?
| 388 | +
| 389 | +<!--
| 390 | +For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
| 391 | +for each individual pod.
| 392 | +Pick one more of these and delete the rest.
| 393 | +Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
| 394 | +and operation of this feature.
| 395 | +Recall that end users cannot usually observe component logs or access metrics.
| 396 | +-->
| 397 | +
| 398 | +- [X] API .status
| 399 | + If it works, pods in the stateful workload should be re-scheduled to another
| 400 | + running node. `Phase` in Pod `Status` should be `Running` for a new Pod
| 401 | + on the other running node.
| 402 | + If not, check the pod status to see why it does not come up.
| 403 | +
389 | 404 | ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
390 | 405 |
391 | 406 | <!--
392 | 407 | Pick one more of these and delete the rest.
393 | 408 | -->
394 | 409 | - [X] Metrics
395 | 410 | - Metric name:
396 | | - - We can add new metrics `deleting_pods_total`, `deleting_pods_error_total`
397 | | - in Pod GC Controller.
398 | | - For Attach Detach Controller, there's already a metric:
399 | | - attachdetach_controller_forced_detaches
400 | | - It is also useful to know how many nodes have taints. We can explore with [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) which generates metrics about the state of the objects.
| 411 | + - New metrics are added in Pod GC Controller:
| 412 | + - `force_delete_pods_total{reason="out-of-service|terminated|orphaned|unscheduled"}`, the number of pods that are being forcefully deleted since the Pod GC Controller started.
| 413 | + - `force_delete_pod_errors_total{reason="out-of-service|terminated|orphaned|unscheduled"}`, the number of errors encountered when forcefully deleting the pods since the Pod GC Controller started.
| 414 | + - For Attach Detach Controller, the following metric will be recorded if a force detach is performed because the node has the `out-of-service` taint or a timeout happens:
| 415 | + - `attachdetach_controller_forced_detaches{reason="out-of-service|timeout"}`, the number of times the Attach Detach Controller performed a forced detach.
| 416 | + - There is also a `kube_node_spec_taint` metric in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/node-metrics.md) that reports the taints of a Kubernetes cluster node.
401 | 417 | - [Optional] Aggregation method:
402 | | - - Components exposing the metric:
| 418 | + - Components exposing the metric: kube-controller-manager
403 | 419 | - [X] Other (treat as last resort)
404 | 420 | - Details:
405 | 421 | - Check whether the workload moved to a different running node
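The taint workflow referenced in the hunk above can be exercised with `kubectl`. A minimal sketch, assuming a node named `node-1` that the operator knows is shut down; the taint key, value, and effect follow the upstream documentation for this feature:

```sh
# Apply the out-of-service taint to the node that is known to be shut down
# (node-1 is a placeholder name).
kubectl taint nodes node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# Verify that the stateful workload's pods were re-scheduled: the new pods
# should reach phase Running on another node.
kubectl get pods -o wide --field-selector status.phase=Running

# The new Pod GC Controller metrics (force_delete_pods_total,
# force_delete_pod_errors_total) are exposed by kube-controller-manager and can
# be inspected through whatever scrapes its /metrics endpoint, e.g. Prometheus.

# Remove the taint once the node has been recovered or removed from the cluster.
kubectl taint nodes node-1 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```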
@@ -490,6 +506,13 @@ For GA, this section is required: approvers should be able to confirm the
490 | 506 | previous answers based on experience in the field.
491 | 507 | -->
492 | 508 |
| 509 | +Without this feature, a user can forcefully delete the pods after they are
| 510 | +in terminating state and new pods will be re-scheduled to another running
| 511 | +node after 6 minutes. With this feature, new pods will be re-scheduled to
| 512 | +another running node without the 6-minute wait after the user has applied
| 513 | +the `out-of-service` taint. It speeds up failover but should not
| 514 | +affect scalability.
| 515 | +
493 | 516 | ###### Will enabling / using this feature result in any new API calls?
494 | 517 |
495 | 518 | <!--
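The failover behavior described in the added lines above can also be observed from the client side. A minimal sketch, again assuming the shut-down node is named `node-1`:

```sh
# Before the taint is applied, pods that were running on the shut-down node
# stay in Terminating state.
kubectl get pods -o wide --field-selector spec.nodeName=node-1

# After the out-of-service taint is applied, replacement pods should start on
# other nodes without the roughly 6-minute wait described above.
kubectl get pods -o wide --watch
```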
@@ -560,6 +583,19 @@ This through this both in small and large cases, again with respect to the
560 | 583 | -->
561 | 584 | No.
562 | 585 |
| 586 | +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
| 587 | +
| 588 | +<!--
| 589 | +Focus not just on happy cases, but primarily on more pathological cases
| 590 | +(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
| 591 | +If any of the resources can be exhausted, how this is mitigated with the existing limits
| 592 | +(e.g. pods per node) or new limits added by this KEP?
| 593 | +
| 594 | +Are there any tests that were run/should be run to understand performance characteristics better
| 595 | +and validate the declared limits?
| 596 | +-->
| 597 | +No.
| 598 | +
563 | 599 | ### Troubleshooting
564 | 600 |
565 | 601 | <!--
@@ -648,6 +684,9 @@ For each of them, fill in the following information by copying the below templat
648 | 684 | - 2020-11-10: KEP updated to handle part of the node partitioning
649 | 685 | - 2021-08-26: The scope of the KEP is narrowed down to handle a real node shutdown. Test plan is updated. Node partitioning will be handled in the future and it can be built on top of this design.
650 | 686 | - 2021-12-03: Removed `SafeDetach` flag. Requires a user to add the `out-of-service` taint when he/she knows the node is shut down.
| 687 | +- Kubernetes v1.24: moved to alpha.
| 688 | +- Kubernetes v1.26: moved to beta.
| 689 | +- Kubernetes v1.28: moved to stable.
651 | 690 |
652 | 691 | ## Alternatives
653 | 692 |