Skip to content

Commit ef5b24c

Browse files
committed
KEP 4438: make Implementable, prepare PRR
Signed-off-by: Matthias Bertschy <[email protected]>
1 parent 91360e2 commit ef5b24c

File tree

3 files changed

+74
-10
lines changed

3 files changed

+74
-10
lines changed
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
kep-number: 4438
2+
alpha:
3+
approver: "@wojtek-t"

keps/sig-node/4438-sidecar-restart-termination/README.md

Lines changed: 69 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -131,17 +131,17 @@ checklist items _must_ be updated for the enhancement to be released.
131131

132132
Items marked with (R) are required *prior to targeting to a milestone / release*.
133133

134-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
135-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
134+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
135+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
136136
- [ ] (R) Design details are appropriately documented
137137
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
138138
- [ ] e2e Tests for all Beta API Operations (endpoints)
139139
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
140140
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
141-
- [ ] (R) Graduation criteria is in place
141+
- [X] (R) Graduation criteria is in place
142142
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
143-
- [ ] (R) Production readiness review completed
144-
- [ ] (R) Production readiness review approved
143+
- [X] (R) Production readiness review completed
144+
- [X] (R) Production readiness review approved
145145
- [ ] "Implementation History" section is up-to-date for milestone
146146
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
147147
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -225,6 +225,12 @@ implementation. What is the desired outcome and how do we measure success?.
225225
The "Design Details" section below is for the real
226226
nitty-gritty.
227227
-->
228+
The proposal is to introduce a new feature gate for the sidecar containers KEP to decouple the sidecar feature
229+
from the restart during Pod termination feature and allow users to use sidecar containers without the refactoring
230+
required for the restart during Pod termination.
231+
232+
Please refer to the original KEP for the details of the sidecar containers feature:
233+
https://git.k8s.io/enhancements/keps/sig-node/753-sidecar-containers
228234

229235
### User Stories (Optional)
230236

@@ -319,7 +325,7 @@ This can inform certain test coverage improvements that we want to do before
319325
extending the production code to implement this enhancement.
320326
-->
321327

322-
- `<package>`: `<date>` - `<test coverage>`
328+
- `pkg/kubelet/kuberuntime`: `2024/02/06` - `66.8%`
323329

324330
##### Integration tests
325331

@@ -350,9 +356,53 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
350356
https://storage.googleapis.com/k8s-triage/index.html
351357
352358
We expect no non-infra related flakes in the last month as a GA graduation criteria.
353-
-->
354359
355360
- <test>: <link to test coverage>
361+
-->
362+
363+
###### Existing tests
364+
365+
- should respect termination grace period seconds
366+
- should respect termination grace period seconds with long-running preStop hook https://github.com/kubernetes/kubernetes/blob/fbb2e6293fb0c8c107ae48b8b8ae488325c59598/test/e2e_node/container_lifecycle_test.go#L536
367+
- should call the container's preStop hook and terminate it if its startup probe fails https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/container_lifecycle_test.go#L616
368+
- should call the container's preStop hook and terminate it if its liveness probe fails https://github.com/kubernetes/kubernetes/blob/fbb2e6293fb0c8c107ae48b8b8ae488325c59598/test/e2e_node/container_lifecycle_test.go#L683
369+
370+
###### New tests
371+
372+
Probes:
373+
- Readiness probes are still running while in preStop
374+
- Readiness status is beings updated for the container and the Pod while in preStop
375+
- Liveness probes are NOT running for regular containers while the Pod is terminating
376+
- SIDECAR: Liveness probes DO run for sidecar containers while the Pod is terminating
377+
- SIDECAR: sidecar container will be restarted when liveness probe failed during Pod termination
378+
379+
Not fully started containers:
380+
- preStop will not be executed for the container that hasn’t started yet
381+
- preStop will be called on the container even if postStart is still running
382+
- postStart hook CONTINUE EXECUTE even if container started termination
383+
- postStart hook will stop once pod passed it’s graceful termination period
384+
385+
Pod with some containers terminated:
386+
- BUGFIX, SIDECAR: Container can be restarted when there are terminating containers in the Pod A container cannot restart when there is any terminating container in the same pod · Issue #121398
387+
388+
Re-terminating the Pod:
389+
- When the Pod is terminating, another call to terminate the pod with the smaller grace period will override the grace period to terminate Pod faster
390+
- When the Pod is terminating, another call to terminate the pod with the greater grace period will override the grace period to allow longer termination
391+
- BUGFIX: Service account token gets invalidated while terminating pod is re-deleted · Issue #122568
392+
393+
Pre-stop vs. SIGTERM traps:
394+
- Same as existing and above tests, need to validate that the container that traps the SIGTERM behaves the same way as with preStop:
395+
- Respect the grace period
396+
- Liveness probes are not running
397+
- Readiness probes are running
398+
399+
Test what is available for during preStop:
400+
- BUGFIX: While the Pod is terminating, service account tokens are rotated Kubelet stops rotating service account tokens when pod is terminating, breaking preStop hooks · Issue #116481
401+
- BUGFIX: Service account token is valid if the terminating Pod was deleted again Service account token gets invalidated while terminating pod is re-deleted · Issue #122568
402+
403+
Eviction and OOM kills:
404+
- preStop is called when Pod is evicted
405+
- preStop is NOT called when Container is OOMkilled
356406

357407
### Graduation Criteria
358408

@@ -445,6 +495,9 @@ enhancement:
445495
- What changes (in invocations, configurations, API use, etc.) is an existing
446496
cluster required to make on upgrade, in order to make use of the enhancement?
447497
-->
498+
This feature only concerns the kubelet, so the upgrade and downgrade strategy is limited to the kubelet.
499+
Moreover, the Pod spec is not altered, so no changes are required for existing workloads to make use of the feature.
500+
Likewise, no changes are required for these workloads to revert to previous behavior.
448501

449502
### Version Skew Strategy
450503

@@ -460,6 +513,8 @@ enhancement:
460513
- Will any other components on the node change? For example, changes to CSI,
461514
CRI or CNI may require updating that component before the kubelet.
462515
-->
516+
There is no version skew strategy for this feature.
517+
The kubelet is the only component that needs to be updated to make use of this feature.
463518

464519
## Production Readiness Review Questionnaire
465520

@@ -537,7 +592,8 @@ feature.
537592
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
538593
-->
539594
Yes, the feature can be disabled once it has been enabled.
540-
There is no alteration to the Pod spec, so existing workloads will not be affected.
595+
There is no alteration to the Pod spec, so existing workloads will be terminated according to the current behavior,
596+
after the kubelet is restarted with the feature gate disabled.
541597

542598
###### What happens if we reenable the feature if it was previously rolled back?
543599
No side effect, the feature can be switched on or off.
@@ -557,6 +613,7 @@ You can take a look at one potential example of such test in:
557613
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
558614
-->
559615
Yes, unit tests will be added to ensure the feature can be enabled and disabled.
616+
The KEP will be updated with the details of the tests as they are added.
560617

561618
### Rollout, Upgrade and Rollback Planning
562619

@@ -826,6 +883,10 @@ Major milestones might include:
826883
- when the KEP was retired or superseded
827884
-->
828885

886+
- 2024-01-30: `Summary` and `Motivation` sections merged
887+
- 2024-02-08: `Proposal` section merged, KEP marked as `implementable`
888+
- v1.30: Alpha release
889+
829890
## Drawbacks
830891

831892
<!--

keps/sig-node/4438-sidecar-restart-termination/kep.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,9 @@ authors:
66
owning-sig: sig-node
77
participating-sigs:
88
- sig-scheduler
9-
status: provisional
9+
status: implementable
1010
creation-date: 2024-01-25
11-
last-updated: 2024-01-25
11+
last-updated: 2024-02-02
1212
reviewers:
1313
- "@mrunalp" # overall
1414
- "@ahg-g" # SIG Scheduling

0 commit comments

Comments
 (0)