keps/sig-node/4438-sidecar-restart-termination/README.md
Lines changed: 69 additions & 8 deletions
@@ -131,17 +131,17 @@ checklist items _must_ be updated for the enhancement to be released.
131
131
132
132
Items marked with (R) are required *prior to targeting to a milestone / release*.
133
133
134
-
-[] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
135
-
-[] (R) KEP approvers have approved the KEP status as `implementable`
134
+
-[X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
135
+
-[X] (R) KEP approvers have approved the KEP status as `implementable`
136
136
-[ ] (R) Design details are appropriately documented
137
137
-[ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
138
138
-[ ] e2e Tests for all Beta API Operations (endpoints)
139
139
-[ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
140
140
-[ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
141
-
-[] (R) Graduation criteria is in place
141
+
-[X] (R) Graduation criteria is in place
142
142
-[ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
143
-
-[] (R) Production readiness review completed
144
-
-[] (R) Production readiness review approved
143
+
-[X] (R) Production readiness review completed
144
+
-[X] (R) Production readiness review approved
145
145
-[ ] "Implementation History" section is up-to-date for milestone
146
146
-[ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
147
147
-[ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -225,6 +225,12 @@ implementation. What is the desired outcome and how do we measure success?.
225
225
The "Design Details" section below is for the real
226
226
nitty-gritty.
227
227
-->
228
+
The proposal is to introduce a new feature gate for the sidecar containers KEP to decouple the sidecar feature
229
+
from the restart during Pod termination feature and allow users to use sidecar containers without the refactoring
230
+
required for the restart during Pod termination.
231
+
232
+
Please refer to the original KEP for the details of the sidecar containers feature:
We expect no non-infra related flakes in the last month as a GA graduation criteria.
353
-
-->
354
359
355
360
- <test>: <link to test coverage>
361
+
-->
362
+
363
+
###### Existing tests
364
+
365
+
- should respect termination grace period seconds
366
+
- should respect termination grace period seconds with long-running preStop hook https://github.com/kubernetes/kubernetes/blob/fbb2e6293fb0c8c107ae48b8b8ae488325c59598/test/e2e_node/container_lifecycle_test.go#L536
367
+
- should call the container's preStop hook and terminate it if its startup probe fails https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/container_lifecycle_test.go#L616
368
+
- should call the container's preStop hook and terminate it if its liveness probe fails https://github.com/kubernetes/kubernetes/blob/fbb2e6293fb0c8c107ae48b8b8ae488325c59598/test/e2e_node/container_lifecycle_test.go#L683
369
+
370
+
###### New tests
371
+
372
+
Probes:
373
+
- Readiness probes are still running while in preStop
374
+
- Readiness status is being updated for the container and the Pod while in preStop
375
+
- Liveness probes are NOT running for regular containers while the Pod is terminating
376
+
- SIDECAR: Liveness probes DO run for sidecar containers while the Pod is terminating
377
+
- SIDECAR: a sidecar container is restarted when its liveness probe fails during Pod termination
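
The probe expectations listed above can be summarized as a small decision function. This is a minimal sketch in Go, not kubelet code: `ProbeKind`, `shouldRunProbe`, and its parameters are hypothetical names introduced for illustration, and the sketch assumes that "in preStop" and "Pod is terminating" describe the same state.

```go
package main

import "fmt"

// ProbeKind names the two probe types covered by the list above.
// Hypothetical type for illustration only.
type ProbeKind string

const (
	Readiness ProbeKind = "readiness"
	Liveness  ProbeKind = "liveness"
)

// shouldRunProbe is a hypothetical helper (not a kubelet function) that
// encodes the expectations from the "Probes" test list:
//   - readiness probes keep running while the Pod is terminating
//   - liveness probes stop for regular containers while the Pod is terminating
//   - liveness probes keep running for sidecar containers
func shouldRunProbe(kind ProbeKind, isSidecar, podTerminating bool) bool {
	if !podTerminating {
		// Outside termination, this sketch assumes every probe runs.
		return true
	}
	switch kind {
	case Readiness:
		return true
	case Liveness:
		return isSidecar
	default:
		// Other probe kinds are not covered by the list; returning false
		// here is an arbitrary choice made by this sketch.
		return false
	}
}

func main() {
	for _, sidecar := range []bool{false, true} {
		for _, kind := range []ProbeKind{Readiness, Liveness} {
			fmt.Printf("kind=%s sidecar=%v terminating=true -> run=%v\n",
				kind, sidecar, shouldRunProbe(kind, sidecar, true))
		}
	}
}
```

Each bullet in the list then corresponds to one combination of arguments that a test can assert on.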
378
+
379
+
Not fully started containers:
380
+
- preStop will not be executed for the container that hasn’t started yet
381
+
- preStop will be called on the container even if postStart is still running
382
+
- postStart hook CONTINUES TO EXECUTE even if the container has started terminating
383
+
- postStart hook stops once the Pod has passed its graceful termination period
384
+
385
+
Pod with some containers terminated:
386
+
- BUGFIX, SIDECAR: Container can be restarted when there are terminating containers in the Pod (see "A container cannot restart when there is any terminating container in the same pod", Issue #121398)
387
+
388
+
Re-terminating the Pod:
389
+
- When the Pod is terminating, another call to terminate the Pod with a smaller grace period overrides the grace period so that the Pod terminates faster
390
+
- When the Pod is terminating, another call to terminate the Pod with a greater grace period overrides the grace period to allow a longer termination
391
+
- BUGFIX: Service account token gets invalidated while terminating pod is re-deleted · Issue #122568
392
+
393
+
Pre-stop vs. SIGTERM traps:
394
+
- Same as the existing tests and the tests above, but validating that a container that traps SIGTERM behaves the same way as one with a preStop hook:
395
+
- Respect the grace period
396
+
- Liveness probes are not running
397
+
- Readiness probes are running
398
+
399
+
Test what is available during preStop:
400
+
- BUGFIX: While the Pod is terminating, service account tokens are rotated (see "Kubelet stops rotating service account tokens when pod is terminating, breaking preStop hooks", Issue #116481)
401
+
- BUGFIX: Service account token is still valid if the terminating Pod was deleted again (see "Service account token gets invalidated while terminating pod is re-deleted", Issue #122568)
402
+
403
+
Eviction and OOM kills:
404
+
- preStop is called when Pod is evicted
405
+
- preStop is NOT called when Container is OOMkilled
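
The preStop expectations spread across the lists above (not-yet-started containers, eviction, OOM kills, and the existing liveness-failure test) can be condensed into one predicate. The Go sketch below is illustrative only: `StopReason`, its constants, and `shouldRunPreStop` are hypothetical names, not kubelet types, and `containerStarted` is assumed to mean that the container process has started, whether or not its postStart hook has finished.

```go
package main

import "fmt"

// StopReason describes why a container is being stopped.
// Hypothetical type for illustration only.
type StopReason string

const (
	ReasonPodDeleted     StopReason = "PodDeleted"
	ReasonPodEvicted     StopReason = "PodEvicted"
	ReasonLivenessFailed StopReason = "LivenessProbeFailed"
	ReasonOOMKilled      StopReason = "OOMKilled"
)

// shouldRunPreStop encodes the expectations from the test plan:
//   - preStop is not executed for a container that has not started yet
//   - preStop is called even if postStart is still running
//   - preStop is called on eviction and on liveness probe failure
//   - preStop is NOT called when the container is OOM-killed
func shouldRunPreStop(containerStarted bool, reason StopReason) bool {
	if !containerStarted {
		return false
	}
	return reason != ReasonOOMKilled
}

func main() {
	reasons := []StopReason{
		ReasonPodDeleted, ReasonPodEvicted, ReasonLivenessFailed, ReasonOOMKilled,
	}
	for _, r := range reasons {
		fmt.Printf("%-20s started=true  -> preStop=%v\n", r, shouldRunPreStop(true, r))
	}
	fmt.Printf("%-20s started=false -> preStop=%v\n",
		ReasonPodDeleted, shouldRunPreStop(false, ReasonPodDeleted))
}
```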
356
406
357
407
### Graduation Criteria
358
408
@@ -445,6 +495,9 @@ enhancement:
445
495
- What changes (in invocations, configurations, API use, etc.) is an existing
446
496
cluster required to make on upgrade, in order to make use of the enhancement?
447
497
-->
498
+
This feature only concerns the kubelet, so the upgrade and downgrade strategy is limited to the kubelet.
499
+
Moreover, the Pod spec is not altered, so no changes are required for existing workloads to make use of the feature.
500
+
Likewise, no changes are required for these workloads to revert to previous behavior.
448
501
449
502
### Version Skew Strategy
450
503
@@ -460,6 +513,8 @@ enhancement:
460
513
- Will any other components on the node change? For example, changes to CSI,
461
514
CRI or CNI may require updating that component before the kubelet.
462
515
-->
516
+
There is no version skew strategy for this feature.
517
+
The kubelet is the only component that needs to be updated to make use of this feature.
463
518
464
519
## Production Readiness Review Questionnaire
465
520
@@ -537,7 +592,8 @@ feature.
537
592
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
538
593
-->
539
594
Yes, the feature can be disabled once it has been enabled.
540
-
There is no alteration to the Pod spec, so existing workloads will not be affected.
595
+
There is no alteration to the Pod spec, so existing workloads will be terminated according to the current behavior,
596
+
after the kubelet is restarted with the feature gate disabled.
541
597
542
598
###### What happens if we reenable the feature if it was previously rolled back?
543
599
No side effects; the feature can be switched on or off.
@@ -557,6 +613,7 @@ You can take a look at one potential example of such test in: