keps/sig-node/3960-pod-lifecycle-sleep-action/README.md
Lines changed: 44 additions & 17 deletions
@@ -45,16 +45,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [x] (R) Production readiness review completed
 - [x] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
@@ -212,12 +212,24 @@ to implement this enhancement.
 
 ##### Unit tests
 
+alpha:
 - Test that the runSleepHandler function sleeps for the correct duration when given a valid duration value.
 - Test that the runSleepHandler function returns without error when given a valid duration value.
 - Test that the validation returns an error when given an invalid duration value (e.g., a negative value).
 - Test that the validation returns an error when given duration is longer than the termination graceperiod.
 - Test that the runSleepHandler function returns immediately when given a duration of zero.
 
+beta:
+- Test the `switch` of the feature-gate itself.
+- Test the handler is silently dropped when a pod created with feature-gate disabled.
+- Test the handler is correctly added when a pod created with feature-gate enabled.
+- Test the handler is silently dropped when a pod created with no handler and feature-gate enabled is updated with handler and feature-gate disabled.
+- Test the handler is correctly added when a pod created with no handler and feature-gate disabled is updated with handler and feature-gate enabled.
 - [x] Interaction with termination grace period(alpha)
+- [ ] Sleep duration boundary testing(beta)
+- [ ] Container exit/crash testing(beta)
 ### Graduation Criteria
 
 #### Alpha
@@ -297,12 +317,6 @@ If only the kubelet enable this feature, when creating/updating a resource with
 - [x] Feature gate (also fill in values in `kep.yaml`)
   - Feature gate name: PodLifecycleSleepAction
   - Components depending on the feature gate: kubelet,kube-apiserver
-- [ ] Other
-  - Describe the mechanism:
-  - Will enabling / disabling the feature require downtime of the control
-    plane?
-  - Will enabling / disabling the feature require downtime or reprovisioning
-    of a node?
 
 ###### Does enabling the feature change any default behavior?
 
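Both the kubelet and kube-apiserver need `PodLifecycleSleepAction` enabled, which is normally done through each component's `--feature-gates` flag, a comma-separated list of `Name=bool` pairs. The sketch below illustrates that format with a hypothetical, simplified parser (`parseFeatureGates`); `OtherGate` is a made-up gate name, and this is not the component-base implementation (for instance, it does not check that gate names are known).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates is a hypothetical, simplified parser for a --feature-gates
// value such as "PodLifecycleSleepAction=true,OtherGate=false". It only shows
// the Name=bool format; it does not check that gate names are known.
func parseFeatureGates(value string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, pair := range strings.Split(value, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // this sketch tolerates empty entries such as a trailing comma
		}
		name, raw, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("missing '=' in %q", pair)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(raw))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %s: %w", name, err)
		}
		gates[strings.TrimSpace(name)] = enabled
	}
	return gates, nil
}

func main() {
	// The same setting must be applied to both kubelet and kube-apiserver.
	gates, err := parseFeatureGates("PodLifecycleSleepAction=true,OtherGate=false")
	fmt.Println(gates["PodLifecycleSleepAction"], gates["OtherGate"], err) // true false <nil>
}
```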
@@ -320,8 +334,8 @@ New pods with sleep action in prestop hook can be created.
 Previously created pod with sleep hook set will execute it before terminating.
 
 ###### Are there any tests for feature enablement/disablement?
-
-Yes. Some unit tests will be designed to test the verification process of the "sleep" field under different scenarios, such as when the feature is enabled, disabled, or switched. These tests will be included in the alpha version.
+For alpha, the `switch` of feature gate is tested manually.
+For beta, unit tests for the `switch` of feature gate itself will be added in `pkg/registry/core/pod/strategy_test`.
 
 ### Rollout, Upgrade and Rollback Planning
 
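The beta tests planned for `pkg/registry/core/pod/strategy_test` check how the API server treats the sleep field as the gate changes. Kubernetes commonly drops a gated field on create or update while the gate is off, unless the existing object already uses it. A minimal sketch of that pattern, assuming simplified hypothetical types (`Pod`, `Handler`, `SleepAction`) and a hypothetical `dropDisabledSleepAction` helper rather than the real `core` API types:

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the core API types; only the
// fields needed to show the pattern are modelled.
type SleepAction struct{ Seconds int64 }

type Handler struct{ Sleep *SleepAction }

type Pod struct{ PreStop *Handler }

// sleepInUse reports whether pod (nil on create) already uses a sleep action.
func sleepInUse(pod *Pod) bool {
	return pod != nil && pod.PreStop != nil && pod.PreStop.Sleep != nil
}

// dropDisabledSleepAction clears the sleep action from newPod when the gate
// is disabled, unless oldPod already used it, so existing objects stay valid.
func dropDisabledSleepAction(newPod, oldPod *Pod, gateEnabled bool) {
	if gateEnabled || sleepInUse(oldPod) {
		return
	}
	if newPod.PreStop != nil {
		newPod.PreStop.Sleep = nil
	}
}

func main() {
	// Created with the gate disabled: the sleep action is silently dropped.
	created := &Pod{PreStop: &Handler{Sleep: &SleepAction{Seconds: 5}}}
	dropDisabledSleepAction(created, nil, false)
	fmt.Println(created.PreStop.Sleep == nil) // true

	// Pod without a handler, updated with one while the gate is enabled: kept.
	updated := &Pod{PreStop: &Handler{Sleep: &SleepAction{Seconds: 5}}}
	dropDisabledSleepAction(updated, &Pod{}, true)
	fmt.Println(updated.PreStop.Sleep != nil) // true
}
```

Keeping a field that the old object already uses means an update made after the gate is switched off does not silently strip configuration from existing pods.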
@@ -331,8 +345,21 @@ The change is opt-in, it doesn't impact already running workloads.
 
 ###### What specific metrics should inform a rollback?
 
+Metric `sleep_action_terminated_early_total` will be added in beta.
+If it increases unreasonably, then user should check if something goes wrong and may need a rollback.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
+This is an opt-in feature, and it does not change any default behavior. We manually tested enabling and disabling this feature by changing kubelet and kube-api-server config and restarting them.
+
+The manual test steps are as following:
+
+1. Create a local 1.29 k8s cluster, and create a test-pod in that cluster.
+2. Enable PodLifecycleSleepAction feature in kubelet and kube-api-server and restart both.
+3. Add a prestop hook with sleep action to the test-pod and delete it, observe the time cost.
+4. Create another pod with sleep action.
+5. Disable PodLifecycleSleepAction feature in kubelet and kube-api-server and restart both.
+6. Delete the pod created in step 4, and observe the time cost.
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 No
@@ -359,10 +386,9 @@ N/A
 
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
-- [ ] Metrics
+- [x] Metrics
   - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
+    - sleep_action_terminated_early_total(counts the number of Pods got terminated before sleep action finishes)
 - [x] Other (treat as last resort)
   - Details: Check the logs of the container during termination, check the termination duration.
 
@@ -422,11 +448,12 @@ N/A
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
-N/A
+Disable PodLifecycleSleepAction feature gate, and restart related components.