Commit fd4b66a

target KEP 3960 to beta
1 parent 27ef0d9 commit fd4b66a

3 files changed

+50
-20
lines changed

keps/prod-readiness/sig-node/3960.yaml

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@
44
kep-number: 3960
55
alpha:
66
approver: "@wojtek-t"
7+
beta:
8+
approver: "@wojtek-t"

keps/sig-node/3960-pod-lifecycle-sleep-action/README.md

Lines changed: 44 additions & 17 deletions
@@ -45,16 +45,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
4545
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
4646
- [x] (R) KEP approvers have approved the KEP status as `implementable`
4747
- [x] (R) Design details are appropriately documented
48-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
49-
- [ ] e2e Tests for all Beta API Operations (endpoints)
48+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
49+
- [x] e2e Tests for all Beta API Operations (endpoints)
5050
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
5151
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
52-
- [ ] (R) Graduation criteria is in place
52+
- [x] (R) Graduation criteria is in place
5353
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
5454
- [x] (R) Production readiness review completed
5555
- [x] (R) Production readiness review approved
56-
- [ ] "Implementation History" section is up-to-date for milestone
57-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
56+
- [x] "Implementation History" section is up-to-date for milestone
57+
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
5858
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
5959

6060
<!--
@@ -212,12 +212,24 @@ to implement this enhancement.
212212

213213
##### Unit tests
214214

215+
alpha:
215216
- Test that the runSleepHandler function sleeps for the correct duration when given a valid duration value.
216217
- Test that the runSleepHandler function returns without error when given a valid duration value.
217218
- Test that the validation returns an error when given an invalid duration value (e.g., a negative value).
218219
- Test that the validation returns an error when given duration is longer than the termination graceperiod.
219220
- Test that the runSleepHandler function returns immediately when given a duration of zero.
220221

222+
beta:
223+
- Test the `switch` of the feature gate itself.
224+
- Test that the handler is silently dropped when a pod is created with the feature gate disabled.
225+
- Test that the handler is correctly added when a pod is created with the feature gate enabled.
226+
- Test that the handler is silently dropped when a pod created without a handler (feature gate enabled) is updated to add the handler while the feature gate is disabled.
227+
- Test that the handler is correctly added when a pod created without a handler (feature gate disabled) is updated to add the handler while the feature gate is enabled.
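
The four create/update cases listed for beta can be illustrated with a minimal Go sketch. It assumes the common Kubernetes convention of dropping a gated field only when the gate is off and the stored object does not already use the field. The types `Pod` and `SleepAction` and the function `dropDisabledSleep` are hypothetical, simplified stand-ins, not the real API types or the actual strategy code.

```go
package main

import "fmt"

// SleepAction and Pod are hypothetical, simplified stand-ins for the real API types.
type SleepAction struct{ Seconds int64 }

type Pod struct {
	PreStopSleep *SleepAction // nil means no sleep handler is set
}

// dropDisabledSleep clears the sleep handler on newPod when the feature gate
// is disabled, unless oldPod (the stored object on update, nil on create)
// already uses the handler. This mirrors the assumed "drop disabled fields"
// convention; it is not the actual implementation.
func dropDisabledSleep(gateEnabled bool, newPod, oldPod *Pod) {
	if gateEnabled {
		return // gate on: keep whatever the client sent
	}
	if oldPod != nil && oldPod.PreStopSleep != nil {
		return // field already in use: keep it so existing pods stay valid
	}
	newPod.PreStopSleep = nil // silently drop the handler
}

func main() {
	cases := []struct {
		name string
		gate bool
		old  *Pod
	}{
		{"create, gate disabled", false, nil},
		{"create, gate enabled", true, nil},
		{"update from no handler, gate disabled", false, &Pod{}},
		{"update from no handler, gate enabled", true, &Pod{}},
	}
	for _, c := range cases {
		p := &Pod{PreStopSleep: &SleepAction{Seconds: 5}}
		dropDisabledSleep(c.gate, p, c.old)
		fmt.Printf("%s: handler kept = %v\n", c.name, p.PreStopSleep != nil)
	}
}
```

A table-driven layout like this maps one row to each bullet, which keeps the enabled/disabled matrix easy to extend.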
228+
229+
Current coverage:
230+
- `k8s.io/kubernetes/pkg/apis/core/validation`:`2023-12-20` - `83.9`
231+
- `k8s.io/kubernetes/pkg/kubelet/lifecycle/handlers`:`2023-12-20` - `86.3`
232+
221233
##### Integration tests
222234
N/A
223235

@@ -252,6 +264,14 @@ N/A
252264
4. For each termination grace period value, delete the pod and observe the time it takes for the container to terminate.
253265
5. Verify that the container is terminated after the min(sleep, grace).
254266

267+
Tests List
268+
- [pod-lifecycle-sleep-action test](https://github.com/kubernetes/kubernetes/blob/a1ffdedf782edf1472102b0b99c1467d4ed39753/test/e2e/common/node/lifecycle_hook.go#L550)
269+
- [failure-links](https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=PodLifecycleSleepAction)
270+
- [test-grid](https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-gce-cos-alpha-features)
271+
- [x] Basic functionality (alpha)
272+
- [x] Interaction with termination grace period (alpha)
273+
- [ ] Sleep duration boundary testing (beta)
274+
- [ ] Container exit/crash testing (beta)
255275
### Graduation Criteria
256276

257277
#### Alpha
@@ -297,12 +317,6 @@ If only the kubelet enable this feature, when creating/updating a resource with
297317
- [x] Feature gate (also fill in values in `kep.yaml`)
298318
- Feature gate name: PodLifecycleSleepAction
299319
- Components depending on the feature gate: kubelet,kube-apiserver
300-
- [ ] Other
301-
- Describe the mechanism:
302-
- Will enabling / disabling the feature require downtime of the control
303-
plane?
304-
- Will enabling / disabling the feature require downtime or reprovisioning
305-
of a node?
306320

307321
###### Does enabling the feature change any default behavior?
308322

@@ -320,8 +334,8 @@ New pods with sleep action in prestop hook can be created.
320334
Previously created pod with sleep hook set will execute it before terminating.
321335

322336
###### Are there any tests for feature enablement/disablement?
323-
324-
Yes. Some unit tests will be designed to test the verification process of the "sleep" field under different scenarios, such as when the feature is enabled, disabled, or switched. These tests will be included in the alpha version.
337+
For alpha, switching the feature gate on and off was tested manually.
338+
For beta, unit tests covering the feature gate `switch` itself will be added in `pkg/registry/core/pod/strategy_test`.
325339

326340
### Rollout, Upgrade and Rollback Planning
327341

@@ -331,8 +345,21 @@ The change is opt-in, it doesn't impact already running workloads.
331345

332346
###### What specific metrics should inform a rollback?
333347

348+
The metric `sleep_action_terminated_early_total` will be added in beta.
349+
If it increases unexpectedly, users should investigate the cause and may need to roll back.
350+
334351
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
335352

353+
This is an opt-in feature, and it does not change any default behavior. We manually tested enabling and disabling this feature by changing the kubelet and kube-apiserver configuration and restarting both components.
354+
355+
The manual test steps are as follows:
356+
357+
1. Create a local 1.29 k8s cluster, and create a test-pod in that cluster.
358+
2. Enable the PodLifecycleSleepAction feature in kubelet and kube-apiserver and restart both.
359+
3. Add a prestop hook with a sleep action to the test-pod, delete it, and observe how long termination takes.
360+
4. Create another pod with a sleep action.
361+
5. Disable the PodLifecycleSleepAction feature in kubelet and kube-apiserver and restart both.
362+
6. Delete the pod created in step 4, and observe how long termination takes.
336363
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
337364

338365
No
@@ -359,10 +386,9 @@ N/A
359386

360387
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
361388

362-
- [ ] Metrics
389+
- [x] Metrics
363390
- Metric name:
364-
- [Optional] Aggregation method:
365-
- Components exposing the metric:
391+
- `sleep_action_terminated_early_total` (counts the number of Pods terminated before the sleep action finishes)
366392
- [x] Other (treat as last resort)
367393
- Details: Check the logs of the container during termination, check the termination duration.
368394

@@ -422,11 +448,12 @@ N/A
422448

423449
###### What steps should be taken if SLOs are not being met to determine the problem?
424450

425-
N/A
451+
Disable the PodLifecycleSleepAction feature gate and restart the related components.
426452

427453
## Implementation History
428454

429455
- 2023-04-22: Initial draft KEP
456+
- 2023-12-20: Target to beta in v1.30
430457

431458
## Drawbacks
432459

keps/sig-node/3960-pod-lifecycle-sleep-action/kep.yaml

Lines changed: 4 additions & 3 deletions
@@ -15,12 +15,12 @@ see-also: []
1515
replaces: []
1616

1717
# The target maturity stage in the current dev cycle for this KEP.
18-
stage: alpha
18+
stage: beta
1919

2020
# The most recent milestone for which work toward delivery of this KEP has been
2121
# done. This can be the current (upcoming) milestone, if it is being actively
2222
# worked on.
23-
latest-milestone: "v1.29"
23+
latest-milestone: "v1.30"
2424

2525
# The milestone at which this feature was, or is targeted to be, at each stage.
2626
milestone:
@@ -38,4 +38,5 @@ feature-gates:
3838
disable-supported: true
3939

4040
# The following PRR answers are required at beta release
41-
metrics: []
41+
metrics:
42+
- "sleep_action_terminated_early_total"
