Commit 0a85b19

KEP-3386: Graduate Evented PLEG to Beta
Signed-off-by: Harshal Patil <[email protected]>
1 parent 2d3d1d7 commit 0a85b19

3 files changed

+45
-9
lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 3386
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"

keps/sig-node/3386-kubelet-evented-pleg/README.md

Lines changed: 40 additions & 7 deletions
@@ -23,6 +23,7 @@
 - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
+- [Beta](#beta)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -42,9 +43,9 @@
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
 - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
@@ -53,7 +54,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
+- [x] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
@@ -333,6 +334,8 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 -->
 
 - Existing Pod Lifecycle tests must pass even after increasing the relisting period.
+- E2E Node Conformance non-blocking [presubmit job](https://testgrid.k8s.io/sig-node-presubmits#pr-crio-cgrpv1-evented-pleg-gce-e2e)
+- E2E Node Conformance non-blocking [periodic job](https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-evented-pleg)
 
 
 ### Graduation Criteria
@@ -341,6 +344,10 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 - Feature implemented behind a feature flag
 - Existing `node e2e` tests around pod lifecycle must pass
 
+#### Beta
+- Add E2E Node Conformance presubmit job in CI
+- Add E2E Node Conformance periodic job in CI
+
 ### Upgrade / Downgrade Strategy
 
 N/A
@@ -379,8 +386,7 @@ If reenabled, kubelet will again start updating container statuses using CRI eve
 
 ###### Are there any tests for feature enablement/disablement?
 
-Yes, unit tests for the feature when enabled and disabled will be implemented in both kubelet
-
+This [unit test](https://github.com/kubernetes/kubernetes/blob/ca70940ba8c375bc69091822a9d52bcb7925de3b/pkg/kubelet/pleg/evented_test.go#L47) performs a health check on the Evented PLEG.
 ### Rollout, Upgrade and Rollback Planning
 
 <!--
@@ -409,14 +415,35 @@ that might indicate a serious problem?
 -->
 
 If users observe inconsistencies between the container statuses reported by the kubelet and those reported by the CRI runtime (e.g. using a tool like `crictl`) after enabling this feature, they should consider rolling back the feature.
+
+Apart from that, cluster admins can monitor the state of the Evented PLEG's connection with the CRI runtime using the following metrics:
+
+* `evented_pleg_connection_error_count` - The count of errors encountered while establishing the streaming connection with the CRI runtime.
+* `evented_pleg_connection_success_count` - The count of successful streaming connections with the CRI runtime.
+* `evented_pleg_connection_latency_seconds` - The latency of the streaming connection with the CRI runtime, measured in seconds.
+* `evented_pleg_notifications_received` - The number of notifications received through the streaming connection with the CRI runtime.
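
Reviewer note: one way an admin might act on the two connection counters above is to derive an error ratio from a metrics scrape. The following Go program is a minimal sketch under stated assumptions, not part of this commit: it assumes the counters are exposed unlabelled in the Prometheus text format under exactly the names listed above (the kubelet may add a prefix or labels), and `parseCounters` and `connectionErrorRatio` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCounters reads Prometheus text-format samples and returns the value of
// each unlabelled metric keyed by name. Comment lines (# HELP, # TYPE) and
// lines that do not parse as "name value" are skipped.
func parseCounters(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		out[fields[0]] = v
	}
	return out
}

// connectionErrorRatio returns errors / (errors + successes) for the Evented
// PLEG streaming connection. ok is false when no attempt has been recorded.
// The metric names are assumed to match the README text exactly.
func connectionErrorRatio(m map[string]float64) (ratio float64, ok bool) {
	errs := m["evented_pleg_connection_error_count"]
	succ := m["evented_pleg_connection_success_count"]
	total := errs + succ
	if total == 0 {
		return 0, false
	}
	return errs / total, true
}

func main() {
	// Hand-written sample scrape; the values are made up for illustration.
	sample := `# TYPE evented_pleg_connection_error_count counter
evented_pleg_connection_error_count 1
# TYPE evented_pleg_connection_success_count counter
evented_pleg_connection_success_count 3
`
	ratio, ok := connectionErrorRatio(parseCounters(sample))
	fmt.Println(ratio, ok)
}
```

A ratio that keeps rising after the feature is enabled would be one signal, alongside the `crictl` comparison described above, that a rollback is worth considering.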
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
 Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
-N/A for alpha release. But we will add the tests for beta release.
+
+The following scenarios were tested manually:
+
+Scenario 1: Kubelet Upgrade without Corresponding CRI Runtime Upgrade
+
+Step 1: Kubelet is upgraded but the CRI runtime remains unchanged. Kubelet falls back to using the Generic PLEG because the CRI runtime does not emit any CRI events.
+Step 2: Kubelet is downgraded, but the CRI runtime version remains the same. Kubelet continues to work with the existing Generic PLEG.
+Step 3: If the Kubelet is upgraded again, it behaves as in Step 1.
+
+Scenario 2: Kubelet and CRI Runtime Upgrade Together
+
+Step 1: Both the Kubelet and the CRI runtime are upgraded. Since the CRI runtime emits CRI events, Kubelet uses the Evented PLEG with an increased relisting period for the Generic PLEG.
+Step 2: Kubelet and CRI runtime are downgraded. Kubelet defaults to using the Generic PLEG.
+Step 3: If the Kubelet is upgraded again, it behaves as in Scenario 1, Step 1.
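
Reviewer note: the outcomes of the two scenarios above reduce to a small decision table. The Go sketch below is illustrative only and is not the kubelet's implementation: `selectPLEG`, the `plegKind` type, and both relist periods are hypothetical names and placeholder values chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// plegKind names which PLEG implementation drives container status updates.
type plegKind string

const (
	genericPLEG plegKind = "generic"
	eventedPLEG plegKind = "evented"
)

// Placeholder relist periods for illustration; they are not taken from the commit.
const (
	defaultRelistPeriod   = 1 * time.Second
	increasedRelistPeriod = 300 * time.Second
)

// selectPLEG mirrors the manually tested scenarios: the Evented PLEG is used
// only when the kubelet supports it and the CRI runtime emits CRI events.
// In every other combination the kubelet falls back to the Generic PLEG.
func selectPLEG(kubeletSupportsEvents, runtimeEmitsEvents bool) (plegKind, time.Duration) {
	if kubeletSupportsEvents && runtimeEmitsEvents {
		return eventedPLEG, increasedRelistPeriod
	}
	return genericPLEG, defaultRelistPeriod
}

func main() {
	cases := []struct {
		name    string
		kubelet bool
		runtime bool
	}{
		{"Scenario 1: kubelet upgraded, runtime unchanged", true, false},
		{"Scenario 1: kubelet downgraded again", false, false},
		{"Scenario 2: kubelet and runtime upgraded", true, true},
		{"Scenario 2: kubelet and runtime downgraded", false, false},
	}
	for _, c := range cases {
		kind, period := selectPLEG(c.kubelet, c.runtime)
		fmt.Printf("%s -> %s PLEG, relist every %s\n", c.name, kind, period)
	}
}
```

Expressing the scenarios as a pure function makes the upgrade, downgrade, and re-upgrade paths easy to cover with a table-driven unit test in addition to the manual runs.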
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
@@ -564,6 +591,10 @@ No.
 
 No.
 
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
 ### Troubleshooting
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
@@ -589,6 +620,8 @@ Disabling this feature in the kubelet will revert to the existing relisting PLEG
 ## Implementation History
 
 - PR for required CRI changes - https://github.com/kubernetes/kubernetes/pull/110165
+- PR for presubmit Node e2e job - https://github.com/kubernetes/test-infra/pull/28366
+- PR for periodic Node e2e job - https://github.com/kubernetes/test-infra/pull/28592
 
 ## Drawbacks
 
keps/sig-node/3386-kubelet-evented-pleg/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -13,16 +13,17 @@ approvers:
   - "@derekwaynecarr"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.26"
+latest-milestone: "v1.27"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.26"
+  beta: "v1.27"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
