Commit 8f8608f

node: KEP-2902: Update to the latest KEP template
Signed-off-by: Swati Sehgal <[email protected]>
1 parent eb2eda9 commit 8f8608f

1 file changed, 67 additions and 0 deletions

keps/sig-node/2902-cpumanager-distribute-cpus-policy-option/README.md

Lines changed: 67 additions & 0 deletions
@@ -10,6 +10,10 @@
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Test Plan](#test-plan)
+- [Prerequisite testing updates](#prerequisite-testing-updates)
+- [Unit tests](#unit-tests)
+- [Integration tests](#integration-tests)
+- [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
 - [Beta](#beta)
@@ -18,8 +22,11 @@
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
 - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
 - [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
 - [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 <!-- /toc -->

@@ -116,6 +123,28 @@ NOTE: The striping operation after all CPUs have been evenly distributed will be

 We will extend both the unit test suite and the E2E test suite to cover the new policy option described in this KEP.

+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+##### Unit tests
+
+- `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20250205` - 85.5% of statements
+
+##### Integration tests
+
+Not Applicable as Kubelet features don't have integration tests.
+
+##### e2e tests
+
+Currently no e2e tests are present for this particular policy option. E2E tests will be added as part of Beta graduation.
+
+The plan is to add e2e tests to cover the basic flows for cases below:
+1. `distribute-cpus-across-numa` option is enabled: The test will ensure that the allocated CPUs are distributed across NUMA nodes according to the policy.
+1. `distribute-cpus-across-numa` option is disabled: The test will verify that the allocated CPUs are packed according to the default behavior.
+
 ### Graduation Criteria

 #### Alpha
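
Note on the two planned e2e cases in the hunk above: the difference they are meant to observe can be illustrated with a small Go sketch. This is a deliberate simplification and not the kubelet's allocation code. The function names `packCPUs` and `distributeCPUs` are hypothetical, NUMA nodes are modelled only as counts of free CPUs, and the even split across all listed nodes is an assumption made for illustration.

```go
package main

import "fmt"

// packCPUs models the default (packed) behaviour in a simplified way:
// take as many free CPUs as possible from each NUMA node, in order,
// before moving on to the next node.
// It returns false if the request cannot be satisfied.
func packCPUs(free []int, request int) ([]int, bool) {
	alloc := make([]int, len(free))
	for i, f := range free {
		take := request
		if f < take {
			take = f
		}
		alloc[i] = take
		request -= take
	}
	return alloc, request == 0
}

// distributeCPUs models the distributed behaviour in a simplified way:
// split the request evenly across all listed NUMA nodes, giving any
// remainder to the first nodes. It returns false if a node lacks enough
// free CPUs for its share.
func distributeCPUs(free []int, request int) ([]int, bool) {
	n := len(free)
	if n == 0 {
		return nil, request == 0
	}
	base, extra := request/n, request%n
	alloc := make([]int, n)
	for i, f := range free {
		share := base
		if i < extra {
			share++
		}
		if share > f {
			return nil, false
		}
		alloc[i] = share
	}
	return alloc, true
}

func main() {
	free := []int{8, 8} // two NUMA nodes with 8 free CPUs each (assumed topology)
	packed, _ := packCPUs(free, 12)
	spread, _ := distributeCPUs(free, 12)
	fmt.Println("packed:", packed)
	fmt.Println("distributed:", spread)
}
```

For a 12-CPU request on two nodes with 8 free CPUs each, the packed sketch yields 8 and 4 CPUs per node while the distributed sketch yields 6 and 6. An e2e test along the lines described above could assert on that per-node count.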
@@ -184,6 +213,25 @@ No changes. Existing container will not see their allocation changed. New contai

 - A specific e2e test will demonstrate that the default behaviour is preserved when the feature gate is disabled, or when the feature is not used (2 separate tests)

+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+- A rollout or rollback can fail if the feature gate and the policy option are not configured properly and kubelet fails to start.
+
+###### What specific metrics should inform a rollback?
+
+Not Applicable.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+Not Applicable. This policy option only affects pods that meet certain conditions and are scheduled after the upgrade. Running pods will be unaffected
+by any change.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No
+
 ### Monitoring Requirements

 ###### How can an operator determine if the feature is in use by workloads?
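
Note on the rollout failure mode in the hunk above (feature gate and policy option not configured consistently): a startup check of that kind might look roughly like the Go sketch below. This is not kubelet code. `validateConfig` and its parameters are hypothetical, the gate that guards the option is passed in as `requiredGate` because it depends on the graduation stage, and the rule that options are only accepted with the `static` policy is stated here as an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// optionName is the policy option discussed in this KEP.
const optionName = "distribute-cpus-across-numa"

// validateConfig is a hypothetical startup check (not the kubelet's own).
// It returns an error when the option is set but the rest of the
// configuration does not support it.
func validateConfig(policy string, options map[string]string, gates map[string]bool, requiredGate string) error {
	value, set := options[optionName]
	if !set {
		return nil // option not used, nothing to validate
	}
	if _, err := strconv.ParseBool(value); err != nil {
		return fmt.Errorf("option %q has a non-boolean value %q", optionName, value)
	}
	if policy != "static" {
		// Assumption: policy options are only meaningful with the static policy.
		return errors.New("policy options require the static CPU manager policy")
	}
	if !gates[requiredGate] {
		return fmt.Errorf("option %q requires feature gate %q", optionName, requiredGate)
	}
	return nil
}

func main() {
	options := map[string]string{optionName: "true"}
	gate := "CPUManagerPolicyAlphaOptions" // gate named in the Implementation History; the Beta stage may use another

	// Gate disabled: the sketch reports a configuration error.
	fmt.Println(validateConfig("static", options, map[string]bool{}, gate))

	// Gate enabled: the configuration is accepted.
	fmt.Println(validateConfig("static", options, map[string]bool{gate: true}, gate))
}
```

The point of the sketch is only that a mismatch is detected at startup, before any CPU allocation happens, which is consistent with the statement that running pods are unaffected.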
@@ -221,6 +269,12 @@ None
 This feature is `linux` specific, and requires a version of CRI that includes the `LinuxContainerResources.CpusetCpus` field.
 This has been available since `v1alpha2`.

+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+No
+
 ### Scalability

 ###### Will enabling / using this feature result in any new API calls?
@@ -251,10 +305,23 @@ This delay should be minimal.

 No, the algorithm will run on a single `goroutine` with minimal memory requirements.

+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+No impact. The behavior of the feature does not change when API Server and/or etcd is unavailable since the feature is node local.
+
+###### What are other known failure modes?
+
+No known failure modes.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
 ## Implementation History

 - 2021-08-26: Initial KEP created
 - 2021-08-30: Updates to fill out more sections, answer PRR questions
 - 2021-09-08: Change feature gate from `CPUManagerPolicyOptions` to `CPUManagerPolicyExperimentalOptions`
 - 2021-10-11: Change feature gate from `CPUManagerPolicyExperimentalOptions` to `CPUManagerPolicyAlphaOptions`
 - 2025-01-30: KEP update for Beta graduation of the policy option
+- 2025-02-05: KEP update to the latest template
