10 | 10 | - [Risks and Mitigations](#risks-and-mitigations)
11 | 11 | - [Design Details](#design-details)
12 | 12 | - [Test Plan](#test-plan)
| 13 | + - [Prerequisite testing updates](#prerequisite-testing-updates)
| 14 | + - [Unit tests](#unit-tests)
| 15 | + - [Integration tests](#integration-tests)
| 16 | + - [e2e tests](#e2e-tests)
13 | 17 | - [Graduation Criteria](#graduation-criteria)
14 | 18 | - [Alpha](#alpha)
15 | 19 | - [Beta](#beta)
18 | 22 | - [Version Skew Strategy](#version-skew-strategy)
19 | 23 | - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
20 | 24 | - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
| 25 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
21 | 26 | - [Monitoring Requirements](#monitoring-requirements)
| 27 | + - [Dependencies](#dependencies)
22 | 28 | - [Scalability](#scalability)
| 29 | + - [Troubleshooting](#troubleshooting)
23 | 30 | - [Implementation History](#implementation-history)
24 | 31 | <!-- /toc -->
25 | 32 |
@@ -116,6 +123,28 @@ NOTE: The striping operation after all CPUs have been evenly distributed will be
116 | 123 |
117 | 124 | We will extend both the unit test suite and the E2E test suite to cover the new policy option described in this KEP.
118 | 125 |
| 126 | +[x] I/we understand the owners of the involved components may require updates to
| 127 | +existing tests to make this code solid enough prior to committing the changes necessary
| 128 | +to implement this enhancement.
| 129 | +
| 130 | +##### Prerequisite testing updates
| 131 | +
| 132 | +##### Unit tests
| 133 | +
| 134 | +- `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20250205` - 85.5% of statements
| 135 | +
| 136 | +##### Integration tests
| 137 | +
| 138 | +Not applicable: this is a node-local kubelet feature, covered by unit and e2e tests rather than integration tests.
| 139 | +
| 140 | +##### e2e tests
| 141 | +
| 142 | +There are currently no e2e tests for this policy option. They will be added as part of the Beta graduation.
| 143 | +
| 144 | +The plan is to add e2e tests covering the basic flows in the following cases:
| 145 | +1. The `distribute-cpus-across-numa` option is enabled: for an allocation that needs more than one NUMA node, the test verifies that the allocated CPUs are distributed evenly across those NUMA nodes.
| 146 | +1. The `distribute-cpus-across-numa` option is disabled: the test verifies that the allocated CPUs are packed according to the default behavior.
| 147 | +
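As a rough illustration of the two checks listed above, the following Go sketch classifies an allocation by counting allocated CPUs per NUMA node. It is not the planned e2e test. The helper names (`numaCounts`, `isDistributed`, `isPacked`), the hard-coded two-node topology and the `tolerance` value are hypothetical placeholders, and `isPacked` assumes an otherwise idle node, so that a packed allocation leaves at most one NUMA node partially used. A real test would read the CPU-to-NUMA mapping from the node topology and the allocated CPUs from the container.

```go
package main

import "fmt"

// numaCounts returns how many of the allocated CPUs land on each NUMA node.
// cpuToNUMA is a hypothetical CPU ID -> NUMA node mapping; a real test would
// read it from the node's topology.
func numaCounts(cpus []int, cpuToNUMA map[int]int) map[int]int {
	counts := make(map[int]int)
	for _, cpu := range cpus {
		counts[cpuToNUMA[cpu]]++
	}
	return counts
}

// isDistributed reports whether the allocation spans more than one NUMA node
// and the per-node CPU counts differ by at most tolerance.
func isDistributed(counts map[int]int, tolerance int) bool {
	if len(counts) < 2 {
		return false
	}
	lo, hi := -1, -1
	for _, c := range counts {
		if lo == -1 || c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return hi-lo <= tolerance
}

// isPacked reports whether at most one of the used NUMA nodes is only
// partially filled. It assumes an otherwise idle node with cpusPerNUMA
// allocatable CPUs on every NUMA node.
func isPacked(counts map[int]int, cpusPerNUMA int) bool {
	partial := 0
	for _, c := range counts {
		if c < cpusPerNUMA {
			partial++
		}
	}
	return partial <= 1
}

func main() {
	// Hypothetical topology: 2 NUMA nodes, CPUs 0-15 on node 0 and 16-31 on node 1.
	cpuToNUMA := make(map[int]int)
	for cpu := 0; cpu < 32; cpu++ {
		cpuToNUMA[cpu] = cpu / 16
	}

	// A 20-CPU allocation, either split 10 + 10 or packed 16 + 4.
	var distributed, packed []int
	for cpu := 0; cpu < 10; cpu++ {
		distributed = append(distributed, cpu, cpu+16)
	}
	for cpu := 0; cpu < 20; cpu++ {
		packed = append(packed, cpu)
	}

	d := numaCounts(distributed, cpuToNUMA)
	p := numaCounts(packed, cpuToNUMA)
	fmt.Println(isDistributed(d, 2), isPacked(d, 16)) // true false
	fmt.Println(isDistributed(p, 2), isPacked(p, 16)) // false true
}
```

Comparing per-node counts rather than specific CPU IDs keeps such a check independent of which cores the allocator happens to pick; on the assumed topology a 20-CPU request would be expected to come out as 10 + 10 with the option enabled and 16 + 4 without it.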
119 | 148 | ### Graduation Criteria
120 | 149 |
121 | 150 | #### Alpha
@@ -184,6 +213,25 @@ No changes. Existing container will not see their allocation changed. New contai
184 | 213 |
185 | 214 | - A specific e2e test will demonstrate that the default behaviour is preserved when the feature gate is disabled, or when the feature is not used (2 separate tests)
186 | 215 |
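| 216 | +### Rollout, Upgrade and Rollback Planning
| 217 | +
| 218 | +###### How can a rollout or rollback fail? Can it impact already running workloads?
| 219 | +
| 220 | +- A rollout or rollback can fail if the feature gate and the policy option are misconfigured (for example, the policy option is set while the corresponding feature gate is disabled), causing kubelet to fail to start. Already running workloads are not impacted: existing containers keep their current CPU allocation.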
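To make this failure mode concrete, here is a simplified Go sketch of the kind of startup validation that turns a misconfiguration into a kubelet start failure. It is hypothetical throughout: `validateCPUManagerConfig` and `requiredGate` are illustrative names, not kubelet APIs; the rule that only the `static` policy accepts options is a simplification; and the gate name `CPUManagerPolicyBetaOptions` is an assumption based on the Beta graduation described in this KEP (earlier revisions gated the option behind `CPUManagerPolicyAlphaOptions`).

```go
package main

import "fmt"

// requiredGate maps each known policy option to the feature gate assumed to
// guard it. Hypothetical table: the gate name assumes the Beta graduation
// described in this KEP.
var requiredGate = map[string]string{
	"distribute-cpus-across-numa": "CPUManagerPolicyBetaOptions",
}

// validateCPUManagerConfig is a hypothetical, simplified stand-in for the
// kubelet's startup checks. A non-nil error models kubelet refusing to start.
func validateCPUManagerConfig(policy string, options map[string]string, gates map[string]bool) error {
	if len(options) > 0 && policy != "static" {
		return fmt.Errorf("policy %q does not accept policy options", policy)
	}
	for opt := range options {
		gate, known := requiredGate[opt]
		if !known {
			return fmt.Errorf("unknown CPU manager policy option %q", opt)
		}
		if !gates[gate] {
			return fmt.Errorf("option %q requires feature gate %s", opt, gate)
		}
	}
	return nil
}

func main() {
	opts := map[string]string{"distribute-cpus-across-numa": "true"}

	// Valid: static policy with the (assumed) gate enabled.
	fmt.Println(validateCPUManagerConfig("static", opts, map[string]bool{"CPUManagerPolicyBetaOptions": true}))
	// Misconfigurations that would stop kubelet from starting.
	fmt.Println(validateCPUManagerConfig("none", opts, nil))
	fmt.Println(validateCPUManagerConfig("static", opts, map[string]bool{}))
}
```

Failing fast at startup, rather than silently ignoring the option, is what makes a misconfigured rollout visible immediately while leaving already running containers untouched.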
| 221 | +
| 222 | +###### What specific metrics should inform a rollback?
| 223 | +
| 224 | +Not Applicable.
| 225 | +
| 226 | +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
| 227 | +
| 228 | +Not Applicable. This policy option only affects pods that request exclusive CPUs (Guaranteed QoS pods with integer CPU requests) and that are admitted to the node after the upgrade. Running pods are
| 229 | +unaffected by any change.
| 230 | +
| 231 | +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
| 232 | +
| 233 | +No
| 234 | +
187 | 235 | ### Monitoring Requirements
188 | 236 |
189 | 237 | ###### How can an operator determine if the feature is in use by workloads?
@@ -221,6 +269,12 @@ None
221 | 269 | This feature is `linux` specific, and requires a version of CRI that includes the `LinuxContainerResources.CpusetCpus` field.
222 | 270 | This has been available since `v1alpha2`.
223 | 271 |
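The CPUs chosen by the policy reach the container runtime through `LinuxContainerResources.CpusetCpus` as a Linux cpuset list string such as `0-9,16-25`. A test that inspects the result (for example by reading `Cpus_allowed_list` from `/proc/self/status` inside the container) has to expand that string back into CPU IDs. The minimal Go sketch below does this; `parseCPUList` is a hypothetical helper, not a kubelet API, and it assumes the kernel's comma-separated list format without whitespace.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux cpuset list string such as "0-3,8,10-11"
// into individual CPU IDs. It assumes the kernel's format: comma-separated
// entries, each either a single ID or an inclusive "start-end" range.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	s = strings.TrimSpace(s)
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		first, last, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(first)
		if err != nil {
			return nil, fmt.Errorf("invalid CPU %q: %w", first, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(last); err != nil {
				return nil, fmt.Errorf("invalid CPU %q: %w", last, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for cpu := start; cpu <= end; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	cpus, err := parseCPUList("0-3,8,10-11")
	fmt.Println(cpus, err) // [0 1 2 3 8 10 11] <nil>
}
```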
| 272 | +### Dependencies
| 273 | +
| 274 | +###### Does this feature depend on any specific services running in the cluster?
| 275 | +
| 276 | +No
| 277 | +
224 | 278 | ### Scalability
225 | 279 |
226 | 280 | ###### Will enabling / using this feature result in any new API calls?
@@ -251,10 +305,23 @@ This delay should be minimal.
251 | 305 |
252 | 306 | No, the algorithm will run on a single `goroutine` with minimal memory requirements.
253 | 307 |
| 308 | +### Troubleshooting
| 309 | +
| 310 | +###### How does this feature react if the API server and/or etcd is unavailable?
| 311 | +
| 312 | +No impact. The feature is node-local, so its behavior does not change when the API server and/or etcd is unavailable.
| 313 | +
| 314 | +###### What are other known failure modes?
| 315 | +
| 316 | +No known failure modes.
| 317 | +
| 318 | +###### What steps should be taken if SLOs are not being met to determine the problem?
| 319 | +
254 | 320 | ## Implementation History
255 | 321 |
256 | 322 | - 2021-08-26: Initial KEP created
257 | 323 | - 2021-08-30: Updates to fill out more sections, answer PRR questions
258 | 324 | - 2021-09-08: Change feature gate from `CPUManagerPolicyOptions` to `CPUManagerPolicyExperimentalOptions`
259 | 325 | - 2021-10-11: Change feature gate from `CPUManagerPolicyExperimentalOptions` to `CPUManagerPolicyAlphaOptions`
260 | 326 | - 2025-01-30: KEP update for Beta graduation of the policy option
| 327 | +- 2025-02-05: KEP update to the latest template