keps/sig-node/2625-cpumanager-policies-thread-placement/README.md
+53 -27 (53 additions & 27 deletions)
@@ -14,7 +14,7 @@
14
14
-[Risks and Mitigations](#risks-and-mitigations)
15
15
-[Design Details](#design-details)
16
16
-[Proposed Change](#proposed-change)
17
-
-[Implementation strategy of reject-non-smt-aligned CPU Manager policy option](#implementation-strategy-of-reject-non-smt-aligned-cpu-manager-policy-option)
17
+
-[Implementation strategy of full-pcpus-only CPU Manager policy option](#implementation-strategy-of-full-pcpus-only-cpu-manager-policy-option)
18
18
-[Resource Accounting](#resource-accounting)
19
19
-[Alternatives](#alternatives)
20
20
-[Add extra resources](#add-extra-resources)
@@ -27,6 +27,9 @@
27
27
-[Alpha](#alpha)
28
28
-[Alpha to Beta Graduation](#alpha-to-beta-graduation)
29
29
-[Beta to G.A Graduation](#beta-to-ga-graduation)
30
+
-[Graduation Criteria of Options](#graduation-criteria-of-options)
31
+
-[Graduation of Options to <code>Beta-quality</code> (non-hidden)](#graduation-of-options-to--non-hidden)
32
+
-[Graduation of Options from <code>Beta-quality</code> to <code>G.A-quality</code>](#graduation-of-options-from--to-)
Items marked with (R) are required *prior to targeting to a milestone / release*.
45
48
46
-
-[] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements](https://github.com/kubernetes/enhancements/issues/2404)
47
-
-[] (R) KEP approvers have approved the KEP status as `implementable`
48
-
-[] (R) Design details are appropriately documented
49
-
-[] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
50
-
-[] (R) Graduation criteria is in place
51
-
-[] (R) Production readiness review completed
52
-
-[] Production readiness review approved
53
-
-[] "Implementation History" section is up-to-date for milestone
49
+
-[X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements](https://github.com/kubernetes/enhancements/issues/2404)
50
+
-[X] (R) KEP approvers have approved the KEP status as `implementable`
51
+
-[X] (R) Design details are appropriately documented
52
+
-[X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
53
+
-[X] (R) Graduation criteria is in place
54
+
-[X] (R) Production readiness review completed
55
+
-[X] Production readiness review approved
56
+
-[X] "Implementation History" section is up-to-date for milestone
54
57
- ~~ [] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] ~~
55
-
-[] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
58
+
-[X] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -114,30 +117,30 @@ The impact in the shared codebase will be addressed enhancing the current testsu
114
117
115
118
We propose to
116
119
- add a new flag in Kubelet called `CPUManagerPolicyOptions` in the kubelet config or command line argument called `cpumanager-policy-options` which allows the user to specify the CPU Manager policy option.
117
-
- add a new cpu manager option called `reject-non-smt-aligned`; if present, this option will enable further refinements of the existing static policy.
120
+
- add a new cpu manager option called `full-pcpus-only`; if present, this option will enable further refinements of the existing static policy.
118
121
119
122
The static policy allocates CPUs using a topology-aware best-fit allocation. This enhancement wants to provide stronger guarantees by restricting the allocation of threads.
120
123
The aim is to achieve the isolation for workloads managed by Kubernetes. The other part of isolation is (as of now) not managed by Kubernetes, as described in [Explicitly Reserved CPU List](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list) and [Static policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy).
121
124
122
-
Let's summarize the key properties of the `reject-non-smt-aligned` option:
125
+
Let's summarize the key properties of the `full-pcpus-only` option:
123
126
- Preserve all the properties of the `static` policy.
124
127
- Never allocate less than one physical core's worth of virtual CPUs.
125
128
- With this requirement enforced, the CPUManager allocation algorithm will guarantee avoidance of physical core sharing.
126
129
- Should the node not have enough free physical cores, the Pod will be put in Failed state, with `SMTAlignmentError` as reason.
127
130
128
-
### Implementation strategy of reject-non-smt-aligned CPU Manager policy option
131
+
### Implementation strategy of full-pcpus-only CPU Manager policy option
129
132
130
-
- In order to introduce the SMT-alignment check in CPU Manager, we introduce a new flag in Kubelet to allow the user to specify `cpumanager-policy-options` which when specified with `reject-non-smt-aligned` as its value provides the capability to modify the behaviour of static policy to strictly guarantee allocation of whole cores to a workload.
133
+
- In order to introduce the SMT-alignment check in CPU Manager, we introduce a new flag in Kubelet to allow the user to specify `cpumanager-policy-options` which when specified with `full-pcpus-only` as its value provides the capability to modify the behaviour of static policy to strictly guarantee allocation of whole cores to a workload.
131
134
- The `CPUManagerPolicyOptions` received from the kubelet config/command line args is propagated to the Container Manager.
132
135
- The responsibility of admission control is centralized in containermanager. The resource managers and/or the resource allocation orchestrator (Topology Manager) still have the responsibility of running the checks to admit the pods, but the handling of these errors and the building of the pod lifecycle result are now factored in containermanager.
133
136
- Prior to this feature, the Container Manager admission handler was delegated to the Topology Manager if the latter was enabled. This worked well under the assumption that only the Topology Manager had the ability to reject pod admissions. With the introduction of this feature, the CPU Manager also needs the ability to reject pods if strict SMT alignment is requested. In order to do so, we introduce a new error and let it drive the rejection. Because CPUManager already depends on TopologyManager (the former imports the latter in order to support the `topologymanager.HintProvider` interface), the Container Manager is considered the appropriate place for performing admission control.
134
-
- When `reject-non-smt-aligned` policy option is specified along with `static` CPU Manager policy, an additional check in the allocation logic of the `static` policy ensures that CPUs would be allocated such that full cores are allocated. Because of this check, a pod would never have to acquire single threads with the aim to fill partially-allocated cores.
137
+
- When `full-pcpus-only` policy option is specified along with `static` CPU Manager policy, an additional check in the allocation logic of the `static` policy ensures that CPUs would be allocated such that full cores are allocated. Because of this check, a pod would never have to acquire single threads with the aim to fill partially-allocated cores.
135
138
- If a request translates to partial occupancy of the cores, the Pod will not be admitted and will fail with `SMTAlignmentError`.
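The admission check described in the list above can be sketched as follows. This is a minimal illustration, not the kubelet implementation: the function names, the error message, and the returned tuples are hypothetical, and the check is assumed to reduce to "the requested CPU count must be a multiple of the number of hardware threads per physical core".

```python
class SMTAlignmentError(Exception):
    """Hypothetical stand-in for the error that drives pod rejection."""

    def __init__(self, requested_cpus, cpus_per_core):
        self.requested_cpus = requested_cpus
        self.cpus_per_core = cpus_per_core
        super().__init__(
            f"requested {requested_cpus} CPUs, which is not a multiple "
            f"of {cpus_per_core} CPUs per physical core"
        )


def check_smt_alignment(requested_cpus, cpus_per_core, full_pcpus_only):
    # Assumption: the option only adds a check on top of the static policy;
    # without it, any integer request is acceptable.
    if full_pcpus_only and requested_cpus % cpus_per_core != 0:
        raise SMTAlignmentError(requested_cpus, cpus_per_core)


def admit(requested_cpus, cpus_per_core=2, full_pcpus_only=True):
    # Returns a (state, reason) pair; the shape of the result is illustrative.
    try:
        check_smt_alignment(requested_cpus, cpus_per_core, full_pcpus_only)
    except SMTAlignmentError:
        return ("Failed", "SMTAlignmentError")
    return ("Admitted", "")
```

With 2-way SMT, a request for 5 CPUs would be rejected under this sketch, while a request for 6 CPUs (three whole cores) would be admitted.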
136
139
137
140
138
141
### Resource Accounting
139
142
140
-
To illustrate the behaviour of the `reject-non-smt-aligned` policy option, we will consider the following CPU topology. We will use as example a CPU package with 16 physical cores, 2-way SMT-capable.
143
+
To illustrate the behaviour of the `full-pcpus-only` policy option, we will consider the following CPU topology. We will use as example a CPU package with 16 physical cores, 2-way SMT-capable.
141
144
142
145

143
146
@@ -162,11 +165,11 @@ spec:
162
165
cpu: "5"
163
166
```
164
167
165
-
The `reject-non-smt-aligned` policy option will cause the pod to be rejected since it doesn't request enough cores to consume all virtual threads exposed by the CPU.
168
+
The `full-pcpus-only` policy option will cause the pod to be rejected since it doesn't request enough cores to consume all virtual threads exposed by the CPU.
166
169
167
170
would need to make sure the remaining core on the half-allocated physical CPU is left unallocated to avoid noisy neighbours.
168
171
169
-

172
+

170
173
171
174
The container will then actually get more virtual cores (6) than it requests (5).
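The accounting mismatch in this scenario comes from rounding the request up to whole physical cores. A small sketch of the arithmetic, assuming 2-way SMT as in the example topology (the function names are hypothetical):

```python
def round_up_to_full_cores(requested_cpus, cpus_per_core=2):
    # Number of physical cores needed to cover the request (ceiling division).
    cores = -(-requested_cpus // cpus_per_core)
    # Every hardware thread on those cores is reserved for the container.
    return cores * cpus_per_core


def unaccounted_cpus(requested_cpus, cpus_per_core=2):
    # Virtual CPUs held by the container beyond what it requested.
    return round_up_to_full_cores(requested_cpus, cpus_per_core) - requested_cpus
```

For a request of 5 CPUs this gives 6 allocated virtual cores and 1 virtual core that is reserved but not reflected in the request.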
172
175
@@ -250,7 +253,7 @@ We would like to mention a further extension of this work, which we are *not* pr
250
253
A further subset of the latency-sensitive class of workloads we identified (CNF, HFT) benefits most from a non-SMT system, which delivers the best possible performance for them.
251
254
For these applications, just disabling SMT at machine level solves the need of the workload, but overall creates worse usage of hardware resources and poorer container density.
252
255
253
-
Another policy option, or a further refinement of `reject-non-smt-aligned`, which enables non-SMT emulation on SMT-enabled system would allow to accommodate these needs, but this would cause even more significant resource accounting mismatches
256
+
Another policy option, or a further refinement of `full-pcpus-only`, which enables non-SMT emulation on SMT-enabled system would allow to accommodate these needs, but this would cause even more significant resource accounting mismatches
254
257
as described above. Furthermore, at the time of writing we are still assessing how large the set of workload classes that benefit from these extra guarantees is.
255
258
256
259
For all these reasons we postponed this work to a later date.
@@ -268,11 +271,30 @@ The [implementation PR](https://github.com/kubernetes/kubernetes/pull/101432) wi
268
271
#### Alpha to Beta Graduation
269
272
- [X] Gather feedback from the consumer of the policy.
270
273
- [X] No major bugs reported in the previous cycle.
274
+
- [X] Use of this policy option to further configure the behavior of CPU manager. Another CPUManager policy option `distribute-cpus-across-numa` is being proposed in 1.23 release to distribute CPUs across NUMA nodes instead of packing them.
271
275
272
276
#### Beta to G.A Graduation
273
277
- [X] Allowing time for feedback (1 year).
274
278
- [X] Risks have been addressed.
275
279
280
+
### Graduation Criteria of Options
281
+
282
+
In the 1.23 release we are graduating this feature to Beta, meaning the `CPUManagerPolicyOptions` feature gate is enabled by default, allowing the user to configure the CPU Manager static policy with the option `full-pcpus-only`.
283
+
NOTE: Even though the feature gate is enabled by default the user still has to explicitly use the Kubelet flag called `CPUManagerPolicyOptions` in the kubelet config or command line argument called `cpumanager-policy-options` along with a specific policy option to use this feature.
284
+
- In addition to this, in order not to have all alpha-quality experimental options introduced in the future available by default, we are introducing an additional feature gate called `CPUManagerPolicyExperimentalOptions` that controls all the experimental options. The experimental options are hidden by default; the user can use them only if the feature gate is enabled. Based on the graduation criteria described below, a policy option can move from being hidden to being non-hidden. Once the option is non-hidden, the user does not need the `CPUManagerPolicyExperimentalOptions` feature gate in order to use that option.
285
+
- Since the feature that allows the ability to customize the behaviour of CPUManager static policy as well as the CPUManager Policy option `full-pcpus-only` were both introduced in 1.22 release and meet the above graduation criterion, `full-pcpus-only` would be considered as a non-hidden option i.e. available to be used when explicitly used along with `CPUManagerPolicyOptions` Kubelet flag in the kubelet config or command line argument called `cpumanager-policy-options`.
286
+
- The introduction of this new feature gate gives us the ability to move the feature to beta and later stable without implying that all the options are beta or stable.
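One way to read the interaction between the two feature gates is sketched below. This is an illustration under stated assumptions, not the kubelet code: the function name and the option sets are hypothetical, and listing `distribute-cpus-across-numa` as an experimental (hidden) option is an assumption made only to have an example of each kind.

```python
# Options usable as soon as CPUManagerPolicyOptions is enabled.
NON_HIDDEN_OPTIONS = {"full-pcpus-only"}
# Assumption: an example of a hidden, alpha-quality option.
EXPERIMENTAL_OPTIONS = {"distribute-cpus-across-numa"}


def option_available(option, feature_gates):
    """Return True if `option` may be used given a dict of gate name -> bool."""
    # With the main gate off, no policy option can be used at all.
    if not feature_gates.get("CPUManagerPolicyOptions", False):
        return False
    if option in NON_HIDDEN_OPTIONS:
        return True
    if option in EXPERIMENTAL_OPTIONS:
        # Hidden options additionally require the experimental gate.
        return feature_gates.get("CPUManagerPolicyExperimentalOptions", False)
    # Unknown options are never available.
    return False
```

Under this reading, graduating an option amounts to moving its name from the experimental set to the non-hidden set, with no change to either feature gate.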
287
+
288
+
The graduation criteria of options are described below:
289
+
290
+
#### Graduation of Options to `Beta-quality` (non-hidden)
291
+
- [X] Gather feedback from the consumer of the policy option.
292
+
- [X] No major bugs reported in the previous cycle.
293
+
294
+
#### Graduation of Options from `Beta-quality` to `G.A-quality`
295
+
- [X] Allowing time for feedback (1 year) on the policy option.
296
+
- [X] Risks have been addressed.
297
+
276
298
### Upgrade / Downgrade Strategy
277
299
278
300
We expect no impact. The new policies are opt-in and separate from the existing ones.
@@ -287,17 +309,19 @@ No changes needed
287
309
* **How can this feature be enabled / disabled in a live cluster?**
288
310
- [X] Feature gate (also fill in values in `kep.yaml`).
- Components depending on the feature gate: kubelet
291
-
- [X] Change the kubelet configuration to set the CPUManager policy to `configurable`
292
-
- [X] Change the kubelet configuration adding the CPUManager policy option to `reject-non-smt-aligned`
314
+
- [X] Change the kubelet configuration adding the CPUManager policy option to `full-pcpus-only`
293
315
* **Does enabling the feature change any default behavior?**
294
-
- Yes, it makes the behaviour of the CPUManager static policy more restrictive and can lead to pod admission rejection.
316
+
- Enabling the `CPUManagerPolicyOptions` makes the behaviour of the CPUManager static policy more restrictive and can lead to pod admission rejection.
317
+
- Enabling the `CPUManagerPolicyExperimentalOptions` provides the ability to use experimental options which depending on the option can change the behaviour of the CPUManager static policy.
295
318
* **Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?**
296
-
- Yes, disabling the feature gate shuts down the feature completely; alternatively,
297
-
- Yes, through kubelet configuration - switch to a different policy.
319
+
- Yes, disabling the `CPUManagerPolicyOptions` feature gate shuts down the feature completely; alternatively through kubelet configuration - switch to a different policy.
320
+
- Also, disabling `CPUManagerPolicyExperimentalOptions` feature gate disables the use of experimental options and the behavior would depend on how `CPUManagerPolicyOptions` feature gate is configured.
321
+
- Disabling both the feature gates would allow complete rollback of this enablement.
298
322
* **What happens if we reenable the feature if it was previously rolled back?** No changes. Existing containers will not see their allocation changed. New containers will.
299
323
* **Are there any tests for feature enablement/disablement?**
300
-
- A specific e2e test will demonstrate that the default behaviour is preserved when the feature gate is disabled, or when the feature is not used (2 separate tests)
324
+
- A specific e2e test will demonstrate that the default behaviour is preserved when the `CPUManagerPolicyOptions` feature gate is disabled, or when the feature is not used (2 separate tests)
301
325
302
326
### Rollout, Upgrade and Rollback Planning
303
327
@@ -309,7 +333,7 @@ No changes needed
309
333
310
334
### Monitoring requirements
311
335
* **How can an operator determine if the feature is in use by workloads?**
312
-
- Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option
336
+
- Inspect the kubelet configuration of the nodes: check feature gates and usage of the new options
313
337
* **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
314
338
- No change
315
339
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** N/A.
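Inspecting a node's kubelet configuration for feature usage could be automated along these lines. The sketch assumes the configuration has already been parsed into a dict using the `KubeletConfiguration` field names `featureGates`, `cpuManagerPolicy` and `cpuManagerPolicyOptions`; the function name and the `gate_default` parameter are hypothetical.

```python
def policy_options_in_use(kubelet_config, gate_default=True):
    """List the CPU Manager policy options that are effectively active."""
    gates = kubelet_config.get("featureGates", {})
    # If the gate is not set explicitly, fall back to the release default
    # (assumed enabled here, matching the Beta graduation).
    gate_on = gates.get("CPUManagerPolicyOptions", gate_default)
    # Policy options only refine the static policy.
    is_static = kubelet_config.get("cpuManagerPolicy") == "static"
    options = kubelet_config.get("cpuManagerPolicyOptions", {})
    if gate_on and is_static:
        return sorted(name for name, value in options.items() if value == "true")
    return []
```

An empty result means the node behaves as with the plain CPU Manager policy it is configured with.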
@@ -332,7 +356,7 @@ No changes needed
332
356
### Troubleshooting
333
357
334
358
* **How does this feature react if the API server and/or etcd is unavailable?**: No effect.
335
-
* **What are other known failure modes?** TBD
359
+
* **What are other known failure modes?** No known failure mode.
336
360
* **What steps should be taken if SLOs are not being met to determine the problem?** N/A
- 2021-05-10: KEP update to rename `smtalign` to `reject-non-smt-aligned` for better clarity and address review comments
350
374
- 2021-05-11: KEP update to add the `configurable` alias and address review comments
351
375
- 2021-05-13: KEP update to postpone the `configurable` alias, per review comments
376
+
- 2021-09-02: KEP update to capture the policy name `full-pcpus-only` based on the implementation merged in 1.22, explain how this feature is being used for introduction of another policy option and updates pertaining to promotion of the feature to Beta.
377
+
- 2021-09-08: KEP update to introduce `CPUManagerPolicyExperimentalOptions` feature gate to prevent alpha-quality options from being non-hidden (available) by default and explain the graduation criteria of the options.