Commit e673586

Merge pull request kubernetes#2933 from swatisehgal/cpumanager-policy-options-to-beta
KEP-2625: Update CPU Manager Policy Options 1.23 Beta
2 parents 252bbbb + 54538a4 commit e673586

3 files changed (+61 additions, -30 deletions)

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 2625
 alpha:
   approver: "@johnbelamaric"
+beta:
+  approver: "@johnbelamaric"

keps/sig-node/2625-cpumanager-policies-thread-placement/README.md

Lines changed: 53 additions & 27 deletions
@@ -14,7 +14,7 @@
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Proposed Change](#proposed-change)
-- [Implementation strategy of reject-non-smt-aligned CPU Manager policy option](#implementation-strategy-of-reject-non-smt-aligned-cpu-manager-policy-option)
+- [Implementation strategy of full-pcpus-only CPU Manager policy option](#implementation-strategy-of-full-pcpus-only-cpu-manager-policy-option)
 - [Resource Accounting](#resource-accounting)
 - [Alternatives](#alternatives)
 - [Add extra resources](#add-extra-resources)
@@ -27,6 +27,9 @@
 - [Alpha](#alpha)
 - [Alpha to Beta Graduation](#alpha-to-beta-graduation)
 - [Beta to G.A Graduation](#beta-to-ga-graduation)
+- [Graduation Criteria of Options](#graduation-criteria-of-options)
+- [Graduation of Options to <code>Beta-quality</code> (non-hidden)](#graduation-of-options-to--non-hidden)
+- [Graduation of Options from <code>Beta-quality</code> to <code>G.A-quality</code>](#graduation-of-options-from--to-)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -43,16 +46,16 @@
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements](https://github.com/kubernetes/enhancements/issues/2404)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] (R) Graduation criteria is in place
-- [ ] (R) Production readiness review completed
-- [ ] Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements](https://github.com/kubernetes/enhancements/issues/2404)
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] (R) Graduation criteria is in place
+- [X] (R) Production readiness review completed
+- [X] Production readiness review approved
+- [X] "Implementation History" section is up-to-date for milestone
 - ~~ [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] ~~
-- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [X] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -114,30 +117,30 @@ The impact in the shared codebase will be addressed enhancing the current testsu
 
 We propose to
 - add a new flag in Kubelet called `CPUManagerPolicyOptions` in the kubelet config or command line argument called `cpumanager-policy-options` which allows the user to specify the CPU Manager policy option.
-- add a new cpu manager option called `reject-non-smt-aligned`; if present, this option will enable further refinements of the existing static policy.
+- add a new cpu manager option called `full-pcpus-only`; if present, this option will enable further refinements of the existing static policy.
 
 The static policy allocates CPUs using a topology-aware best-fit allocation. This enhancement wants to provide stronger guarantees by restricting the allocation of threads.
 The aim is to achieve the isolation for workloads managed by Kubernetes. The other part of isolation is (as of now) not managed by Kubernetes, as described in [Explicitly Reserved CPU List](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list) and [Static policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy).
 
-Let's summarize the key properties of the `reject-non-smt-aligned` option:
+Let's summarize the key properties of the `full-pcpus-only` option:
 - Preserve all the properties of the `static` policy.
 - Never allocate less than a physical-cpu worth amount of cores.
 - With this requirement enforced, the CPUManager allocation algorithm will guarantee avoidance of physical core sharing.
 - Should the node not have enough free physical cores, the Pod will be put in Failed state, with `SMTAlignmentError` as reason.
 
-### Implementation strategy of reject-non-smt-aligned CPU Manager policy option
+### Implementation strategy of full-pcpus-only CPU Manager policy option
 
-- In order to introduce the SMT-alignment check in CPU Manager, we introduce a new flag in Kubelet to allow the user to specify `cpumanager-policy-options` which when specified with `reject-non-smt-aligned` as its value provides the capability to modify the behaviour of static policy to strictly guarantee allocation of whole cores to a workload.
+- In order to introduce the SMT-alignment check in CPU Manager, we introduce a new flag in Kubelet to allow the user to specify `cpumanager-policy-options` which when specified with `full-pcpus-only` as its value provides the capability to modify the behaviour of static policy to strictly guarantee allocation of whole cores to a workload.
 - The `CPUManagerPolicyOptions` received from the kubelet config/command line args is propogated to the Container Manager.
 - The responsibility of admission control is centralized in containermanager. The resource managers and/or the resource allocation orchestrator (Topology Manager) still have the responsibility of running the checks to admit the pods, but the handling of these errors and the building of the pod lifecycle result are now factored in containermanager.
 - Prior to this feature, the Container Manager admission handler was delegated to the topology manager if the latter was enabled. This worked well under the assumption that only Topology Manager had the ability to reject admissions with pods. But with the introduction of this feature, the CPU Manager also needs the ability to possibly reject pods if strict SMT alignment is requested. In order to do so, we introduce a new error and let it drive the rejection. Due to an already existing dependency between CPUManager and TopologyManager as the former imports the latter in order to support the `topologymanager.HintProvider` interface, container manager is considered as the appropriate for performing admission control.
-- When `reject-non-smt-aligned` policy option is specified along with `static` CPU Manager policy, an additional check in the allocation logic of the `static` policy ensures that CPUs would be allocated such that full cores are allocated. Because of this check, a pod would never have to acquire single threads with the aim to fill partially-allocated cores.
+- When `full-pcpus-only` policy option is specified along with `static` CPU Manager policy, an additional check in the allocation logic of the `static` policy ensures that CPUs would be allocated such that full cores are allocated. Because of this check, a pod would never have to acquire single threads with the aim to fill partially-allocated cores.
 - In case request translates to partial occupancy of the cores, the Pod will not be admitted and would fail with `SMTAlignmentError`.
 
 
 ### Resource Accounting
 
-To illustrate the behaviour of the `reject-non-smt-aligned` policy option, we will consider the following CPU topology. We will use as example a CPU package with 16 physical cores, 2-way SMT-capable.
+To illustrate the behaviour of the `full-pcpus-only` policy option, we will consider the following CPU topology. We will use as example a CPU package with 16 physical cores, 2-way SMT-capable.
 
 ![Example Topology](smtalign-topology.png)
 
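The hunk describes the `full-pcpus-only` admission rule: allocate only whole physical cores, and otherwise reject the pod with `SMTAlignmentError`. The sketch below illustrates that rule. It is not the kubelet code. The function `admit_full_pcpus_only`, its parameters, and the `SMTAlignmentError` exception class are hypothetical names chosen for this example. The sketch also assumes the container already qualifies for exclusive CPUs under the `static` policy, meaning a Guaranteed QoS pod with an integer CPU request.

```python
class SMTAlignmentError(Exception):
    """Illustrative stand-in for the `SMTAlignmentError` rejection reason named in the KEP."""

    def __init__(self, requested_cpus: int, cpus_per_core: int, free_full_cores: int) -> None:
        super().__init__(
            f"SMTAlignmentError: requested {requested_cpus} CPUs, "
            f"{cpus_per_core} CPUs per physical core, {free_full_cores} free physical cores"
        )
        self.requested_cpus = requested_cpus
        self.cpus_per_core = cpus_per_core
        self.free_full_cores = free_full_cores


def admit_full_pcpus_only(requested_cpus: int, cpus_per_core: int, free_full_cores: int) -> int:
    """Hypothetical admission check for the `full-pcpus-only` option.

    Assumes the container already gets exclusive CPUs under the `static`
    policy. Returns the number of whole physical cores to allocate, or raises
    SMTAlignmentError when the request cannot be served with whole cores.
    """
    if requested_cpus < 1 or cpus_per_core < 1:
        raise ValueError("requested_cpus and cpus_per_core must be positive")
    if requested_cpus % cpus_per_core != 0:
        # Serving this request would leave a sibling thread of some core to another workload.
        raise SMTAlignmentError(requested_cpus, cpus_per_core, free_full_cores)
    cores_needed = requested_cpus // cpus_per_core
    if cores_needed > free_full_cores:
        # Not enough completely free physical cores left on the node.
        raise SMTAlignmentError(requested_cpus, cpus_per_core, free_full_cores)
    return cores_needed


if __name__ == "__main__":
    # Example topology from the KEP: 16 physical cores, 2-way SMT, all free.
    print(admit_full_pcpus_only(6, 2, 16))  # 3
    try:
        admit_full_pcpus_only(5, 2, 16)
    except SMTAlignmentError as err:
        print("rejected:", err)
```

The divisibility check runs before the capacity check. This matches the two rejection cases listed in the text: a request that would only partially occupy a core, and a node that does not have enough free physical cores.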
@@ -162,11 +165,11 @@ spec:
 cpu: "5"
 ```
 
-The `reject-non-smt-aligned` policy option will cause the pod to be rejected since it doesn't request enough cores to consume all virtual threads exposed by the CPU.
+The `full-pcpus-only` policy option will cause the pod to be rejected since it doesn't request enough cores to consume all virtual threads exposed by the CPU.
 
 would need to make sure the remaining core on the half-allocated physical CPU is left unallocated to avoid noisy neighbours.
 
-![Example core allocation with the reject-non-smt-aligned policy option when requesting a odd number of cores](smtalign-allocation-odd-cores.png)
+![Example core allocation with the full-pcpus-only policy option when requesting a odd number of cores](smtalign-allocation-odd-cores.png)
 
 The container will then actually get more virtual cores (6) than what is requesting (5).
 
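This hunk uses a 5-CPU request on a 2-way SMT package, and a small sketch can work out the arithmetic behind its two statements. `full-pcpus-only` rejects the request because 5 is not a multiple of 2. Rounding the request up to whole cores instead would give the container 6 virtual CPUs, one more than it asked for. The helper names below are hypothetical. The code models only these counts, not which CPUs are actually selected.

```python
from typing import Tuple


def rejected_by_full_pcpus_only(requested_cpus: int, threads_per_core: int) -> bool:
    """True when the request does not cover a whole number of physical cores."""
    return requested_cpus % threads_per_core != 0


def padded_allocation(requested_cpus: int, threads_per_core: int) -> Tuple[int, int, int]:
    """Counts if the request were rounded up to whole cores instead of rejected.

    Returns (physical cores, virtual CPUs handed to the container,
    surplus threads beyond the request).
    """
    cores = -(-requested_cpus // threads_per_core)  # ceiling division on integers
    allocated = cores * threads_per_core
    return cores, allocated, allocated - requested_cpus


if __name__ == "__main__":
    # The KEP's example: a container requesting cpu: "5" on a 2-way SMT package.
    print(rejected_by_full_pcpus_only(5, 2))  # True
    print(padded_allocation(5, 2))            # (3, 6, 1)
```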
@@ -250,7 +253,7 @@ We would like to mention a further extension of this work, which we are *not* pr
 A further subset of the latency sensitive class of workload we identified (CNF, HFT) benefits most of non-SMT system, delivering the best possible performance here.
 For these applications, just disabling SMT at machine level solves the need of the workload, but overall creates worse usage of hardware resources and poorer container density.
 
-Another policy option, or a further refinement of `reject-non-smt-aligned`, which enables non-SMT emulation on SMT-enabled system would allow to accommodate these needs, but this would cause even more significant resource accounting mismatches
+Another policy option, or a further refinement of `full-pcpus-only`, which enables non-SMT emulation on SMT-enabled system would allow to accommodate these needs, but this would cause even more significant resource accounting mismatches
 as described above. Furthermore, at the moment of writing we are still assessing how large is the set of the classes which benefit of these extra guarantees.
 
 For all these reasons we postponed this work to a later date.
@@ -268,11 +271,30 @@ The [implementation PR](https://github.com/kubernetes/kubernetes/pull/101432) wi
 #### Alpha to Beta Graduation
 - [X] Gather feedback from the consumer of the policy.
 - [X] No major bugs reported in the previous cycle.
+- [X] Use of this policy option to further configure the behavior of CPU manager. Another CPUManager policy option `distribute-cpus-across-numa` is being proposed in 1.23 release to distribute CPUs across NUMA nodes instead of packing them.
 
 #### Beta to G.A Graduation
 - [X] Allowing time for feedback (1 year).
 - [X] Risks have been addressed.
 
+### Graduation Criteria of Options
+
+In 1.23 release, as we are graduating this feature to Beta meaning `CPUManagerPolicyOptions` is enabled by default allowing the user to configure CPU Manager static policy with the option `full-pcpus-only`.
+NOTE: Even though the feature gate is enabled by default the user still has to explicitly use the Kubelet flag called `CPUManagerPolicyOptions` in the kubelet config or command line argument called `cpumanager-policy-options` along with a specific policy option to use this feature.
+- In addition to this, in order to not have all alpha-quality experimental options introduced in the future available by default, we are introducing an additional feature gate called `CPUManagerPolicyExperimentalOptions` that controls all the experimental options. The experimental options are hidden by default and only if the feature gate is enabled the user has the ability to use the experimental options. Based on the graduation criteria described below, a policy option can move from being hidden to being non-hidden. Once the feature is non-hidden the user would not need to use `CPUManagerPolicyExperimentalOptions` feature gate in order to use that option.
+- Since the feature that allows the ability to customize the behaviour of CPUManager static policy as well as the CPUManager Policy option `full-pcpus-only` were both introduced in 1.22 release and meet the above graduation criterion, `full-pcpus-only` would be considered as a non-hidden option i.e. available to be used when explicitly used along with `CPUManagerPolicyOptions` Kubelet flag in the kubelet config or command line argument called `cpumanager-policy-options` .
+- The introduction of this new feature gate gives us the ability to move the feature to beta and later stable without implying all that the options are beta or stable.
+
+The graduation Criteria of options is described below:
+
+#### Graduation of Options to `Beta-quality` (non-hidden)
+- [X] Gather feedback from the consumer of the policy option.
+- [X] No major bugs reported in the previous cycle.
+
+#### Graduation of Options from `Beta-quality` to `G.A-quality`
+- [X] Allowing time for feedback (1 year) on the policy option.
+- [X] Risks have been addressed.
+
 ### Upgrade / Downgrade Strategy
 
 We expect no impact. The new policies are opt-in and separated by the existing ones.
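The hidden/non-hidden rule in the new "Graduation Criteria of Options" section amounts to a lookup against the two feature gates. The sketch below models that lookup under stated assumptions and is not the kubelet's implementation:

- The function and set names are hypothetical.
- The gate defaults follow the 1.23 plan described in the text. `CPUManagerPolicyOptions` is on by default. Experimental options stay hidden unless `CPUManagerPolicyExperimentalOptions` is enabled.
- Putting `distribute-cpus-across-numa` in the experimental set is an assumption, based only on it being newly proposed for 1.23.

As the NOTE in the text says, the user must still name the option explicitly in the kubelet configuration. The sketch only decides whether a named option is allowed.

```python
from typing import Mapping, Optional

# Per the KEP text, `full-pcpus-only` is a non-hidden (beta-quality) option in 1.23.
NON_HIDDEN_OPTIONS = frozenset({"full-pcpus-only"})
# Assumption for illustration: a newly proposed option starts out experimental (hidden).
EXPERIMENTAL_OPTIONS = frozenset({"distribute-cpus-across-numa"})

# Assumed 1.23 gate defaults, as described in the KEP.
DEFAULT_GATES = {
    "CPUManagerPolicyOptions": True,
    "CPUManagerPolicyExperimentalOptions": False,
}


def check_policy_option_available(option: str, gates: Optional[Mapping[str, bool]] = None) -> None:
    """Hypothetical check: raise ValueError if `option` may not be used under the given gates."""
    effective = {**DEFAULT_GATES, **(gates or {})}
    if not effective["CPUManagerPolicyOptions"]:
        # Disabling this gate turns off policy options entirely.
        raise ValueError("CPUManagerPolicyOptions feature gate is disabled")
    if option in NON_HIDDEN_OPTIONS:
        return
    if option in EXPERIMENTAL_OPTIONS:
        if not effective["CPUManagerPolicyExperimentalOptions"]:
            raise ValueError(
                f"option {option!r} requires the CPUManagerPolicyExperimentalOptions feature gate"
            )
        return
    raise ValueError(f"unknown CPU Manager policy option {option!r}")


if __name__ == "__main__":
    check_policy_option_available("full-pcpus-only")  # allowed with the default gates
    for gates in ({}, {"CPUManagerPolicyExperimentalOptions": True}):
        try:
            check_policy_option_available("distribute-cpus-across-numa", gates)
            print("allowed with", gates)
        except ValueError as err:
            print("rejected:", err)
```

The two gates are checked independently. That is what lets the framework graduate without implying that every option has graduated, which is the motivation the text gives for the second gate.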
@@ -287,17 +309,19 @@ No changes needed
 * **How can this feature be enabled / disabled in a live cluster?**
 - [X] Feature gate (also fill in values in `kep.yaml`).
 - Feature gate name: `CPUManagerPolicyOptions`.
+- Feature gate name: `CPUManagerPolicyExperimentalOptions`.
 - Components depending on the feature gate: kubelet
-- [X] Change the kubelet configuration to set the CPUManager policy to `configurable`
-- [X] Change the kubelet configuration adding the CPUManager policy option to `reject-non-smt-aligned`
+- [X] Change the kubelet configuration adding the CPUManager policy option to `full-pcpus-only`
 * **Does enabling the feature change any default behavior?**
-- Yes, it makes the behaviour of the CPUManager static policy more restrictive and can lead to pod admission rejection.
+- Enabling the `CPUManagerPolicyOptions` makes the behaviour of the CPUManager static policy more restrictive and can lead to pod admission rejection.
+- Enabling the `CPUManagerPolicyExperimentalOptions` provides the ability to use experimental options which depending on the option can change the behaviour of the CPUManager static policy.
 * **Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?**
-- Yes, disabling the feature gate shuts down the feature completely; alternatively,
-- Yes, through kubelet configuration - switch to a different policy.
+- Yes, disabling the `CPUManagerPolicyOptions` feature gate shuts down the feature completely; alternatively through kubelet configuration - switch to a different policy.
+- Also, disabling `CPUManagerPolicyExperimentalOptions` feature gate disables the use of experimental options and the behavior would depend on how `CPUManagerPolicyOptions` feature gate is configured.
+- Disabling both the feature gates would allow complete rollback of this enablement.
 * **What happens if we reenable the feature if it was previously rolled back?** No changes. Existing container will not see their allocation changed. New containers will.
 * **Are there any tests for feature enablement/disablement?**
-- A specific e2e test will demonstrate that the default behaviour is preserved when the feature gate is disabled, or when the feature is not used (2 separate tests)
+- A specific e2e test will demonstrate that the default behaviour is preserved when the `CPUManagerPolicyOptions` feature gate is disabled, or when the feature is not used (2 separate tests)
 
 ### Rollout, Upgrade and Rollback Planning
 
@@ -309,7 +333,7 @@ No changes needed
 
 ### Monitoring requirements
 * **How can an operator determine if the feature is in use by workloads?**
-- Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option
+- Inspect the kubelet configuration of the nodes: check feature gates and usage of the new options
 * **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
 - No change
 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** N/A.
@@ -332,7 +356,7 @@ No changes needed
 ### Troubleshooting
 
 * **How does this feature react if the API server and/or etcd is unavailable?**: No effect.
-* **What are other known failure modes?** TBD
+* **What are other known failure modes?** No known failure mode.
 * **What steps should be taken if SLOs are not being met to determine the problem?** N/A
 
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
@@ -349,3 +373,5 @@ No changes needed
 - 2021-05-10: KEP update to add to rename the `smtalign` to `reject-non-smt-aligned` for better clarity and address review comments
 - 2021-05-11: KEP update to add to the `configurable` alias and address review comments
 - 2021-05-13: KEP update to postpone the `configurable` alias, per review comments
+- 2021-09-02: KEP update to capture the policy name `full-pcpus-only` based on the implementation merged in 1.22, explain how this feature is being used for introduction of another policy option and updates pertaining to promotion of the feature to Beta.
+- 2021-09-08: KEP update to introduce `CPUManagerPolicyExperimentalOptions` feature gate to prevent alpha-quality options from being non-hidden (available) by default and explain the graduation criteria of the options.

keps/sig-node/2625-cpumanager-policies-thread-placement/kep.yaml

Lines changed: 6 additions & 3 deletions
@@ -7,22 +7,24 @@ owning-sig: sig-node
 participating-sigs: []
 status: implementable
 creation-date: "2021-04-14"
+last-updated: "2021-09-08"
 reviewers:
   - "@klueska"
 approvers:
   - "@sig-node-leads"
 prr-approvers:
   - "@johnbelamaric"
-see-also: []
+see-also:
+  - "keps/sig-node/2902-cpumanager-distribute-cpus-policy-option/"
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.22"
+latest-milestone: "v1.23"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
@@ -34,6 +36,7 @@ milestone:
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
   - name: "CPUManagerPolicyOptions"
+  - name: "CPUManagerPolicyExperimentalOptions"
     components:
       - kubelet
 disable-supported: true
