Commit b57f64b

node: cpumgr: address review comments
Formatting fixes, clarifications (feature gate, options)

Signed-off-by: Francesco Romani <[email protected]>
1 parent f79f6ef commit b57f64b

2 files changed: +32 -24 lines changed

keps/sig-node/3570-cpumanager/README.md

Lines changed: 29 additions & 22 deletions
@@ -18,6 +18,7 @@
 - [Configuring the CPU Manager](#configuring-the-cpu-manager)
 - [Policy 1: "none" cpuset control [default]](#policy-1-none-cpuset-control-default)
 - [Policy 2: "static" cpuset control](#policy-2-static-cpuset-control)
+- [CPU Manager options](#cpu-manager-options)
 - [Implementation sketch](#implementation-sketch)
 - [Example pod specs and interpretation](#example-pod-specs-and-interpretation)
 - [Example scenarios and interactions](#example-scenarios-and-interactions)
@@ -70,12 +71,12 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [X] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [X] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [X] "Implementation History" section is up-to-date for milestone
+- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
@@ -113,7 +114,7 @@ This KEP supersedes and replaces `kubernetes/enhancements/keps/sig-node/375-cpum
 throughput compared to VMs due to cpu quota being fulfilled across all
 cores, rather than exclusive cores, which results in fewer context
 switches and higher cache affinity.
-2. Unacceptable latency attributed to the OS process scheduler, especially
+1. Unacceptable latency attributed to the OS process scheduler, especially
 for “fast” virtual network functions (want to approach line rate on
 modern server NICs.)
 
@@ -123,12 +124,12 @@ This KEP supersedes and replaces `kubernetes/enhancements/keps/sig-node/375-cpum
 Guaranteed pod with 1 or more cores of cpu, the system will try to make
 sure that the pod gets its cpu quota primarily from reserved core(s),
 resulting in fewer context switches and higher cache affinity".
-2. Support the case where in a given pod, one container is latency-critical
+1. Support the case where in a given pod, one container is latency-critical
 and another is not (e.g. auxiliary side-car containers responsible for
 log forwarding, metrics collection and the like.)
-3. Do not cap CPU quota for guaranteed containers that are granted
+1. Do not cap CPU quota for guaranteed containers that are granted
 exclusive cores, since that would be antithetical to (1) above.
-4. Take physical processor topology into account in the CPU affinity policy.
+1. Take physical processor topology into account in the CPU affinity policy.
 
 ### Non-Goals
 
@@ -289,6 +290,16 @@ application-level CPU affinity of their own, as those settings may be
 overwritten without notice (whenever exclusive cores are
 allocated or deallocated.)
 
+#### CPU Manager options
+
+`CPUManagerPolicyOptions` allow to fine-tune the behavior of the `static` policy.
+The details of each option are described in their own KEP.
+As for kubernetes 1.26, the following options are available:
+
+- [full-pcpus-only](https://github.com/fromanirh/enhancements/blob/master/keps/sig-node/2625-cpumanager-policies-thread-placement/README.md)
+- [distribute-cpus-across-numa](https://github.com/fromanirh/enhancements/blob/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option/README.md)
+- [align-by-socket](https://github.com/fromanirh/enhancements/blob/master/keps/sig-node/3327-align-by-socket/README.md)
+
 ##### Implementation sketch
 
 The static policy maintains the following sets of logical CPUs:
@@ -429,11 +440,11 @@ extending the production code to implement this enhancement.
 
 ##### Integration tests
 
-- TBD
+- N/A
 
 ##### e2e tests
 
-- TBD
+- `k8s.io/kubernetes/test/e2e_node/cpu_manager_test.go`
 
 ### Graduation Criteria
 
@@ -470,7 +481,6 @@ in back-to-back releases.
 - Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
 - Address feedback on usage/changed behavior, provided on GitHub issues
 - Deprecate the flag
--->
 
 ### Upgrade / Downgrade Strategy
 
@@ -486,19 +496,16 @@ Not relevant
 
 ###### How can this feature be enabled / disabled in a live cluster?
 
-- [ ] Feature gate (also fill in values in `kep.yaml`)
-- Feature gate name:
-- Components depending on the feature gate:
-- [ ] Other
-- Describe the mechanism:
-- Will enabling / disabling the feature require downtime of the control
-plane?
-- Will enabling / disabling the feature require downtime or reprovisioning
-of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+- [X] Feature gate (also fill in values in `kep.yaml`)
+- Feature gate name: `CPUManager`
+- Components depending on the feature gate: kubelet
+
+NOTE: in order to enable the feature, the cluster admin needs also to enable
+the `static` cpu manager policy.
 
 ###### Does enabling the feature change any default behavior?
 
-No, unless the non-none policy is explicitly configured.
+No, unless the non-none policy (`static`) is explicitly configured.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
@@ -523,7 +530,7 @@ Already running workload will not be affected if the node state is steady
 
 ###### What specific metrics should inform a rollback?
 
-Pod creation errors o a node-by-node basis.
+Pod creation errors on a node-by-node basis.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
@@ -608,7 +615,7 @@ No
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
-No
+No impact. The behavior of the feature does not change when API Server and/or etcd is unavailable since the feature is node local.
 
 ###### What are other known failure modes?
 
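Note on the README.md changes above: the feature-gate answer and the new "CPU Manager options" section both come down to kubelet configuration. A minimal sketch of what that could look like follows, assuming the node is configured through a kubelet configuration file. The reserved CPU value and the choice of `full-pcpus-only` are illustrative assumptions and are not part of this commit.

```yaml
# Hypothetical kubelet configuration file; all values are illustrative.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CPUManager: true          # feature gate named in the KEP, listed explicitly for illustration
cpuManagerPolicy: static    # per the NOTE in the KEP, the static policy must also be selected
cpuManagerPolicyOptions:
  full-pcpus-only: "true"   # one of the options listed in the KEP; option values are strings
reservedSystemCPUs: "0"     # assumption: the static policy expects a non-zero CPU reservation
```

Only `full-pcpus-only` is shown here. The other listed options may need additional feature gates depending on their maturity, so each option's own KEP remains the authority before enabling it.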
keps/sig-node/3570-cpumanager/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -4,6 +4,7 @@ authors:
 - "@ConnorDoyle"
 - "@flyingcougar"
 - "@sjenning"
+- "@fromanirh" # ONLY for GA graduation and PRR review
 owning-sig: sig-node
 participating-sigs:
 - sig-node
@@ -12,7 +13,7 @@ reviewers:
 approvers:
 - "@dawnchen"
 - "@derekwaynecarr"
-editor: Connor Doyle, Francesco Romani (only for the GA graduation)
+editor: Connor Doyle
 creation-date: 2017-05-23
 last-updated: 2022-10-03
 status: implementable
@@ -28,7 +29,7 @@ stage: stable
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.10"
+latest-milestone: "v1.26"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
