Commit 9ed4fd2

node: cpumgr: address the pending questions
Address the questionnaire required for GA graduation.

Signed-off-by: Francesco Romani <[email protected]>
1 parent ee7f329 commit 9ed4fd2

1 file changed: +73 -38 lines changed

keps/sig-node/375-cpu-manager/README.md

Lines changed: 73 additions & 38 deletions
@@ -8,8 +8,8 @@
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
 - [User Stories (Optional)](#user-stories-optional)
-- [Story 1](#story-1)
-- [Story 2](#story-2)
+- [Story 1 : High-performance applications](#story-1--high-performance-applications)
+- [Story 2 : KubeVirt](#story-2--kubevirt)
 - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
@@ -131,7 +131,7 @@ reconciliation loop.

 ### Non-Goals

-TBD
+N/A

 ## Proposal

@@ -145,15 +145,20 @@ observability and checkpointing extensions._

 ### User Stories (Optional)

-TBD
+#### Story 1 : High-performance applications
+
+Systems such as real-time trading systems or 5G CNFs (e.g. the User Plane Function, UPF) need to maximize the CPU time available to them; CPU pinning ensures exclusive CPU allocation and helps avoid performance issues caused by core switches and cold caches.
+NUMA-aware allocation of CPUs, provided by the CPU manager cooperating with the Topology Manager, is also a critical prerequisite for these applications to meet their performance requirements.
+Aligning resources on the same NUMA node, CPUs first and foremost, prevents the performance degradation caused by inter-node (between NUMA nodes) communication overhead.

-#### Story 1
+#### Story 2 : KubeVirt

-#### Story 2
+KubeVirt leverages the CPU pinning provided by the CPU manager to assign full CPU cores to vCPUs inside the VM to [enhance performance][kubevirt-cpus].
+[NUMA support for VMs][kubevirt-numa] is also built on top of CPU pinning and NUMA-aware CPU allocation.

 ### Notes/Constraints/Caveats (Optional)

-TBD
+N/A

 ### Risks and Mitigations

@@ -399,19 +404,35 @@ to implement this enhancement.

 ##### Prerequisite testing updates

-TBD
-
 ##### Unit tests
+<!--
+In principle every added code should have complete unit test coverage, so providing
+the exact set of tests will not bring additional value.
+However, if complete unit test coverage is not possible, explain the reason of it
+together with explanation why this is acceptable.
+-->
+
+<!--
+Additionally, for Alpha try to enumerate the core package you will be touching
+to implement this enhancement and provide the current unit coverage for those
+in the form of:
+- <package>: <date> - <current test coverage>
+The data can be easily read from:
+https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
+
+This can inform certain test coverage improvements that we want to do before
+extending the production code to implement this enhancement.
+-->

-- `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20220606` - `86%`
+- `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20220929` - `86.2%`

 ##### Integration tests

-- <test>: <link to test coverage>
+- TBD

 ##### e2e tests

-- <test>: <link to test coverage>
+- TBD

 ### Graduation Criteria

@@ -433,6 +454,13 @@ TBD
 - More rigorous forms of testing—e.g., downgrade tests and scalability tests
 - Allowing time for feedback

+**Note:** Generally we also wait at least two releases between beta and
+GA/stable, because there's no opportunity for user feedback, or even bug reports,
+in back-to-back releases.
+
+**For non-optional features moving to GA, the graduation criteria must include
+[conformance tests].**
+
 [conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md

 #### Deprecation
@@ -469,14 +497,18 @@ Not relevant

 ###### Does enabling the feature change any default behavior?

-TBD
+No, unless a policy other than `none` is explicitly configured.

 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

-TBD
+Yes, using the kubelet config.

 ###### What happens if we reenable the feature if it was previously rolled back?

+The impact is node-local only.
+If the node state is steady, nothing changes.
+If a guaranteed pod is admitted, running non-guaranteed pods will have their CPU cgroup (cpuset) changed while running.
+
 ###### Are there any tests for feature enablement/disablement?

 Yes, covered by e2e tests
@@ -485,57 +517,57 @@ Yes, covered by e2e tests

 ###### How can a rollout or rollback fail? Can it impact already running workloads?

-TBD
+A rollout can fail if a bug in the CPU manager prevents _new_ pods from starting, or existing pods from being restarted.
+Already running workloads will not be affected if the node state is steady.

 ###### What specific metrics should inform a rollback?

-TBD
+Pod creation errors on a node-by-node basis.

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

-TBD
+No to both.
+Changes in behavior only affect pods meeting the conditions (guaranteed QoS, integral CPU request) scheduled after the upgrade.
+Running pods will be unaffected by any change. This offers some degree of safety in both upgrade->rollback
+and upgrade->downgrade->upgrade scenarios.

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

-TBD
+No

 ### Monitoring Requirements

-TBD
+Monitor the pod admission counter.
+Monitor pods that do not reach the Running phase after being successfully scheduled.

 ###### How can an operator determine if the feature is in use by workloads?

-TBD
+The operator needs to inspect the node and verify the CPU pinning assignment, either by checking the cgroups on the node
+or by accessing the podresources API of the kubelet.

 ###### How can someone using this feature know that it is working for their instance?

-TBD

-- [ ] Events
-- Event Reason:
-- [ ] API .status
-- Condition name:
-- Other field:
-- [ ] Other (treat as last resort)
-- Details:
+- [X] Other (treat as last resort)
+- Details: the containers need to check the CPU set they are allowed to run on; in addition, node agents (e.g. node_exporter)
+can report the CPU assignment.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-TBD
+- N/A

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-TBD
-- [ ] Metrics
-- Metric name:
-- [Optional] Aggregation method:
-- Components exposing the metric:
 - [ ] Other (treat as last resort)
 - Details:
+an operator should check that pods reach the Running phase correctly and that CPU pinning is performed. The latter can
+be checked by inspecting the cgroups at the node level.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

-TBD
+No, because all the metrics we were aware of leak hardware details.
+All of the metrics experimented with by consumers of the feature so far require exposing hardware details of the
+worker nodes, and are dependent on the worker node hardware configuration (e.g. processor core layout).

 ### Dependencies

@@ -579,14 +611,15 @@ No

 ###### What are other known failure modes?

-TBD
+After changing the CPU manager policy from `none` to `static` or the other way around, and before starting the kubelet again,
+you must remove the CPU manager state file (`/var/lib/kubelet/cpu_manager_state`); otherwise, the kubelet will fail to start.
+Startup failures for this reason will be logged in the kubelet log.

 ###### What steps should be taken if SLOs are not being met to determine the problem?

 ## Implementation History

-- **2020-12-30:** kep translated to the most recent template available at time
-- **2022-06-06:** kep translated to the most recent template available at time; proposed to GA; added PRR info.
+- **2022-09-29:** KEP translated to the most recent template available at the time; proposed to GA; added PRR info.

 ## Drawbacks

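The policy-change failure mode can be guarded against with a pre-restart check, sketched here in Go. It assumes the state file is JSON carrying a `policyName` key; that key is our reading of the kubelet checkpoint format, not something this KEP specifies. The program only reports whether the file must be removed and leaves the deletion to the operator. The program itself and its single `<none|static>` argument are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// stateFile is the CPU manager state file path named in the KEP.
const stateFile = "/var/lib/kubelet/cpu_manager_state"

// checkpointHeader decodes only the field this check needs.
// ASSUMPTION: the checkpoint stores the policy under "policyName".
type checkpointHeader struct {
	PolicyName string `json:"policyName"`
}

// needsRemoval reports whether a state file written under one policy
// conflicts with the policy the kubelet is about to start with.
func needsRemoval(data []byte, configuredPolicy string) (bool, error) {
	var hdr checkpointHeader
	if err := json.Unmarshal(data, &hdr); err != nil {
		return false, fmt.Errorf("cannot decode %s: %w", stateFile, err)
	}
	return hdr.PolicyName != configuredPolicy, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checkstate <none|static>")
		os.Exit(2)
	}
	data, err := os.ReadFile(stateFile)
	if os.IsNotExist(err) {
		fmt.Println("no state file found; nothing to remove")
		return
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read state file:", err)
		os.Exit(1)
	}
	remove, err := needsRemoval(data, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if remove {
		fmt.Printf("policy changed: remove %s before restarting the kubelet\n", stateFile)
	} else {
		fmt.Println("state file matches the configured policy")
	}
}
```

Keeping the check read-only is a deliberate choice: removing the file is a node-level operation that belongs in the operator's own drain/restart procedure.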
@@ -718,6 +751,8 @@ Record of information of the original KEP without a clear fit in the latest temp

 [cat]: http://www.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html
 [cpuset-files]: http://man7.org/linux/man-pages/man7/cpuset.7.html#FILES
+[kubevirt-cpus]: https://kubevirt.io/user-guide/virtual_machines/dedicated_cpu_resources/
+[kubevirt-numa]: https://kubevirt.io/user-guide/virtual_machines/numa/#preconditions
 [ht]: http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html
 [hwloc]: https://www.open-mpi.org/projects/hwloc
 [node-allocatable]: /contributors/design-proposals/node/node-allocatable.md#phase-2---enforce-allocatable-on-pods
