@@ -8,8 +8,8 @@
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
-    - [Story 1](#story-1)
-    - [Story 2](#story-2)
+    - [Story 1: High-performance applications](#story-1-high-performance-applications)
+    - [Story 2: KubeVirt](#story-2-kubevirt)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
@@ -131,7 +131,7 @@ reconciliation loop.

### Non-Goals

-TBD
+N/A

## Proposal

@@ -145,15 +145,20 @@ observability and checkpointing extensions._

### User Stories (Optional)

-TBD
+#### Story 1: High-performance applications
+
+Systems such as real-time trading applications or 5G CNFs (for example the User Plane Function, UPF) need to maximize CPU time; CPU pinning ensures exclusive CPU allocation and avoids the performance penalties of core switches and cold caches.
+NUMA-aware allocation of CPUs, provided by the CPU manager in cooperation with the Topology Manager, is also a critical prerequisite for these applications to meet their performance requirements.
+Aligning resources on the same NUMA node, CPUs first and foremost, prevents the performance degradation caused by inter-node (between NUMA nodes) communication overhead.
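+
+To make the conditions concrete, here is a minimal sketch (the pod name, image, and sizes are hypothetical) of a pod eligible for exclusive CPUs under the `static` policy: it is in the Guaranteed QoS class and requests an integral number of CPUs.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: upf-worker                  # hypothetical name
+spec:
+  containers:
+  - name: app
+    image: example.com/upf:latest   # hypothetical image
+    resources:
+      requests:
+        cpu: "4"                    # integral CPU count
+        memory: 8Gi
+      limits:
+        cpu: "4"                    # limits == requests -> Guaranteed QoS
+        memory: 8Gi
+```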

-#### Story 1
+#### Story 2: KubeVirt

-#### Story 2
+KubeVirt leverages the CPU pinning provided by the CPU manager to assign full CPU cores to the vCPUs inside the VM to [enhance performance][kubevirt-cpus].
+[NUMA support for VMs][kubevirt-numa] is also built on top of the CPU pinning and the NUMA-aware CPU allocation.
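+
+As an illustrative sketch of how this is consumed (the VM name and sizes are hypothetical; see the linked user guide for the full preconditions), a VirtualMachineInstance opts into pinning through `dedicatedCpuPlacement`:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: VirtualMachineInstance
+metadata:
+  name: vmi-pinned                  # hypothetical name
+spec:
+  domain:
+    cpu:
+      cores: 4
+      dedicatedCpuPlacement: true   # back every vCPU with an exclusive host CPU
+    resources:
+      requests:
+        memory: 2Gi
+```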

### Notes/Constraints/Caveats (Optional)

-TBD
+N/A

### Risks and Mitigations

@@ -399,19 +404,35 @@ to implement this enhancement.

##### Prerequisite testing updates

-TBD
-
##### Unit tests
+<!--
+In principle all added code should have complete unit test coverage, so providing
+the exact set of tests will not bring additional value.
+However, if complete unit test coverage is not possible, explain why,
+together with an explanation of why this is acceptable.
+-->
+
+<!--
+Additionally, for Alpha, try to enumerate the core packages you will be touching
+to implement this enhancement and provide the current unit coverage for those
+in the form of:
+- <package>: <date> - <current test coverage>
+The data can be easily read from:
+https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
+
+This can inform certain test coverage improvements that we want to do before
+extending the production code to implement this enhancement.
+-->

-- `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20220606` - `86%`
+- `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20220929` - `86.2%`

##### Integration tests

-- <test>: <link to test coverage>
+- TBD

##### e2e tests

-- <test>: <link to test coverage>
+- TBD

### Graduation Criteria

@@ -433,6 +454,13 @@
- More rigorous forms of testing—e.g., downgrade tests and scalability tests
- Allowing time for feedback

+**Note:** Generally we also wait at least two releases between beta and
+GA/stable, because there's no opportunity for user feedback, or even bug reports,
+in back-to-back releases.
+
+**For non-optional features moving to GA, the graduation criteria must include
+[conformance tests].**
+

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md

#### Deprecation
@@ -469,14 +497,18 @@ Not relevant

###### Does enabling the feature change any default behavior?

-TBD
+No, unless a non-`none` policy is explicitly configured.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

-TBD
+Yes, using the kubelet config.
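+
+As a minimal sketch (the reserved CPU list is illustrative), the policy is selected in the kubelet configuration; rolling back means setting `cpuManagerPolicy` back to `none`, the default:
+
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+cpuManagerPolicy: static    # "none" (the default) disables exclusive CPU assignment
+reservedSystemCPUs: "0,1"   # illustrative: CPUs kept out of the exclusive pool
+```
+
+Note that, as described under the known failure modes below, the CPU manager state file must be removed when switching policies before restarting the kubelet.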

###### What happens if we reenable the feature if it was previously rolled back?

+The impact is node-local only.
+If the state of a node is steady, there are no changes.
+If a guaranteed pod is admitted, running non-guaranteed pods will have their CPU cgroup changed while they are running.
+
###### Are there any tests for feature enablement/disablement?

Yes, covered by e2e tests
@@ -485,57 +517,57 @@ Yes, covered by e2e tests

###### How can a rollout or rollback fail? Can it impact already running workloads?

-TBD
+A rollout can fail if a bug in the cpumanager prevents _new_ pods from starting or existing pods from being restarted.
+Already running workloads will not be affected if the node state is steady.

###### What specific metrics should inform a rollback?

-TBD
+Pod creation errors on a node-by-node basis.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

-TBD
+No to both.
+The changes in behavior only affect pods meeting the conditions (guaranteed QoS class, integral CPU request) that are scheduled after the upgrade.
+Running pods will be unaffected by any change. This offers some degree of safety in both the upgrade->rollback
+and the upgrade->downgrade->upgrade scenarios.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

-TBD
+No

### Monitoring Requirements

-TBD
+Monitor the pod admission counter.
+Monitor for pods that do not go running after being successfully scheduled.

###### How can an operator determine if the feature is in use by workloads?

-TBD
+The operator needs to inspect the node and verify the CPU pinning assignments, either by checking the cgroups on the node
+or by accessing the podresources API of the kubelet.

###### How can someone using this feature know that it is working for their instance?

-TBD

-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- [X] Other (treat as last resort)
+  - Details: the containers need to check the CPU set they are allowed to run on; in addition, node agents (e.g. node_exporter)
+    can report the CPU assignment

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-TBD
+- N/A

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-TBD
-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
- [ ] Other (treat as last resort)
  - Details:
+    An operator should check that pods go running correctly and that the CPU pinning is performed. The latter can
+    be checked by inspecting the cgroups at node level.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

-TBD
+No, because all the metrics we were aware of leaked hardware details.
+All of the metrics experimented with by consumers of the feature so far require exposing hardware details of the
+worker nodes, and depend on the worker node hardware configuration (e.g. processor core layout).

### Dependencies

@@ -579,14 +611,15 @@

###### What are other known failure modes?

-TBD
+After changing the CPU manager policy from `none` to `static`, or the other way around, you must remove the
+CPU manager state file (`/var/lib/kubelet/cpu_manager_state`) before starting the kubelet again, otherwise the kubelet will fail to start.
+Startup failures for this reason will be logged in the kubelet log.

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

-- **2020-12-30:** kep translated to the most recent template available at time
-- **2022-06-06:** kep translated to the most recent template available at time; proposed to GA; added PRR info.
+- **2022-09-29:** kep translated to the most recent template available at the time; proposed to GA; added PRR info.

## Drawbacks

@@ -718,6 +751,8 @@ Record of information of the original KEP without a clear fit in the latest template

[cat]: http://www.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html
[cpuset-files]: http://man7.org/linux/man-pages/man7/cpuset.7.html#FILES
+[kubevirt-cpus]: https://kubevirt.io/user-guide/virtual_machines/dedicated_cpu_resources/
+[kubevirt-numa]: https://kubevirt.io/user-guide/virtual_machines/numa/#preconditions
[ht]: http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html
[hwloc]: https://www.open-mpi.org/projects/hwloc
[node-allocatable]: /contributors/design-proposals/node/node-allocatable.md#phase-2---enforce-allocatable-on-pods