7
7
- [ Goals] ( #goals )
8
8
- [ Non-Goals] ( #non-goals )
9
9
- [ Proposal] ( #proposal )
10
- - [ Design Overview] ( #design-overview )
11
10
- [ User Stories] ( #user-stories )
12
11
- [ Story 1 : High-performance packet processing with DPDK] ( #story-1--high-performance-packet-processing-with-dpdk )
13
12
- [ Story 2 : Databases] ( #story-2--databases )
14
13
- [ Story 3 : KubeVirt (provided by @rmohr )] ( #story-3--kubevirt-provided-by-rmohr )
14
+ - [ Notes/Constraints/Caveats (Optional)] ( #notesconstraintscaveats-optional )
15
15
- [ Risks and Mitigations] ( #risks-and-mitigations )
16
16
- [ UX] ( #ux )
17
17
- [ Design Details] ( #design-details )
18
+ - [ Overview] ( #overview )
18
19
- [ How to enable the guaranteed memory allocation over many NUMA nodes?] ( #how-to-enable-the-guaranteed-memory-allocation-over-many-numa-nodes )
19
20
- [ The Concept of Node Map and Memory Maps] ( #the-concept-of-node-map-and-memory-maps )
20
21
- [ Memory Map] ( #memory-map )
35
36
- [ Test Plan] ( #test-plan )
36
37
- [ Single-NUMA Systems Tests] ( #single-numa-systems-tests )
37
38
- [ Multi-NUMA System Tests] ( #multi-numa-system-tests )
39
+ - [ Prerequisite testing updates] ( #prerequisite-testing-updates )
40
+ - [ Unit tests] ( #unit-tests )
41
+ - [ Integration tests] ( #integration-tests )
42
+ - [ e2e tests] ( #e2e-tests )
38
43
- [ Graduation Criteria] ( #graduation-criteria )
39
44
- [ Phase 1: Alpha (target v1.21)] ( #phase-1-alpha-target-v121 )
40
45
- [ Phase 2: Beta (target v1.22)] ( #phase-2-beta-target-v122 )
@@ -136,26 +141,15 @@ The next subsection introduces to the proposal by outlining its design.
136
141
137
142
### Notes/Constraints/Caveats (Optional)
138
143
139
- <!--
140
- What are the caveats to the proposal?
141
- What are some important details that didn't come across above?
142
- Go in to as much detail as necessary here.
143
- This might be a good place to talk about core concepts and how they relate.
144
- -->
144
+ Allocation of resources happens, like all the other resource managers (CPU manager, device manager), at admission time. This behavior creates a disconnect with the scheduler
145
+ view of the system, which doesn't consider resource alignments or locality, which can contribute to runaway pod creation.
145
146
146
147
### Risks and Mitigations
147
148
148
- <!--
149
- What are the risks of this proposal and how do we mitigate. Think broadly.
150
- For example, consider both security and how this will impact the larger
151
- kubernetes ecosystem.
152
-
153
- How will security be reviewed and by whom?
149
+ Bugs in memorymanager can cause the kubelet to crash, or workloads to start with incorrect pinning. This can be mitigated with comprehensive testing and improving the observability of the system (see metrics).
154
150
155
- How will UX be reviewed and by whom?
156
-
157
- Consider including folks that also work outside the SIG or subproject.
158
- -->
151
+ The memory manager ` None ` policy is very simple and pretty safe, being almost a no-operation. The ` Static ` policy, which enabled memory pinning, has seen barely changes since beta,
152
+ and because of interactions with other features (VPA).
159
153
160
154
#### UX
161
155
@@ -325,10 +319,10 @@ Syntax:
325
319
326
320
#### Memory Manager Policy Flag
327
321
328
- The ` static ` value of the flag enables the guaranteed memory allocation, whereas ` none ` disables it.
322
+ The ` Static ` value of the flag enables the guaranteed memory allocation, whereas ` None ` disables it.
329
323
330
324
Syntax:
331
- ` --memory-manager-policy=static|none `
325
+ ` --memory-manager-policy=Static|None `
332
326
333
327
#### Reserved Memory Flag
334
328
@@ -439,32 +433,8 @@ Memory pinning will be validated for Topology Manager `single-numa-node` and `re
439
433
440
434
##### Prerequisite testing updates
441
435
442
- <!--
443
- Based on reviewers feedback describe what additional tests need to be added prior
444
- implementing this enhancement to ensure the enhancements have also solid foundations.
445
- -->
446
-
447
436
##### Unit tests
448
437
449
- <!--
450
- In principle every added code should have complete unit test coverage, so providing
451
- the exact set of tests will not bring additional value.
452
- However, if complete unit test coverage is not possible, explain the reason of it
453
- together with explanation why this is acceptable.
454
- -->
455
-
456
- <!--
457
- Additionally, for Alpha try to enumerate the core package you will be touching
458
- to implement this enhancement and provide the current unit coverage for those
459
- in the form of:
460
- - <package>: <date> - <current test coverage>
461
- The data can be easily read from:
462
- https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
463
-
464
- This can inform certain test coverage improvements that we want to do before
465
- extending the production code to implement this enhancement.
466
- -->
467
-
468
438
- ` k8s.io/kubernetes/pkg/kubelet/cm/memorymanager ` : ` 20230928 ` - ` 81% `
469
439
470
440
##### Integration tests
@@ -487,17 +457,7 @@ https://storage.googleapis.com/k8s-triage/index.html
487
457
488
458
##### e2e tests
489
459
490
- <!--
491
- This question should be filled when targeting a release.
492
- For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
493
-
494
- For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
495
- https://storage.googleapis.com/k8s-triage/index.html
496
-
497
- We expect no non-infra related flakes in the last month as a GA graduation criteria.
498
- -->
499
-
500
- - https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/memory_manager_test.go
460
+ [ Memory Manager E2E Tests] ( https://storage.googleapis.com/k8s-triage/index.html?sig=node&test=Memory%20Manager )
501
461
502
462
### Graduation Criteria
503
463
@@ -518,33 +478,19 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
518
478
519
479
#### GA (stable)
520
480
521
- - N examples of real-world usage
522
- - N installs
481
+ - 2 or more examples of real-world usage
482
+ - 2 or more installs
523
483
- More rigorous forms of testing—e.g., downgrade tests and scalability tests
524
484
- Allowing time for feedback
525
485
526
- ** Note:** Generally we also wait at least two releases between beta and
527
- GA/stable, because there's no opportunity for user feedback, or even bug reports,
528
- in back-to-back releases.
529
-
530
- ** For non-optional features moving to GA, the graduation criteria must include
531
- [ conformance tests] .**
532
-
486
+ ** Note about conformance tests**
533
487
[ conformance tests ] : https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
488
+ Conformance tests validate API endpoints, but memory manager has no API endpoints.
489
+ We defer the correctness of the behavior (resource allocation) and the observable behavior (e.g. metrics)
490
+ to e2e tests, like the other kubelet resource managers.
534
491
535
492
### Upgrade / Downgrade Strategy
536
493
537
- <!--
538
- If applicable, how will the component be upgraded and downgraded? Make sure
539
- this is in the test plan.
540
-
541
- Consider the following in developing an upgrade/downgrade strategy for this
542
- enhancement:
543
- - What changes (in invocations, configurations, API use, etc.) is an existing
544
- cluster required to make on upgrade in order to keep previous behavior?
545
- - What changes (in invocations, configurations, API use, etc.) is an existing
546
- cluster required to make on upgrade in order to make use of the enhancement?
547
- -->
548
494
Not applicable.
549
495
550
496
### Version Skew Strategy
@@ -591,8 +537,6 @@ you need any help or guidance.
591
537
592
538
### Feature Enablement and Rollback
593
539
594
- _ This section must be completed when targeting alpha to a release._
595
-
596
540
###### How can this feature be enabled / disabled in a live cluster?
597
541
598
542
- [X] Feature gate (also fill in values in ` kep.yaml ` )
@@ -609,11 +553,21 @@ Yes, the admission flow changes for a pod in Guaranteed QoS class. With the Memo
609
553
610
554
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
611
555
612
- Yes, it uses a feature gate.
556
+ Pre-GA: the feature is using ` MemoryManager ` feature gate and can be disabled.
557
+ Disabling the feature gate requires kubelet restart for changes to take effect.
558
+ In case no workloads are running on the node, or consuming memory, disabling the
559
+ feature should be seamless.
613
560
614
- <!-- Also set `disable-supported` to `true` or `false` in `kep.yaml`.
615
- Describe the consequences on existing workloads (e.g., if this is a runtime
616
- feature, can it break the existing applications?). -->
561
+ GA and later: the code paths will always be hit during execution.
562
+ However, disabling the feature is still possible reconfiguring the kubelet to use
563
+ the ` None ` memory manager policy, which disables the vast majority of the code paths.
564
+
565
+ In case there are running workloads which are requested for memory, it's the
566
+ cluster admin's responsibility to drain the node, prior to the feature disablement.
567
+
568
+ If the node is not drained and the feature is disabled, the running workloads which
569
+ had memory assignment must be assumed to lose the previous guarantees, as the memory
570
+ manager is no longer enforcing them.
617
571
618
572
###### What happens if we reenable the feature if it was previously rolled back?
619
573
@@ -625,42 +579,41 @@ Yes, there is a number of Unit Tests designated for State file validation.
625
579
626
580
### Rollout, Upgrade and Rollback Planning
627
581
628
- _ This section must be completed when targeting beta graduation to a release._
629
-
630
582
###### How can a rollout or rollback fail? Can it impact already running workloads?
631
583
632
- It is possible that the state file will have inconsistent data during the rollout, because of the kubelet restart, but
633
- you can easily to fix it by removing memory manager state file and run kubelet restart. It should not affect any running
634
- workloads.
584
+ It is possible that the state file will have inconsistent data during the rollout, because of the kubelet restart, but
585
+ you can easily to fix it by removing memory manager state file and run kubelet restart. It should not affect any running
586
+ workloads.
635
587
636
588
637
589
###### What specific metrics should inform a rollback?
638
- The pod may fail with the admission error because the kubelet can not provide all resources aligned from the same NUMA node.
639
- You can see the error message under the pod events.
640
- "memory_manager_assignment_errors_total".
590
+
591
+ The pod may fail with the admission error because the kubelet can not provide all resources aligned from the same NUMA node.
592
+ You can see the error message under the pod events.
593
+ Additionally, the value of metric ` memory_manager_pinning_errors_total ` is expected to be consistently equal to zero
594
+ under normal operating conditions; admission rejects will cause the value of this metric to increase.
641
595
642
596
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
643
- Tested it manually by replacing the kubelet binary on the node with the ` Static ` memory manager policy, but I failed
644
- to find correct procedure how to test upgrade from 1.21 to my custom build with updated kubelet binary.
597
+
598
+ Tested it manually by replacing the kubelet binary on the node with the ` Static ` memory manager policy, but I failed
599
+ to find correct procedure how to test upgrade from 1.21 to my custom build with updated kubelet binary.
645
600
646
601
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
647
602
648
- fields of API types, flags, etc.?**
649
- No.
603
+ No.
650
604
651
605
### Monitoring Requirements
652
606
653
607
Monitor the metrics
654
- - " memory_manager_pinning_requests_total"
655
- - "memory_manager_assignment_errors_total"
608
+ - ` memory_manager_pinning_requests_total `
609
+ - ` memory_manager_pinning_errors_total `
656
610
657
611
###### How can an operator determine if the feature is in use by workloads?
658
- In order for workloads to request exclusive memory allocation, the pod QoS must be "guaranteed".
659
- The memory manager data will be available under pod resources API.
660
- When it is configured with the static policy,
661
- you will see memory related data during call to the pod resources API List method under the container.
662
- In addition, in case the workload is guaranteed, the metric named memory_manager_pinning_requests_total should
663
- be incremented.
612
+
613
+ The memory manager data will be available under pod resources API.
614
+ When it is configured with the static policy, you will see memory related data during call to the pod resources API List method under the container.
615
+
616
+ In addition, in case the workload is guaranteed, the metric named ` memory_manager_pinning_requests_total ` should `be incremented.
664
617
665
618
###### How can someone using this feature know that it is working for their instance?
666
619
@@ -680,72 +633,54 @@ Monitor the metrics
680
633
To understand the reason, you will need to check via pod resources API the amount of allocatable memory and memory reserved by containers.
681
634
682
635
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
683
- This does not seem relevant to this feature.
684
636
685
- ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
637
+ This does not seem relevant to this feature.
686
638
687
- <!--
688
- Pick one more of these and delete the rest.
689
- -->
639
+ ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
690
640
691
641
- [X] Metrics
692
642
- Metric name:
693
- - memory_manager_pinning_requests_total
694
- - memory_manager_assignment_errors_total
643
+ - ` memory_manager_pinning_requests_total `
644
+ - ` memory_manager_pinning_errors_total `
695
645
696
646
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
697
- <!--
698
- Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
699
- implementation difficulties, etc.).
700
- -->
701
- - "memory_manager_pinning_requests_total"
702
- - "memory_manager_assignment_errors_total"
647
+ - ` memory_manager_pinning_requests_total `
648
+ - ` memory_manager_pinning_errors_total `
703
649
704
650
The addition of these metrics will be done before moving to GA
705
- issue - ` <TBD> `
706
- PR - ` <TBD> `
651
+ issue - ` https://github.com/kubernetes/kubernetes/issues/120986 `
707
652
708
653
### Dependencies
709
654
710
- _ This section must be completed when targeting beta graduation to a release._
711
-
712
655
###### Does this feature depend on any specific services running in the cluster?
713
- No.
656
+ No.
714
657
715
658
716
659
### Scalability
717
660
718
- _ For alpha, this section is encouraged: reviewers should consider these questions
719
- and attempt to answer them._
720
-
721
- _ For beta, this section is required: reviewers must answer these questions._
722
-
723
- _ For GA, this section is required: approvers should be able to confirm the
724
- previous answers based on experience in the field._
725
-
726
661
###### Will enabling / using this feature result in any new API calls?
727
662
728
- No.
663
+ No.
729
664
730
665
###### Will enabling / using this feature result in introducing new API types?
731
666
732
- No.
667
+ No.
733
668
734
669
###### Will enabling / using this feature result in any new calls to the cloud provider?
735
670
736
- No.
671
+ No.
737
672
738
673
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
739
674
740
- No.
675
+ No.
741
676
742
677
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
743
678
744
- No.
679
+ No.
745
680
746
681
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
747
682
748
- No.
683
+ No.
749
684
750
685
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
751
686
<!--
@@ -765,20 +700,22 @@ The Troubleshooting section currently serves the `Playbook` role. We may conside
765
700
splitting it into a dedicated ` Playbook ` document (potentially with some monitoring
766
701
details). For now, we leave it here.
767
702
768
- _ This section must be completed when targeting beta graduation to a release._
769
-
770
703
###### How does this feature react if the API server and/or etcd is unavailable?
771
- No impact.
704
+ No impact.
772
705
773
706
###### What are other known failure modes?
774
707
775
- During the enabling and disabling of the memory manager(changing memory manager policy) you must remove the memory
776
- manager state file(` /var/lib/kubelet/memory_manager_state ` ), otherwise the kubelet start will fail.
777
- You can identify the issue via check of the kubelet log.
708
+ During the enabling and disabling of the memory manager(changing memory manager policy) you must remove the memory
709
+ manager state file(` /var/lib/kubelet/memory_manager_state ` ), otherwise the kubelet start will fail.
710
+ You can identify the issue by checking of the kubelet log.
711
+
712
+ If the node is not drained and the feature is disabled, the running workloads which
713
+ had memory assignment must be assumed to lose the previous guarantees, as the memory
714
+ manager is no longer enforcing them.
778
715
779
716
###### What steps should be taken if SLOs are not being met to determine the problem?
780
717
781
- Not applicable.
718
+ Not applicable.
782
719
783
720
[ supported limits ] : https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
784
721
[ existing SLIs/SLOs ] : https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
@@ -792,9 +729,6 @@ https://github.com/kubernetes/kubernetes/pull/95479
792
729
793
730
## Drawbacks
794
731
795
- <!--
796
- Why should this KEP _not_ be implemented?
797
- -->
798
732
No objections exist to implement this KEP.
799
733
800
734
## Alternatives
0 commit comments