
Commit 6d60f7a

Tal-or authored and ffromani committed

node: memmgr: address reviewers comments

Addressed reviewers' comments, add clarifications and missing bits. Make memory manager metrics naming more consistent.

Signed-off-by: Talor Itzhak <[email protected]>
Co-authored-by: Francesco Romani <[email protected]>

1 parent 052f4b9 commit 6d60f7a

File tree

2 files changed: +78 −144 lines changed


keps/sig-node/1769-memory-manager/README.md

Lines changed: 76 additions & 142 deletions
```diff
@@ -7,14 +7,15 @@
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
-- [Design Overview](#design-overview)
 - [User Stories](#user-stories)
 - [Story 1 : High-performance packet processing with DPDK](#story-1--high-performance-packet-processing-with-dpdk)
 - [Story 2 : Databases](#story-2--databases)
 - [Story 3 : KubeVirt (provided by @rmohr)](#story-3--kubevirt-provided-by-rmohr)
+- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [UX](#ux)
 - [Design Details](#design-details)
+- [Overview](#overview)
 - [How to enable the guaranteed memory allocation over many NUMA nodes?](#how-to-enable-the-guaranteed-memory-allocation-over-many-numa-nodes)
 - [The Concept of Node Map and Memory Maps](#the-concept-of-node-map-and-memory-maps)
 - [Memory Map](#memory-map)
```
```diff
@@ -35,6 +36,10 @@
 - [Test Plan](#test-plan)
 - [Single-NUMA Systems Tests](#single-numa-systems-tests)
 - [Multi-NUMA System Tests](#multi-numa-system-tests)
+- [Prerequisite testing updates](#prerequisite-testing-updates)
+- [Unit tests](#unit-tests)
+- [Integration tests](#integration-tests)
+- [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Phase 1: Alpha (target v1.21)](#phase-1-alpha-target-v121)
 - [Phase 2: Beta (target v1.22)](#phase-2-beta-target-v122)
```
```diff
@@ -136,26 +141,15 @@ The next subsection introduces to the proposal by outlining its design.
 
 ### Notes/Constraints/Caveats (Optional)
 
-<!--
-What are the caveats to the proposal?
-What are some important details that didn't come across above?
-Go in to as much detail as necessary here.
-This might be a good place to talk about core concepts and how they relate.
--->
+Allocation of resources happens, as with all the other resource managers (CPU manager, device manager), at admission time. This behavior creates a disconnect with the scheduler's
+view of the system, which doesn't consider resource alignment or locality, and can contribute to runaway pod creation.
 
 ### Risks and Mitigations
 
-<!--
-What are the risks of this proposal and how do we mitigate. Think broadly.
-For example, consider both security and how this will impact the larger
-kubernetes ecosystem.
-
-How will security be reviewed and by whom?
+Bugs in the memory manager can cause the kubelet to crash, or workloads to start with incorrect pinning. This can be mitigated with comprehensive testing and by improving the observability of the system (see metrics).
 
-How will UX be reviewed and by whom?
-
-Consider including folks that also work outside the SIG or subproject.
--->
+The memory manager `None` policy is very simple and quite safe, being almost a no-operation. The `Static` policy, which enables memory pinning, has seen barely any changes since beta,
+and those mostly because of interactions with other features (VPA).
 
 #### UX
 
```
```diff
@@ -325,10 +319,10 @@ Syntax:
 
 #### Memory Manager Policy Flag
 
-The `static` value of the flag enables the guaranteed memory allocation, whereas `none` disables it.
+The `Static` value of the flag enables the guaranteed memory allocation, whereas `None` disables it.
 
 Syntax:
-`--memory-manager-policy=static|none`
+`--memory-manager-policy=Static|None`
 
 #### Reserved Memory Flag
 
```
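The flag above also has a KubeletConfiguration counterpart. As a hedged illustration, enabling the `Static` policy through the kubelet configuration file looks roughly like the following fragment; the `reservedMemory` amount is a placeholder for the example, not a recommendation, and a real configuration must keep it consistent with the node's reserved-memory settings:

```yaml
# Illustrative KubeletConfiguration fragment enabling the Static policy.
# The reservedMemory value below is a placeholder for the example only.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
memoryManagerPolicy: Static
reservedMemory:
  - numaNode: 0
    limits:
      memory: 1Gi
```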
```diff
@@ -439,32 +433,8 @@ Memory pinning will be validated for Topology Manager `single-numa-node` and `re
 
 ##### Prerequisite testing updates
 
-<!--
-Based on reviewers feedback describe what additional tests need to be added prior
-implementing this enhancement to ensure the enhancements have also solid foundations.
--->
-
 ##### Unit tests
 
-<!--
-In principle every added code should have complete unit test coverage, so providing
-the exact set of tests will not bring additional value.
-However, if complete unit test coverage is not possible, explain the reason of it
-together with explanation why this is acceptable.
--->
-
-<!--
-Additionally, for Alpha try to enumerate the core package you will be touching
-to implement this enhancement and provide the current unit coverage for those
-in the form of:
-- <package>: <date> - <current test coverage>
-The data can be easily read from:
-https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
-
-This can inform certain test coverage improvements that we want to do before
-extending the production code to implement this enhancement.
--->
-
 - `k8s.io/kubernetes/pkg/kubelet/cm/memorymanager`: `20230928` - `81%`
 
 ##### Integration tests
```
```diff
@@ -487,17 +457,7 @@ https://storage.googleapis.com/k8s-triage/index.html
 
 ##### e2e tests
 
-<!--
-This question should be filled when targeting a release.
-For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
-
-For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
-https://storage.googleapis.com/k8s-triage/index.html
-
-We expect no non-infra related flakes in the last month as a GA graduation criteria.
--->
-
-- https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/memory_manager_test.go
+[Memory Manager E2E Tests](https://storage.googleapis.com/k8s-triage/index.html?sig=node&test=Memory%20Manager)
 
 ### Graduation Criteria
 
```
```diff
@@ -518,33 +478,19 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 
 #### GA (stable)
 
-- N examples of real-world usage
-- N installs
+- 2 or more examples of real-world usage
+- 2 or more installs
 - More rigorous forms of testing—e.g., downgrade tests and scalability tests
 - Allowing time for feedback
 
-**Note:** Generally we also wait at least two releases between beta and
-GA/stable, because there's no opportunity for user feedback, or even bug reports,
-in back-to-back releases.
-
-**For non-optional features moving to GA, the graduation criteria must include
-[conformance tests].**
-
+**Note about conformance tests**
 [conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
+Conformance tests validate API endpoints, but memory manager has no API endpoints.
+We defer the correctness of the behavior (resource allocation) and the observable behavior (e.g. metrics)
+to e2e tests, like the other kubelet resource managers.
 
 ### Upgrade / Downgrade Strategy
 
-<!--
-If applicable, how will the component be upgraded and downgraded? Make sure
-this is in the test plan.
-
-Consider the following in developing an upgrade/downgrade strategy for this
-enhancement:
-- What changes (in invocations, configurations, API use, etc.) is an existing
-cluster required to make on upgrade in order to keep previous behavior?
-- What changes (in invocations, configurations, API use, etc.) is an existing
-cluster required to make on upgrade in order to make use of the enhancement?
--->
 Not applicable.
 
 ### Version Skew Strategy
```
```diff
@@ -591,8 +537,6 @@ you need any help or guidance.
 
 ### Feature Enablement and Rollback
 
-_This section must be completed when targeting alpha to a release._
-
 ###### How can this feature be enabled / disabled in a live cluster?
 
 - [X] Feature gate (also fill in values in `kep.yaml`)
```
```diff
@@ -609,11 +553,21 @@ Yes, the admission flow changes for a pod in Guaranteed QoS class. With the Memo
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
-Yes, it uses a feature gate.
+Pre-GA: the feature uses the `MemoryManager` feature gate and can be disabled.
+Disabling the feature gate requires a kubelet restart for changes to take effect.
+In case no workloads are running on the node, or consuming memory, disabling the
+feature should be seamless.
 
-<!-- Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-Describe the consequences on existing workloads (e.g., if this is a runtime
-feature, can it break the existing applications?). -->
+GA and later: the code paths will always be hit during execution.
+However, disabling the feature is still possible by reconfiguring the kubelet to use
+the `None` memory manager policy, which disables the vast majority of the code paths.
+
+In case there are running workloads which requested exclusive memory, it's the
+cluster admin's responsibility to drain the node prior to disabling the feature.
+
+If the node is not drained and the feature is disabled, the running workloads which
+had memory assignment must be assumed to lose the previous guarantees, as the memory
+manager is no longer enforcing them.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
```
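For the pre-GA path described in the hunk above, disabling the feature gate is a kubelet configuration change followed by a restart. A minimal sketch of the relevant KubeletConfiguration fragment:

```yaml
# Illustrative KubeletConfiguration fragment (pre-GA): disabling the
# MemoryManager feature gate. A kubelet restart is required to take effect.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryManager: false
```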
```diff
@@ -625,42 +579,41 @@ Yes, there is a number of Unit Tests designated for State file validation.
 
 ### Rollout, Upgrade and Rollback Planning
 
-_This section must be completed when targeting beta graduation to a release._
-
 ###### How can a rollout or rollback fail? Can it impact already running workloads?
 
-It is possible that the state file will have inconsistent data during the rollout, because of the kubelet restart, but
-you can easily to fix it by removing memory manager state file and run kubelet restart. It should not affect any running
-workloads.
+It is possible that the state file will have inconsistent data during the rollout, because of the kubelet restart, but
+you can easily fix it by removing the memory manager state file and restarting the kubelet. It should not affect any running
+workloads.
 
 
 ###### What specific metrics should inform a rollback?
-The pod may fail with the admission error because the kubelet can not provide all resources aligned from the same NUMA node.
-You can see the error message under the pod events.
-"memory_manager_assignment_errors_total".
+
+The pod may fail with an admission error because the kubelet cannot provide all resources aligned from the same NUMA node.
+You can see the error message under the pod events.
+Additionally, the value of the metric `memory_manager_pinning_errors_total` is expected to be consistently equal to zero
+under normal operating conditions; admission rejects will cause the value of this metric to increase.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
-Tested it manually by replacing the kubelet binary on the node with the `Static` memory manager policy, but I failed
-to find correct procedure how to test upgrade from 1.21 to my custom build with updated kubelet binary.
+
+Tested manually by replacing the kubelet binary on the node with the `Static` memory manager policy, but we failed
+to find a correct procedure for testing the upgrade from 1.21 to a custom build with the updated kubelet binary.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
-fields of API types, flags, etc.?**
-No.
+No.
 
 ### Monitoring Requirements
 
 Monitor the metrics
-- "memory_manager_pinning_requests_total"
-- "memory_manager_assignment_errors_total"
+- `memory_manager_pinning_requests_total`
+- `memory_manager_pinning_errors_total`
 
 ###### How can an operator determine if the feature is in use by workloads?
-In order for workloads to request exclusive memory allocation, the pod QoS must be "guaranteed".
-The memory manager data will be available under pod resources API.
-When it is configured with the static policy,
-you will see memory related data during call to the pod resources API List method under the container.
-In addition, in case the workload is guaranteed, the metric named memory_manager_pinning_requests_total should
-be incremented.
+
+The memory manager data will be available under the pod resources API.
+When it is configured with the `Static` policy, you will see memory-related data in the pod resources API `List` response, under the container.
+
+In addition, in case the workload is guaranteed, the metric named `memory_manager_pinning_requests_total` should be incremented.
 
 ###### How can someone using this feature know that it is working for their instance?
 
```
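The rollback signal described in the hunk above (`memory_manager_pinning_errors_total` staying at zero) can be checked mechanically. A minimal sketch, assuming a Prometheus-format snapshot already fetched from the kubelet `/metrics` endpoint; the sample payload and helper function below are illustrative, not part of any real tooling:

```python
# Illustrative check of the rollback signal: the pinning errors counter
# should stay at zero under normal operating conditions.
SAMPLE = """\
# TYPE memory_manager_pinning_requests_total counter
memory_manager_pinning_requests_total 12
# TYPE memory_manager_pinning_errors_total counter
memory_manager_pinning_errors_total 0
"""

def read_counter(payload: str, name: str) -> float:
    """Return the value of a counter from a Prometheus exposition payload."""
    for line in payload.splitlines():
        if line.startswith(name):
            return float(line.split()[-1])
    raise KeyError(name)

errors = read_counter(SAMPLE, "memory_manager_pinning_errors_total")
print("rollback signal clear" if errors == 0 else "investigate admission rejects")
# → rollback signal clear
```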
```diff
@@ -680,72 +633,54 @@ Monitor the metrics
 To understand the reason, you will need to check via pod resources API the amount of allocatable memory and memory reserved by containers.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
-This does not seem relevant to this feature.
 
-###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+This does not seem relevant to this feature.
 
-<!--
-Pick one more of these and delete the rest.
--->
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 - [X] Metrics
 - Metric name:
-- memory_manager_pinning_requests_total
-- memory_manager_assignment_errors_total
+- `memory_manager_pinning_requests_total`
+- `memory_manager_pinning_errors_total`
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
-<!--
-Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
-implementation difficulties, etc.).
--->
-- "memory_manager_pinning_requests_total"
-- "memory_manager_assignment_errors_total"
+- `memory_manager_pinning_requests_total`
+- `memory_manager_pinning_errors_total`
 
 The addition of these metrics will be done before moving to GA
-issue - `<TBD>`
-PR - `<TBD>`
+issue - `https://github.com/kubernetes/kubernetes/issues/120986`
 
 ### Dependencies
 
-_This section must be completed when targeting beta graduation to a release._
-
 ###### Does this feature depend on any specific services running in the cluster?
-No.
+No.
 
 
 ### Scalability
 
-_For alpha, this section is encouraged: reviewers should consider these questions
-and attempt to answer them._
-
-_For beta, this section is required: reviewers must answer these questions._
-
-_For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field._
-
 ###### Will enabling / using this feature result in any new API calls?
 
-No.
+No.
 
 ###### Will enabling / using this feature result in introducing new API types?
 
-No.
+No.
 
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
-No.
+No.
 
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
-No.
+No.
 
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 
-No.
+No.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
-No.
+No.
 
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
 <!--
```
```diff
@@ -765,20 +700,22 @@ The Troubleshooting section currently serves the `Playbook` role. We may conside
 splitting it into a dedicated `Playbook` document (potentially with some monitoring
 details). For now, we leave it here.
 
-_This section must be completed when targeting beta graduation to a release._
-
 ###### How does this feature react if the API server and/or etcd is unavailable?
-No impact.
+No impact.
 
 ###### What are other known failure modes?
 
-During the enabling and disabling of the memory manager(changing memory manager policy) you must remove the memory
-manager state file(`/var/lib/kubelet/memory_manager_state`), otherwise the kubelet start will fail.
-You can identify the issue via check of the kubelet log.
+During the enabling and disabling of the memory manager (changing the memory manager policy) you must remove the memory
+manager state file (`/var/lib/kubelet/memory_manager_state`), otherwise the kubelet start will fail.
+You can identify the issue by checking the kubelet log.
+
+If the node is not drained and the feature is disabled, the running workloads which
+had memory assignment must be assumed to lose the previous guarantees, as the memory
+manager is no longer enforcing them.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
-Not applicable.
+Not applicable.
 
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
```
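The state-file removal step in the failure mode above can be sketched as follows. This is a simulation on a scratch directory standing in for `/var/lib/kubelet`; only the file name matches the KEP, and the one-key JSON content is a hypothetical stand-in, not the real state-file schema:

```python
import json
import os
import tempfile

# Scratch directory standing in for /var/lib/kubelet in this simulation.
kubelet_dir = tempfile.mkdtemp()
state_path = os.path.join(kubelet_dir, "memory_manager_state")

# Pretend a stale state file was left behind by a previous policy setting.
with open(state_path, "w") as f:
    json.dump({"policyName": "Static"}, f)

def clear_stale_state(path: str) -> bool:
    """Remove the memory manager state file if present; True if removed."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False

# On a real node this would be followed by a kubelet restart.
removed = clear_stale_state(state_path)
print(removed, os.path.exists(state_path))  # → True False
```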
```diff
@@ -792,9 +729,6 @@ https://github.com/kubernetes/kubernetes/pull/95479
 
 ## Drawbacks
 
-<!--
-Why should this KEP _not_ be implemented?
--->
 No objections exist to implement this KEP.
 
 ## Alternatives
```