Further clarifications about how pod authors can
check (or not) if the feature is working for them.
Signed-off-by: Francesco Romani <[email protected]>
keps/sig-node/1769-memory-manager/README.md (14 additions & 10 deletions)
@@ -617,24 +617,28 @@ In addition, in case the workload is guaranteed, the metric named `memory_manage
###### How can someone using this feature know that it is working for their instance?

-*For cluster admins:*
+*For cluster admins:*

-The assumption that the feature should work, once memory manager static policy and reserved memory flags configured under the kubelet
-and kubelet succeeded to restart.
+The assumption is that the feature should work once the Memory Manager static policy and the reserved memory flags are configured on the kubelet
+and the kubelet restarts successfully. Entities (DaemonSets or operators) which can access the nodes have two options to verify whether
+containers are pinned to the NUMA node (a sketch follows after this list):
+- Via the pod resources API: connect to its gRPC socket and query it; see the [pod resource API doc page](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more information.
+- By checking the relevant container cgroup on the node.
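
For illustration only (not part of the KEP text), a minimal Go sketch of the first option, assuming the default pod resources socket path `/var/lib/kubelet/pod-resources/kubelet.sock` and the `k8s.io/kubelet/pkg/apis/podresources/v1` client package; the cgroup option is summarized in the trailing comment:

```go
// Illustrative sketch, not part of the KEP: list per-container memory pinning as
// reported by the kubelet pod resources API. Assumes the default socket path; a
// DaemonSet would need to mount /var/lib/kubelet/pod-resources from the host.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock" // assumed default path

func main() {
	conn, err := grpc.Dial(socket, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("cannot connect to the pod resources socket: %v", err)
	}
	defer conn.Close()

	client := podresourcesv1.NewPodResourcesListerClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	resp, err := client.List(ctx, &podresourcesv1.ListPodResourcesRequest{})
	if err != nil {
		log.Fatalf("List failed: %v", err)
	}

	for _, pod := range resp.GetPodResources() {
		for _, ctr := range pod.GetContainers() {
			// Memory entries are only reported for containers whose memory is
			// managed by the memory manager static policy.
			for _, mem := range ctr.GetMemory() {
				var nodeIDs []int64
				for _, n := range mem.GetTopology().GetNodes() {
					nodeIDs = append(nodeIDs, n.GetID())
				}
				fmt.Printf("%s/%s/%s: type=%s NUMA nodes=%v\n",
					pod.GetNamespace(), pod.GetName(), ctr.GetName(),
					mem.GetMemoryType(), nodeIDs)
			}
		}
	}
	// The second option amounts to reading the container's cpuset.mems (cgroup v2)
	// or cpuset/cpuset.mems (cgroup v1) under the pod cgroup on the node and
	// verifying that it lists only the expected NUMA node(s).
}
```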

-*For a pod author:*
+*For a pod author:*

-* Pod succeeded to start. You have two options to verify if containers are pinned to the NUMA node
-  - Via pod resources API, you will need to connect to grpc socket and get information from it, see [pod resource API doc page](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more information.
-  - Checking the relevant container CGroup under the node.
+* The pod succeeded to start. If a pod requiring memory pinning is admitted, it is implied that the resources were allocated correctly; a workload
+that wants to double-check must verify the allocation by itself, for example by re-implementing the check that the allocated memory areas are all on the same NUMA zone,
+usually through sysfs (a sketch of such a check follows below).

-* Pod failed to start because of the admission error.
+* The pod failed to start because of an admission error.

-To understand the reason, you will need to check via pod resources API the amount of allocatable memory and memory reserved by containers.
+To understand the reason, you will need to check, via the pod resources API, the amount of allocatable memory on the node and the memory already reserved by containers.
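
For illustration only, a sketch of what such an in-workload check could look like; it assumes cgroup v2 mounted at `/sys/fs/cgroup` inside the container and falls back to `/proc/self/numa_maps` (procfs) to see which NUMA nodes actually back the process's pages:

```go
// Illustrative sketch, not part of the KEP: verify from inside the workload that
// its memory stays on a single NUMA node. Paths assume cgroup v2 mounted at
// /sys/fs/cgroup inside the container; /proc/self/numa_maps shows which nodes
// actually back the process's pages.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	// The memory manager pins memory by restricting the container's cpuset; with
	// cgroup v2 the allowed memory nodes are visible in cpuset.mems.effective.
	if data, err := os.ReadFile("/sys/fs/cgroup/cpuset.mems.effective"); err == nil {
		fmt.Printf("allowed memory NUMA nodes: %s\n", strings.TrimSpace(string(data)))
	}

	// Cross-check where the pages actually live: each numa_maps line carries
	// tokens such as "N0=123", meaning 123 pages on NUMA node 0.
	f, err := os.Open("/proc/self/numa_maps")
	if err != nil {
		log.Fatalf("cannot read numa_maps: %v", err)
	}
	defer f.Close()

	nodes := map[string]bool{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		for _, tok := range strings.Fields(sc.Text()) {
			if len(tok) > 2 && tok[0] == 'N' && tok[1] >= '0' && tok[1] <= '9' {
				nodes[strings.SplitN(tok, "=", 2)[0]] = true
			}
		}
	}
	fmt.Printf("NUMA nodes backing this process: %v\n", nodes)
	if len(nodes) > 1 {
		fmt.Println("WARNING: memory is spread across more than one NUMA node")
	}
}
```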
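
For illustration only, a sketch that dumps the node-level allocatable memory via `GetAllocatableResources`, to be compared with the per-container reservations from the `List` sketch above; it assumes the same default socket path and a Kubernetes version whose pod resources API reports memory through this call:

```go
// Illustrative sketch, not part of the KEP: dump node-level allocatable memory per
// NUMA node via GetAllocatableResources, to compare against the per-container
// reservations returned by List (see the earlier sketch).
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
	conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := podresourcesv1.NewPodResourcesListerClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	alloc, err := client.GetAllocatableResources(ctx, &podresourcesv1.AllocatableResourcesRequest{})
	if err != nil {
		log.Fatalf("GetAllocatableResources failed: %v", err)
	}
	// Each entry carries the memory type (memory, hugepages-*), the size and the
	// NUMA topology it belongs to.
	for _, mem := range alloc.GetMemory() {
		fmt.Printf("allocatable: %v\n", mem)
	}
}
```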
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-This does not seem relevant to this feature.
+For each node, the value of the metric `memory_manager_pinning_requests_total` is expected to match the number of admitted pods which require memory pinning.
+For each node, the value of the metric `memory_manager_pinning_errors_total` is expected to be zero.
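
A possible spot check for these expectations (illustrative only), assuming the kubelet metrics have already been scraped into a local file; the counter names are the ones proposed in this KEP:

```go
// Illustrative sketch, not part of the KEP: scan a previously scraped kubelet
// metrics dump for the pinning counters named above. "metrics.txt" is a
// placeholder; how the kubelet /metrics endpoint is scraped and authenticated
// is environment-specific.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("metrics.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "memory_manager_pinning_requests_total"):
			fmt.Println(line) // expected to match the number of admitted pods requiring pinning
		case strings.HasPrefix(line, "memory_manager_pinning_errors_total"):
			fmt.Println(line) // expected to stay at zero
		}
	}
}
```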
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?