Commit b4e967b

ffromani authored and Tal-or committed
node: memgr: clarify SLO and feature feedback
Further clarifications about how pod authors can check (or not) if the feature is working for them.

Signed-off-by: Francesco Romani <[email protected]>
1 parent d13b348 commit b4e967b

File tree

1 file changed: +14 -10 lines changed

keps/sig-node/1769-memory-manager/README.md

Lines changed: 14 additions & 10 deletions
@@ -617,24 +617,28 @@ In addition, in case the workload is guaranteed, the metric named `memory_manage
 
 ###### How can someone using this feature know that it is working for their instance?
 
-*For cluster admins:*
+*For cluster admins:*
 
-The assumption that the feature should work, once memory manager static policy and reserved memory flags configured under the kubelet
-and kubelet succeeded to restart.
+The assumption is that the feature should work once the memory manager static policy and the reserved memory flags are configured for the kubelet
+and the kubelet has restarted successfully. Entities (daemonsets or operators) which can access the nodes have two options to verify whether
+containers are pinned to the NUMA node:
+- Via the pod resources API: you will need to connect to the gRPC socket and get the information from it; see the [pod resource API doc page](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more information.
+- Checking the relevant container CGroup under the node.
 
-*For a pod author:*
+*For a pod author:*
 
-* Pod succeeded to start. You have two options to verify if containers are pinned to the NUMA node
-- Via pod resources API, you will need to connect to grpc socket and get information from it, see [pod resource API doc page](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more information.
-- Checking the relevant container CGroup under the node.
+* Pod succeeded to start. If a pod requiring memory pinning is admitted, it is implied that resources are allocated correctly, and the workload
+must verify the allocation by itself, for example by reimplementing the check that the allocated memory areas are all on the same NUMA zone,
+usually through sysfs.
 
-* Pod failed to start because of the admission error.
+* Pod failed to start because of the admission error.
 
-To understand the reason, you will need to check via pod resources API the amount of allocatable memory and memory reserved by containers.
+To understand the reason, you will need to check via pod resources API the amount of allocatable memory and memory reserved by containers.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
-This does not seem relevant to this feature.
+For each node, the value of the metric `memory_manager_pinning_requests_total` is expected to match the number of admitted pods which require memory pinning.
+For each node, the value of the metric `memory_manager_pinning_errors_total` is expected to be zero.
 
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
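The pod resources API route mentioned in the diff (for entities such as daemonsets or operators running on the node) can be exercised with a small gRPC client. Below is a minimal sketch, assuming the default kubelet socket path `/var/lib/kubelet/pod-resources/kubelet.sock` and the `k8s.io/kubelet/pkg/apis/podresources/v1` client package; it is not part of the commit.

```go
// podres-list.go: list the NUMA nodes each container's memory is pinned to,
// via the kubelet pod resources API (sketch; socket path and output format are assumptions).
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
)

// Default socket path; it may differ depending on the kubelet root directory.
const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock"

func main() {
	conn, err := grpc.Dial(socket, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("cannot connect to the pod resources socket: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	client := podresourcesapi.NewPodResourcesListerClient(conn)
	resp, err := client.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
	if err != nil {
		log.Fatalf("List() failed: %v", err)
	}

	for _, pod := range resp.GetPodResources() {
		for _, cnt := range pod.GetContainers() {
			for _, mem := range cnt.GetMemory() {
				// Each entry carries the memory type (e.g. "memory", "hugepages-2Mi"),
				// the size and the NUMA nodes it was pinned to.
				var numaNodes []int64
				for _, n := range mem.GetTopology().GetNodes() {
					numaNodes = append(numaNodes, n.GetID())
				}
				fmt.Printf("%s/%s/%s: type=%s size=%d numa=%v\n",
					pod.GetNamespace(), pod.GetName(), cnt.GetName(),
					mem.GetMemoryType(), mem.GetSize_(), numaNodes)
			}
		}
	}
}
```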
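For the pod-author self-check through sysfs/procfs described in the new text, one possible approach (an assumption, not prescribed by the commit) is to read `Mems_allowed_list` from `/proc/self/status` inside the workload, which reflects the cpuset memory pinning applied through the container cgroup.

```go
// numa-selfcheck.go: verify from inside the workload that memory allocations are
// restricted to a single NUMA node, by reading Mems_allowed_list from /proc/self/status.
// Sketch only; this is one way to implement the self-check, not the mandated one.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

// memsAllowed returns the list of NUMA nodes this process may allocate memory from.
func memsAllowed() (string, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "", err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Mems_allowed_list:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Mems_allowed_list:")), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("Mems_allowed_list entry not found in /proc/self/status")
}

func main() {
	mems, err := memsAllowed()
	if err != nil {
		log.Fatalf("cannot read allowed memory nodes: %v", err)
	}
	// A single NUMA node shows up as e.g. "0"; a range like "0-1" or a list like
	// "0,2" means the memory is not pinned to one node.
	if strings.ContainsAny(mems, ",-") {
		log.Fatalf("memory not pinned to a single NUMA node: Mems_allowed_list=%q", mems)
	}
	fmt.Printf("memory pinned to NUMA node %s\n", mems)
}
```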
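The SLO added by the commit can be spot-checked per node by scraping the kubelet metrics endpoint (for example with `kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics`) and comparing the two counters. A rough sketch that reads the exposition text from stdin; the fetch command and the label-free parsing are assumptions.

```go
// memmgr-slo-check.go: read a Prometheus text exposition from stdin (e.g. the output of
// `kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics`) and report the memory
// manager pinning counters named in the SLO above. Sketch only.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	values := map[string]float64{}
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // metric lines can be long
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "#") {
			continue // skip HELP/TYPE comments
		}
		for _, name := range []string{
			"memory_manager_pinning_requests_total",
			"memory_manager_pinning_errors_total",
		} {
			if strings.HasPrefix(line, name) {
				fields := strings.Fields(line)
				v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
				if err != nil {
					log.Fatalf("cannot parse %q: %v", line, err)
				}
				values[name] += v
			}
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatalf("reading metrics: %v", err)
	}

	fmt.Printf("pinning requests: %v (expected to match admitted pods requiring pinning)\n",
		values["memory_manager_pinning_requests_total"])
	if errs := values["memory_manager_pinning_errors_total"]; errs != 0 {
		fmt.Printf("WARNING: pinning errors: %v (expected to be zero)\n", errs)
	} else {
		fmt.Println("pinning errors: 0")
	}
}
```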