Commit e1d1af1

node: cpumgr: add metrics information

During the review of the KEP, it emerged there are possible metrics we should add, tracking admission and errors. CPU allocation is done at admission time, and extracting these metrics is expected to be both cheap and useful for monitoring.

Signed-off-by: Francesco Romani <[email protected]>
1 parent b57f64b commit e1d1af1

2 files changed, +49 -18 lines changed

keps/sig-node/3570-cpumanager/README.md

Lines changed: 47 additions & 17 deletions
@@ -164,7 +164,17 @@ N/A
164164

165165
### Risks and Mitigations
166166

167-
TBD
167+
Bugs in cpumanager can cause the kubelet to crash, or workloads to start with incorrect pinning.
168+
This can be mitigated with comprehensive testing and improving the observability of the system
169+
(see metrics).
170+
171+
While the cpumanager core policy has seen no changes except for bugfixes for a while,
172+
we introduced the [cpumanager options policy framework](https://github.com/fromanirh/enhancements/blob/master/keps/sig-node/2625-cpumanager-policies-thread-placement/README.md)
173+
to enable the fine tuning of the static policy.
174+
This area is more active, so bugs introduced with policy options can cause the kubelet to crash.
175+
To mitigate this risk, we can make sure each policy option can be disabled independently, and
176+
is not coupled with others, avoiding cascading failures or unnecessary coupling.
177+
Graduation and testing criteria are deferred to the KEPs tracking the implementation of these features.
168178

169179
## Design Details
170180

@@ -530,7 +540,11 @@ Already running workload will not be affected if the node state is steady
530540

531541
###### What specific metrics should inform a rollback?
532542

533-
Pod creation errors on a node-by-node basis.
543+
"cpu_manager_pinning_errors_total". It must be noted that even in a fully healthy system there are known benign conditions
544+
that can cause CPU allocation failures. A few selected examples are:
545+
546+
- requesting an odd number of cores (not a full physical core) when the cpumanager is configured with the `full-pcpus-only` option
547+
- requesting NUMA-aligned cores, with Topology Manager enabled.
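
Because benign failures are expected, a rollback signal is better based on the error ratio over a window than on the raw error counter. A minimal Python sketch, assuming the two counters are scraped at two points in time; `PinningSample`, `pinning_error_ratio`, `should_consider_rollback` and the `max_ratio` threshold are hypothetical names and values, not part of the KEP:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinningSample:
    """One scrape of the two counters named in the KEP (hypothetical container)."""
    requests_total: float  # value of cpu_manager_pinning_requests_total
    errors_total: float    # value of cpu_manager_pinning_errors_total


def pinning_error_ratio(earlier: PinningSample, later: PinningSample) -> float:
    """Fraction of pinning requests that failed between two scrapes."""
    requests = later.requests_total - earlier.requests_total
    errors = later.errors_total - earlier.errors_total
    if requests < 0 or errors < 0:
        # A counter went backwards, most likely because the kubelet restarted.
        raise ValueError("counter reset between samples")
    if requests == 0:
        return 0.0  # no pinning requests in the window, so nothing failed
    return errors / requests


def should_consider_rollback(earlier: PinningSample,
                             later: PinningSample,
                             max_ratio: float = 0.5) -> bool:
    """Flag a window whose error ratio exceeds max_ratio.

    max_ratio is an illustrative threshold; a real value has to account for
    the benign failures listed above.
    """
    return pinning_error_ratio(earlier, later) > max_ratio
```

For example, going from 100 requests and 2 errors to 120 requests and 14 errors gives a ratio of 12/20, which exceeds the illustrative threshold.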
534548

535549
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
536550

@@ -545,37 +559,53 @@ No
545559

546560
### Monitoring Requirements
547561

548-
Monitor the pod admission counter
549-
Monitor the pods not going running after successful schedule
562+
Monitor the metrics
563+
- "cpu_manager_pinning_requests_total"
564+
- "cpu_manager_pinning_errors_total"
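
As an illustration of how these two counters could be read, here is a small Python sketch that extracts them from metrics in the Prometheus text exposition format. The `read_counters` helper and the sample text are hypothetical; the metric names are taken verbatim from the text above, and the names actually exposed by a kubelet may carry an additional prefix.

```python
from typing import Dict, Iterable


def read_counters(metrics_text: str, names: Iterable[str]) -> Dict[str, float]:
    """Sum all samples of each named counter found in Prometheus text output."""
    totals = {name: 0.0 for name in names}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labeled sample: name{labels} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Unlabeled sample: name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if name in totals and fields:
            totals[name] += float(fields[0])
    return totals


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE cpu_manager_pinning_requests_total counter
cpu_manager_pinning_requests_total 42
# TYPE cpu_manager_pinning_errors_total counter
cpu_manager_pinning_errors_total 3
some_other_metric{label="x y"} 7
"""

COUNTERS = read_counters(
    SAMPLE,
    ["cpu_manager_pinning_requests_total", "cpu_manager_pinning_errors_total"],
)
```

A counter that is absent from the scrape is reported as 0.0 rather than raising, which keeps the sketch short but hides the difference between "no errors" and "metric not exposed".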
550565

551566
###### How can an operator determine if the feature is in use by workloads?
552567

553-
The operator need to inspect the node and verify the cpu pinning assignment either checking the cgroups on the node
554-
or accessing the podresources API of the kubelet.
568+
In order for pods to request exclusive CPU allocation and pinning, they need to match
569+
all the following criteria:
570+
- the pod QoS must be "guaranteed"
571+
- the CPU resource request (`pod.spec.containers[].resources.limits.cpu`) must be integral.
555572

556-
###### How can someone using this feature know that it is working for their instance?
573+
On top of that, at kubelet level
574+
- the cpumanager policy must be `static`.
557575

576+
If all the criteria are met, then the feature is in use by workloads.
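
The criteria above can be sketched as a check over a pod manifest loaded as a plain dictionary. This is a simplified Python illustration, not the kubelet's logic: the helper names are hypothetical, only the plain and `m` CPU quantity forms are parsed, memory quantities are compared as strings, init containers are ignored, and a pod is assumed to qualify when at least one container has an integral CPU limit.

```python
from decimal import Decimal


def cpu_millis(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "2", "1.5" or "500m" to millicores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1])
    return Decimal(quantity) * 1000


def is_guaranteed(pod: dict) -> bool:
    """True if every container sets cpu and memory limits equal to its requests."""
    containers = pod["spec"]["containers"]
    for container in containers:
        resources = container.get("resources") or {}
        limits = resources.get("limits") or {}
        requests = resources.get("requests") or {}
        if "cpu" not in limits or "memory" not in limits:
            return False
        # An omitted request defaults to the limit.
        if cpu_millis(requests.get("cpu", limits["cpu"])) != cpu_millis(limits["cpu"]):
            return False
        if requests.get("memory", limits["memory"]) != limits["memory"]:
            return False  # simplification: string comparison only
    return bool(containers)


def requests_exclusive_cpus(pod: dict, cpumanager_policy: str) -> bool:
    """Apply the three criteria: static policy, guaranteed QoS, integral CPU."""
    if cpumanager_policy != "static" or not is_guaranteed(pod):
        return False
    for container in pod["spec"]["containers"]:
        millis = cpu_millis(container["resources"]["limits"]["cpu"])
        if millis > 0 and millis % 1000 == 0:
            return True
    return False
```

Under these assumptions, a pod whose only container sets `limits: {cpu: "2", memory: "1Gi"}` qualifies under the `static` policy, while the same pod with `cpu: "1500m"` does not.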
577+
578+
###### How can someone using this feature know that it is working for their instance?
558579

559580
- [X] Other (treat as last resort)
560-
- Details: the containers need to check the cpu set they are allowed to run; in addition, node agents (e.g. node_exporter)
561-
can report the CPU assignment
581+
- Details: check the kubelet metric `cpu_manager_pinning_requests_total`
562582

563583
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
564584

565-
- N/A
585+
"cpu_manager_pinning_requests_total" and "cpu_manager_pinning_errors_total"
586+
We need to find a careful balance here because we don't want to leak hardware details, or in general information
587+
dependent on the worker node hardware configuration (an example, even if arguably extreme, is the processor core layout).
588+
589+
It is possible to infer which pod would trigger a CPU pinning from the
590+
[pod resources request](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy)
591+
but adding these two metrics is both very cheap and helpful for the observability of the system.
566592

567593
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
568594

569-
- [ ] Other (treat as last resort)
570-
- Details:
571-
a operator should check that pods go running correctly and the cpu pinning is performed. The latter can
572-
be checked by inspecting the cgroups at node level.
595+
- [X] Metrics
596+
- Metric name:
597+
- cpu_manager_pinning_requests_total
598+
- cpu_manager_pinning_errors_total
573599

574600
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
575601

576-
No, because all the metrics we were aware of leaked hardware details.
577-
All of the metrics experimented by consumers of the feature so far require to expose hardware details of the
578-
worker nodes, and are dependent on the worker node hardware configuration (e.g. processor core layout).
602+
- "cpu_manager_pinning_requests_total"
603+
- "cpu_manager_pinning_errors_total"
604+
605+
The addition of these metrics will be done before moving to GA
606+
([issue](https://github.com/kubernetes/kubernetes/issues/112854),
607+
[PR](https://github.com/kubernetes/kubernetes/pull/112855)).
608+
579609

580610
### Dependencies
581611

keps/sig-node/3570-cpumanager/kep.yaml

Lines changed: 2 additions & 1 deletion
@@ -47,4 +47,5 @@ disable-supported: true
4747

4848
# The following PRR answers are required at beta release
4949
metrics:
50-
- N/A
50+
- cpu_manager_pinning_requests_total
51+
- cpu_manager_pinning_errors_total