@@ -635,9 +635,9 @@ _This section must be completed when targeting beta graduation to a release._

###### What specific metrics should inform a rollback?
-
The pod may fail with an admission error because the kubelet cannot provide all resources aligned from the same NUMA node.
You can see the error message under the pod events.
+ "memory_manager_assignment_errors_total".
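
A minimal sketch of how an operator might turn the counter named above into a rollback signal, assuming its value is sampled periodically from the kubelet. The helper name `total_increase` and the sample values are hypothetical; only the metric name comes from the text. Because a counter restarts from zero when the kubelet restarts, the sketch treats a drop between two samples as a reset rather than as a negative change.

```python
from typing import Sequence


def total_increase(samples: Sequence[float]) -> float:
    """Return how much a monotonic counter grew across consecutive samples.

    A sample lower than its predecessor is treated as a counter reset
    (for example after a kubelet restart), so the new value itself is
    counted as the growth for that interval.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current < previous:
            total += current  # counter was reset to zero in between
        else:
            total += current - previous
    return total


if __name__ == "__main__":
    # Hypothetical readings of memory_manager_assignment_errors_total.
    readings = [0, 0, 2, 2, 1, 3]
    grew_by = total_increase(readings)
    if grew_by > 0:
        print(f"assignment errors grew by {grew_by}; consider a rollback")
```

Any threshold above zero is a policy choice for the operator; the text does not prescribe one.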

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Tested it manually by replacing the kubelet binary on the node with the `Static` memory manager policy, but I failed
@@ -650,13 +650,17 @@ fields of API types, flags, etc.?**

### Monitoring Requirements

- _This section must be completed when targeting beta graduation to a release._
+ Monitor the following metrics:
+ - "memory_manager_pinning_requests_total"
+ - "memory_manager_assignment_errors_total"

###### How can an operator determine if the feature is in use by workloads?
In order for workloads to request exclusive memory allocation, the pod QoS class must be "Guaranteed".
The memory manager data will be available under the pod resources API.
When it is configured with the static policy,
you will see memory-related data under each container when calling the pod resources API List method.
+ In addition, if the workload is guaranteed, the metric named memory_manager_pinning_requests_total should
+ be incremented.
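
The check described above can be sketched as follows, assuming the counters are scraped in the Prometheus text exposition format. The function name `counter_total` and the sample payload are hypothetical, and the name actually exposed by the kubelet may carry a component prefix that the text does not spell out.

```python
def counter_total(metrics_text: str, name: str) -> float:
    """Sum every sample of counter `name` found in a text-format scrape."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labelled sample: name{labels} value [timestamp]
            metric, _, tail = line.partition("{")
            rest = tail.rpartition("}")[2]
        else:
            # Unlabelled sample: name value [timestamp]
            metric, _, rest = line.partition(" ")
        fields = rest.split()
        if metric.strip() == name and fields:
            total += float(fields[0])
    return total


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE memory_manager_pinning_requests_total counter
memory_manager_pinning_requests_total 3
# TYPE memory_manager_assignment_errors_total counter
memory_manager_assignment_errors_total 1
"""

if __name__ == "__main__":
    pinned = counter_total(SAMPLE, "memory_manager_pinning_requests_total")
    print("feature in use" if pinned > 0 else "no pinning requests seen")
```

A non-zero pinning counter only indicates that pinning was requested on that node; the pod resources API remains the place to see per-container data.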

###### How can someone using this feature know that it is working for their instance?

@@ -683,20 +687,23 @@ _This section must be completed when targeting beta graduation to a release._
<!--
Pick one more of these and delete the rest.
-->
- `<TODO>`
- - [ ] Metrics
+
+ - [X] Metrics
  - Metric name:
- - [Optional] Aggregation method:
- - Components exposing the metric:
- - [ ] Other (treat as last resort)
- - Details:
+ - memory_manager_pinning_requests_total
+ - memory_manager_assignment_errors_total

###### Are there any missing metrics that would be useful to have to improve observability of this feature?
<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
- Currently, for the pod author, it is impossible to know a container's NUMA pinning without access to the node.
+ - "memory_manager_pinning_requests_total"
+ - "memory_manager_assignment_errors_total"
+
+ These metrics will be added before moving to GA.
+ issue - `<TBD>`
+ PR - `<TBD>`

### Dependencies