Commit 82138a4

node: topologymgr: Add metric to measure latency

We need to determine the latency this feature adds due to the resource alignment logic executed at pod admission time. Since such a metric does not exist, a new metric: `topology_manager_admission_duration_seconds` would be added in the dev phase.

Signed-off-by: Swati Sehgal <[email protected]>
1 parent f821bbc commit 82138a4

2 files changed, 8 additions and 5 deletions

keps/sig-node/693-topology-manager/README.md

Lines changed: 7 additions & 5 deletions
@@ -797,30 +797,32 @@ Monitor the following metrics:

 "topology_manager_admission_requests_total"
 "topology_manager_admission_errors_total"
+"topology_manager_admission_duration_seconds"

 ###### How can an operator determine if the feature is in use by workloads?

-The operator can look at `topology_manager_admission_requests_total` and `topology_manager_admission_errors_total`
-metrics to determine if topology manager is performing its admission check.
+The operator can look at `topology_manager_admission_requests_total`, `topology_manager_admission_errors_total` and
+`topology_manager_admission_duration_seconds` metrics to determine if topology manager is performing its admission check.
 In addition to that, kubelet configuration of the nodes can be inspected to check feature gates and the policies
 configured.

 ###### How can someone using this feature know that it is working for their instance?

 - [X] Other (treat as last resort)
-- Details: check the kubelet metric `topology_manager_admission_requests_total`
+- Details: check the kubelet metric `topology_manager_admission_requests_total` or "topology_manager_admission_duration_seconds"

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-"topology_manager_admission_requests_total" can be used to determine if topology manager is
-performing its admission check.
+"topology_manager_admission_duration_seconds" (which will be added as this release) can be used to determine
+if the resource alignment logic performed at pod admission time is taking longer than expected.

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

 - [X] Metrics
 - Metric name:
 - topology_manager_admission_requests_total
 - topology_manager_admission_errors_total
+- topology_manager_admission_duration_seconds

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

keps/sig-node/693-topology-manager/kep.yaml

Lines changed: 1 addition & 0 deletions
@@ -52,3 +52,4 @@ disable-supported: true
 metrics:
 - topology_manager_admission_requests_total
 - topology_manager_admission_errors_total
+- topology_manager_admission_duration_seconds
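
The SLO text in this change says the new metric can show whether the resource alignment logic at pod admission is taking longer than expected. As a rough illustration of how an operator might check that, the sketch below computes the mean admission latency from a scrape of the kubelet metrics endpoint. It assumes the metric is exposed as a Prometheus histogram, and therefore has `_sum` and `_count` series, which this commit does not state. The function name `mean_latency_seconds` and the sample values are hypothetical.

```python
# Sketch only: estimate mean admission latency from Prometheus text output.
# Assumption (not confirmed by the commit): the metric is a histogram, so the
# scrape contains `<name>_sum` and `<name>_count` series.

METRIC = "topology_manager_admission_duration_seconds"


def _parse_sample(line):
    """Split one exposition-format sample line into (series name, value)."""
    if "}" in line:
        # Labelled series: everything after the last '}' is value [timestamp].
        head, _, tail = line.rpartition("}")
        name = head.split("{", 1)[0]
        fields = tail.split()
    else:
        parts = line.split()
        name, fields = parts[0], parts[1:]
    return name, float(fields[0])


def mean_latency_seconds(metrics_text, metric=METRIC):
    """Return sum/count for the histogram, or None if nothing was observed."""
    total = 0.0
    count = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = _parse_sample(line)
        if name == metric + "_sum":
            total += value
        elif name == metric + "_count":
            count += value
    if count == 0:
        return None
    return total / count


# Hypothetical scrape output, made up for illustration.
SAMPLE = """\
# TYPE topology_manager_admission_duration_seconds histogram
topology_manager_admission_duration_seconds_bucket{le="0.005"} 3
topology_manager_admission_duration_seconds_bucket{le="+Inf"} 4
topology_manager_admission_duration_seconds_sum 0.02
topology_manager_admission_duration_seconds_count 4
"""

if __name__ == "__main__":
    print(mean_latency_seconds(SAMPLE))
```

A mean hides tail latency, so an SLO would more likely be stated against a quantile derived from the `_bucket` series. The mean is used here only because it needs nothing beyond `_sum` and `_count`.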
