Commit 7fa76e2

TELCODOCS-386: Send SRO metrics as telemetry data
1 parent bb20245 commit 7fa76e2

4 files changed

+57
-11
lines changed


hardware_enablement/psap-special-resource-operator.adoc

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@ include::modules/psap-special-resource-operator-using-manifests.adoc[leveloffset
2929

3030
include::modules/psap-special-resource-operator-using-configmaps.adoc[leveloffset=+2]
3131

32+
33+
include::modules/psap-special-resource-operator-metrics.adoc[leveloffset=+1]
34+
3235
[id="additional-resources_special-resource-operator"]
3336
== Additional resources
3437

modules/psap-special-resource-operator-metrics.adoc

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * hardware_enablement/psap-special-resource-operator.adoc
4+
5+
:_content-type: REFERENCE
6+
[id="special-resource-operator-metrics_{context}"]
7+
= Prometheus Special Resource Operator metrics
8+
9+
10+
The Special Resource Operator (SRO) exposes the following Prometheus metrics through the `metrics` service:
11+
12+
|===
13+
|Metric Name |Description
14+
15+
|`sro_used_nodes`
16+
|Returns the nodes that are running pods created by a SRO custom resource (CR). This metric is available for `DaemonSet` and `Deployment` objects only.
17+
18+
|`sro_kind_completed_info`
19+
|Represents whether a `kind` of object defined by the Helm charts in a SRO CR has been successfully uploaded to the cluster (value `1`) or not (value `0`). Examples of objects are `DaemonSet`, `Deployment`, or `BuildConfig`.
20+
21+
|`sro_states_completed_info`
22+
|Represents whether the SRO has finished processing a CR successfully (value `1`) or the SRO has not processed the CR yet (value `0`).
23+
24+
|`sro_managed_resources_total`
25+
|Returns the number of SRO CRs in the cluster, regardless of their state.
26+
|===
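
Reviewer illustration, not part of this commit: the metrics in the table above are served in the Prometheus text exposition format, so a small script can scan a scrape for objects that have not completed. The following is a minimal sketch only. The sample payload, the label names (`kind`, `specialresource`), and the helper names `parse_samples` and `incomplete_kinds` are hypothetical and are not taken from the SRO source.

```python
import re

# Hypothetical scrape output. The metric names come from the table above;
# the label names and values are illustrative assumptions.
SAMPLE = """\
# HELP sro_managed_resources_total Number of SRO CRs in the cluster
# TYPE sro_managed_resources_total gauge
sro_managed_resources_total 1
sro_kind_completed_info{kind="BuildConfig",specialresource="simple-kmod"} 1
sro_kind_completed_info{kind="DaemonSet",specialresource="simple-kmod"} 0
sro_states_completed_info{specialresource="simple-kmod"} 0
"""

# One sample line: metric name, optional {labels}, then the value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def incomplete_kinds(samples):
    """List the `kind` label of every sro_kind_completed_info sample equal to 0."""
    return [
        labels.get("kind", "")
        for name, labels, value in samples
        if name == "sro_kind_completed_info" and value == 0
    ]


samples = parse_samples(SAMPLE)
print(incomplete_kinds(samples))  # prints ['DaemonSet']
```

The regular expression handles only the simple label syntax shown in the sample. A real scrape with escaped quotes or timestamps would be better served by a dedicated Prometheus client parser.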

modules/psap-special-resource-operator-using-configmaps.adoc

Lines changed: 15 additions & 6 deletions
@@ -2,10 +2,11 @@
22
//
33
// * hardware_enablement/psap-special-resource-operator.adoc
44

5+
:_content-type: PROCEDURE
56
[id="deploy-simple-kmod-using-configmap-chart"]
67
= Building and running the simple-kmod SpecialResource by using a config map
78

8-
In this example, the simple-kmod kernel module is used to show how the SRO can manage a driver container which is defined in Helm chart templates stored in a config map.
9+
In this example, the simple-kmod kernel module shows how the Special Resource Operator (SRO) manages a driver container. The container is defined in the Helm chart templates that are stored in a config map.
910

1011
.Prerequisites
1112

@@ -14,7 +15,7 @@ In this example, the simple-kmod kernel module is used to show how the SRO can m
1415
* You installed the OpenShift CLI (`oc`).
1516
* You are logged into the OpenShift CLI as a user with `cluster-admin` privileges.
1617
* You installed the Node Feature Discovery (NFD) Operator.
17-
* You installed the Special Resource Operator.
18+
* You installed the SRO.
1819
* You installed the Helm CLI (`helm`).
1920
2021
.Procedure
@@ -270,7 +271,15 @@ spec:
270271
----
271272
$ oc create -f simple-kmod-configmap.yaml
272273
----
273-
+
274+
275+
[NOTE]
276+
====
277+
To remove the simple-kmod kernel module from the node, delete the simple-kmod `SpecialResource` API object using the `oc delete` command. The kernel module is unloaded when the driver container pod is deleted.
278+
====
279+
280+
281+
.Verification
282+
274283
The `simple-kmod` resources are deployed in the `simple-kmod` namespace as specified in the object manifest. After a short time, the build pod for the `simple-kmod` driver container starts running. The build completes after a few minutes, and then the driver container pods start running.
275284

276285
. Use the `oc get pods` command to display the status of the build pods:
@@ -310,7 +319,7 @@ simple_procfs_kmod 16384 0
310319
simple_kmod 16384 0
311320
----
312321

313-
[NOTE]
314-
====
315-
If you want to remove the simple-kmod kernel module from the node, delete the simple-kmod `SpecialResource` API object using the `oc delete` command. The kernel module is unloaded when the driver container pod is deleted.
322+
[TIP]
316323
====
324+
The `sro_kind_completed_info` SRO Prometheus metric provides information about the status of the different objects being deployed, which can be useful to troubleshoot SRO CR installations. The SRO also provides other types of metrics that you can use to watch the health of your environment.
325+
====
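
Reviewer illustration, not part of this commit: one way to act on the tip above is to ask Prometheus for every `sro_kind_completed_info` series whose value is `0`. The sketch below only builds the request URL for the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The host name and the helper name `build_query_url` are hypothetical, and authentication against a cluster monitoring stack is not shown.

```python
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical Prometheus endpoint; substitute the route for your own cluster.
url = build_query_url(
    "https://prometheus.example.com",
    "sro_kind_completed_info == 0",
)
print(url)
# prints https://prometheus.example.com/api/v1/query?query=sro_kind_completed_info+%3D%3D+0
```

An empty result for this query would suggest that every object tracked by the metric has been created; any returned series identifies an object to investigate.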

modules/psap-special-resource-operator-using-manifests.adoc

Lines changed: 13 additions & 5 deletions
@@ -2,10 +2,11 @@
22
//
33
// * hardware_enablement/psap-special-resource-operator.adoc
44

5+
:_content-type: PROCEDURE
56
[id="deploy-simple-kmod-using-local-chart_{context}"]
67
= Building and running the simple-kmod SpecialResource by using the templates from the SRO image
78

8-
The SRO image contains a local repository of Helm charts including the templates for deploying the simple-kmod kernel module. In this example, the simple-kmod kernel module is used to show how the SRO can manage a driver container that is defined in the internal SRO repository.
9+
The Special Resource Operator (SRO) image contains a local repository of Helm charts, including the templates for deploying the simple-kmod kernel module. In this example, the simple-kmod kernel module shows how the SRO can manage a driver container that is defined in the internal SRO repository.
910

1011
.Prerequisites
1112

@@ -53,10 +54,17 @@ spec:
5354
----
5455
$ oc create -f simple-kmod-local.yaml
5556
----
56-
+
57+
58+
[NOTE]
59+
====
60+
To remove the simple-kmod kernel module from the node, delete the simple-kmod `SpecialResource` API object using the `oc delete` command. The kernel module is unloaded when the driver container pod is deleted.
61+
====
62+
63+
.Verification
64+
65+
5766
The `simple-kmod` resources are deployed in the `simple-kmod` namespace as specified in the object manifest. After a short time, the build pod for the `simple-kmod` driver container starts running. The build completes after a few minutes, and then the driver container pods start running.
5867

59-
+
6068
. Use the `oc get pods` command to display the status of the pods:
6169

6270
+
@@ -95,7 +103,7 @@ simple_procfs_kmod 16384 0
95103
simple_kmod 16384 0
96104
----
97105

98-
[NOTE]
106+
[TIP]
99107
====
100-
If you want to remove the simple-kmod kernel module from the node, delete the simple-kmod `SpecialResource` API object using the `oc delete` command. The kernel module is unloaded when the driver container pod is deleted.
108+
The `sro_kind_completed_info` SRO Prometheus metric provides information about the status of the different objects being deployed, which can be useful to troubleshoot SRO CR installations. The SRO also provides other types of metrics that you can use to watch the health of your environment.
101109
====
