Commit bc1aa67

Added node observability agent scripts feature
1 parent 322a51d commit bc1aa67

4 files changed

+222
-5
lines changed

_topic_maps/_topic_map.yml

Lines changed: 1 addition & 1 deletion
@@ -2845,7 +2845,7 @@ Topics:
28452845
- Name: Workload partitioning
28462846
File: enabling-workload-partitioning
28472847
Distros: openshift-origin,openshift-enterprise
2848-
- Name: Requesting CRI-O and Kubelet profiling data by using the Node Observability Operator
2848+
- Name: Using the Node Observability Operator
28492849
File: node-observability-operator
28502850
Distros: openshift-origin,openshift-enterprise
28512851
- Name: Clusters at the network far edge
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/node-observability-operator.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="node-observability-scripting-cr_{context}"]
7+
= Creating the Node Observability custom resource for scripting
8+
9+
You must create the `NodeObservability` custom resource (CR) before you run scripting. Creating the `NodeObservability` CR enables the agent in scripting mode on the compute nodes that match the `nodeSelector` label.
10+
11+
.Prerequisites
12+
* You have installed the Node Observability Operator.
13+
* You have installed the {oc-first}.
14+
* You have access to the cluster with `cluster-admin` privileges.
15+
16+
.Procedure
17+
18+
. Log in to the {product-title} cluster by running the following command:
19+
+
20+
[source,terminal]
21+
----
22+
$ oc login -u kubeadmin https://<host_name>:6443
23+
----
24+
25+
. Switch to the `node-observability-operator` namespace by running the following command:
26+
+
27+
[source,terminal]
28+
----
29+
$ oc project node-observability-operator
30+
----
31+
32+
. Create a file named `nodeobservability.yaml` that contains the following content:
33+
+
34+
[source,yaml]
35+
----
36+
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
37+
kind: NodeObservability
38+
metadata:
39+
  name: cluster <1>
40+
spec:
41+
  nodeSelector:
42+
    kubernetes.io/hostname: <node_hostname> <2>
43+
  type: scripting <3>
44+
----
45+
<1> You must specify the name as `cluster` because there should be only one `NodeObservability` CR per cluster.
46+
<2> Specify the nodes on which the Node Observability agent must be deployed.
47+
<3> To deploy the agent in scripting mode, you must set the type to `scripting`.
48+
49+
50+
. Create the `NodeObservability` CR by running the following command:
51+
+
52+
[source,terminal]
53+
----
54+
$ oc apply -f nodeobservability.yaml
55+
----
56+
57+
+
58+
.Example output
59+
[source,terminal]
60+
----
61+
nodeobservability.olm.openshift.io/cluster created
62+
----
63+
64+
. Review the status of the `NodeObservability` CR by running the following command:
65+
+
66+
[source,terminal]
67+
----
68+
$ oc get nob/cluster -o yaml | yq '.status.conditions'
69+
----
70+
71+
+
72+
.Example output
73+
[source,terminal]
74+
----
75+
conditions:
76+
conditions:
77+
- lastTransitionTime: "2022-07-05T07:33:54Z"
78+
  message: 'DaemonSet node-observability-ds ready: true NodeObservabilityScripting
79+
    ready: true'
80+
  reason: Ready
81+
  status: "True"
82+
  type: Ready
83+
----
84+
85+
+
86+
The `NodeObservability` CR is ready when the `reason` is `Ready` and the `status` is `"True"`.
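
The readiness check described above can be scripted. The following is a minimal sketch, not part of the commit: the helper name `condition_status` is hypothetical, and it assumes the conditions are printed as a YAML list with lowercase `status:` and `type:` keys, as in the example output for the `NodeObservability` CR.

```shell
#!/bin/bash

# condition_status TYPE: hypothetical helper, not part of the commit.
# Reads a YAML list of conditions on stdin and prints the status of the
# condition whose type equals TYPE. Prints nothing if no such condition exists.
condition_status() {
  awk -v want="$1" '
    # Print the buffered status if the finished item has the wanted type.
    function emit() { if (type == want) print status; type = ""; status = "" }
    /^[ \t]*-[ \t]/ { emit() }                 # a new list item starts here
    {
      line = $0
      sub(/^[ \t]*(-[ \t]+)?/, "", line)       # drop indentation and list dash
      if (line ~ /^status:/) {
        sub(/^status:[ \t]*/, "", line); gsub(/"/, "", line); status = line
      } else if (line ~ /^type:/) {
        sub(/^type:[ \t]*/, "", line); gsub(/"/, "", line); type = line
      }
    }
    END { emit() }
  '
}
```

One possible use is `oc get nob/cluster -o yaml | yq '.status.conditions' | condition_status Ready`, which should print `True` once the agent is ready. The fields are buffered per list item because `type:` follows `status:` in the output.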
Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * scalability_and_performance/node-observability-operator.adoc
4+
5+
:_mod-docs-content-type: PROCEDURE
6+
[id="node-observability-scripting_{context}"]
7+
= Configuring Node Observability Operator scripting
8+
9+
.Prerequisites
10+
11+
* You have installed the Node Observability Operator.
12+
* You have created the `NodeObservability` custom resource (CR).
13+
* You have access to the cluster with `cluster-admin` privileges.
14+
15+
.Procedure
16+
17+
. Create a file named `nodeobservabilityrun-script.yaml` that contains the following content:
18+
+
19+
[source,yaml]
20+
----
21+
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
22+
kind: NodeObservabilityRun
23+
metadata:
24+
name: nodeobservabilityrun-script
25+
namespace: node-observability-operator
26+
spec:
27+
nodeObservabilityRef:
28+
name: cluster
29+
type: scripting
30+
----
31+
+
32+
[IMPORTANT]
33+
====
34+
You can request only the following scripts:
35+
36+
* `metrics.sh`
37+
* `network-metrics.sh` (uses `monitor.sh`)
38+
====
39+
40+
. Trigger the scripting by creating the `NodeObservabilityRun` resource with the following command:
41+
+
42+
[source,terminal]
43+
----
44+
$ oc apply -f nodeobservabilityrun-script.yaml
45+
----
46+
47+
. Review the status of the `NodeObservabilityRun` scripting by running the following command:
48+
+
49+
[source,terminal]
50+
----
51+
$ oc get nodeobservabilityrun nodeobservabilityrun-script -o yaml | yq '.status.conditions'
52+
----
53+
54+
+
55+
.Example output
56+
[source,terminal]
57+
----
58+
Status:
59+
Agents:
60+
Ip: 10.128.2.252
61+
Name: node-observability-agent-n2fpm
62+
Port: 8443
63+
Ip: 10.131.0.186
64+
Name: node-observability-agent-wcc8p
65+
Port: 8443
66+
Conditions:
67+
Conditions:
68+
Last Transition Time: 2023-12-19T15:10:51Z
69+
Message: Ready to start profiling
70+
Reason: Ready
71+
Status: True
72+
Type: Ready
73+
Last Transition Time: 2023-12-19T15:11:01Z
74+
Message: Profiling query done
75+
Reason: Finished
76+
Status: True
77+
Type: Finished
78+
Finished Timestamp: 2023-12-19T15:11:01Z
79+
Start Timestamp: 2023-12-19T15:10:51Z
80+
----
81+
82+
+
83+
The scripting is complete once `Status` is `True` and `Type` is `Finished`.
84+
85+
. Retrieve the scripting data from the root path of the container by running the following bash script:
86+
+
87+
[source,bash]
88+
----
89+
#!/bin/bash
90+
91+
RUN=$(oc get nodeobservabilityrun --no-headers | awk '{print $1}')
92+
93+
for a in $(oc get nodeobservabilityruns.nodeobservability.olm.openshift.io/${RUN} -o json | jq '.status.agents[].name'); do
94+
  echo "agent ${a}"
95+
  agent=$(echo ${a} | tr -d "\"\'\`")
96+
  base_dir=$(oc exec "${agent}" -c node-observability-agent -- bash -c "ls -t | grep node-observability-agent" | head -1)
97+
  echo "${base_dir}"
98+
  mkdir -p "/tmp/${agent}"
99+
  for p in $(oc exec "${agent}" -c node-observability-agent -- bash -c "ls ${base_dir}"); do
100+
    f="/${base_dir}/${p}"
101+
    echo "copying ${f} to /tmp/${agent}/${p}"
102+
    oc exec "${agent}" -c node-observability-agent -- cat "${f}" > "/tmp/${agent}/${p}"
103+
  done
104+
done
105+
----
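
The retrieval script stores every name that `oc get nodeobservabilityrun --no-headers` prints in `RUN`, so it appears to assume that only one `NodeObservabilityRun` resource exists in the namespace. A minimal sketch of a guard follows, assuming you want the first listed run. The helper name `first_run_name` is hypothetical and is not part of the commit.

```shell
#!/bin/bash

# first_run_name: hypothetical helper, not part of the committed script.
# Reads the output of `oc get nodeobservabilityrun --no-headers` on stdin
# and prints only the name from the first row, so RUN holds a single name.
first_run_name() {
  awk 'NR == 1 { print $1 }'
}
```

For example, `RUN=$(oc get nodeobservabilityrun --no-headers | first_run_name)` could replace the `RUN=` line in the script. Passing the run name explicitly would be an equally reasonable choice.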

scalability_and_performance/node-observability-operator.adoc

Lines changed: 30 additions & 4 deletions
@@ -1,13 +1,15 @@
11
:_mod-docs-content-type: ASSEMBLY
22
[id="using-node-observability-operator"]
3-
= Requesting CRI-O and Kubelet profiling data by using the Node Observability Operator
3+
= Using the Node Observability Operator
44
include::_attributes/common-attributes.adoc[]
55
:context: node-observability-operator
66

77
toc::[]
88

9+
The Node Observability Operator collects and stores CRI-O and Kubelet profiling data, or metrics gathered by scripts, from compute nodes.
10+
11+
With the Node Observability Operator, you can query the profiling data to analyze performance trends in CRI-O and Kubelet and to debug performance-related issues. The Operator also supports running embedded scripts for network metrics by using the `run` field in the custom resource definition. To enable CRI-O and Kubelet profiling or scripting, configure the `type` field in the custom resource definition.
912

10-
The Node Observability Operator collects and stores the CRI-O and Kubelet profiling data of worker nodes. You can query the profiling data to analyze the CRI-O and Kubelet performance trends and debug the performance-related issues.
1113

1214
:FeatureName: The Node Observability Operator
1315
include::snippets/technology-preview.adoc[leveloffset=+0]
@@ -20,6 +22,30 @@ include::modules/node-observability-install-cli.adoc[leveloffset=+2]
2022

2123
include::modules/node-observability-install-web-console.adoc[leveloffset=+2]
2224

23-
include::modules/node-observability-create-custom-resource.adoc[leveloffset=+1]
2425

25-
include::modules/node-observability-run-profiling-query.adoc[leveloffset=+1]
26+
[id="requesting-crio-kubelet-profiling-using-noo_{context}"]
27+
== Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
28+
29+
Create a Node Observability custom resource to collect CRI-O and Kubelet profiling data.
30+
31+
include::modules/node-observability-create-custom-resource.adoc[leveloffset=+2]
32+
33+
include::modules/node-observability-run-profiling-query.adoc[leveloffset=+2]
34+
35+
36+
[id="node-observability-operator-scripting_{context}"]
37+
== Node Observability Operator scripting
38+
39+
With scripting, you can run preconfigured bash scripts by using the current Node Observability Operator and Node Observability Agent.
40+
41+
These scripts monitor key metrics like CPU load, memory pressure, and worker node issues. They also collect sar reports and custom performance metrics.
42+
43+
include::modules/node-observability-scripting-cr.adoc[leveloffset=+2]
44+
45+
include::modules/node-observability-scripting.adoc[leveloffset=+2]
46+
47+
[role="_additional-resources"]
48+
[id="additional-resources_node-observability-operator"]
49+
== Additional resources
50+
51+
For more information on how to collect worker metrics, see the link:https://access.redhat.com/solutions/5343671[Red Hat Knowledgebase article].
