Commit ea1db52

Merge pull request #47075 from darshan-nagaraj/CFE-254

CFE-254: Adds Node observability Operator section

2 parents 0a41956 + 0409f37

8 files changed: +379 -2 lines
_topic_maps/_topic_map.yml

Lines changed: 5 additions & 2 deletions

@@ -1229,7 +1229,7 @@ Topics:
       File: about-advertising-ipaddresspool
     - Name: Configuring MetalLB BGP peers
       File: metallb-configure-bgp-peers
-    - Name: Advertising an IP address pool using the community alias
+    - Name: Advertising an IP address pool using the community alias
       File: metallb-configure-community-alias
     - Name: Configuring MetalLB BFD profiles
       File: metallb-configure-bfd-profiles
@@ -2046,7 +2046,7 @@ Topics:
     - Name: Enabling features using FeatureGates
       File: nodes-cluster-enabling-features
       Distros: openshift-enterprise,openshift-origin
-    - Name: Improving cluster stability in high latency environments using worker latency profiles
+    - Name: Improving cluster stability in high latency environments using worker latency profiles
       File: nodes-cluster-worker-latency-profiles
       Distros: openshift-enterprise,openshift-origin
     - Name: Remote worker nodes on the network edge
@@ -2270,6 +2270,9 @@ Topics:
     - Name: Deploying distributed units at scale in a disconnected environment
       File: ztp-deploying-disconnected
       Distros: openshift-origin,openshift-enterprise
+    - Name: Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
+      File: node-observability-operator
+      Distros: openshift-origin,openshift-enterprise
 ---
 Name: Specialized hardware and driver enablement
 Dir: hardware_enablement
Lines changed: 97 additions & 0 deletions

@@ -0,0 +1,97 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="creating-node-observability-custom-resource_{context}"]
= Creating the Node Observability custom resource

Before you run profiling queries, you must create a `NodeObservability` custom resource (CR).

[IMPORTANT]
====
Creating a `NodeObservability` CR reboots all the worker nodes. The reboot might take 10 or more minutes to complete.
====

When you apply the `NodeObservability` CR, it creates the machine config and machine config pool CRs that are required to enable CRI-O profiling on the worker nodes.

[NOTE]
====
Kubelet profiling is enabled by default.
====

The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the `kubelet-serving-ca` certificate chain is mounted on the agent pod, which allows secure communication between the agent and the node's kubelet endpoint.
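As an optional check that is not part of the original procedure, you can watch the machine config rollout that the `NodeObservability` CR triggers. The following sketch assumes the default `worker` machine config pool and the `node-role.kubernetes.io/worker` label shown in the CR later in this procedure:

[source,terminal]
----
# Watch the worker machine config pool until it reports UPDATED=True again
$ oc get mcp worker -w

# Confirm that the worker nodes return to the Ready state after the reboots
$ oc get nodes -l node-role.kubernetes.io/worker=
----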
.Prerequisites

* You have installed the Node Observability Operator.
* You have installed the OpenShift CLI (`oc`).
* You have access to the cluster with `cluster-admin` privileges.

.Procedure

. Log in to the {product-title} CLI as a user with the `cluster-admin` role by running the following command:
+
[source,terminal]
----
$ oc login -u kubeadmin https://<HOSTNAME>:6443
----

. Switch back to the `node-observability-operator` namespace by running the following command:
+
[source,terminal]
----
$ oc project node-observability-operator
----

. Create a CR file named `nodeobservability.yaml` that contains the following text:
+
[source,yaml]
----
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
kind: NodeObservability
metadata:
  name: cluster <1>
spec:
  labels:
    node-role.kubernetes.io/worker: ""
  type: crio-kubelet
----
<1> You must specify the name as `cluster` because there should be only one `NodeObservability` CR per cluster.

. Apply the `NodeObservability` CR by running the following command:
+
[source,terminal]
----
$ oc apply -f nodeobservability.yaml
----
+
.Example output
[source,terminal]
----
nodeobservability.olm.openshift.io/cluster created
----

. Review the status of the `NodeObservability` CR by running the following command:
+
[source,terminal]
----
$ oc get nob/cluster -o yaml | yq '.status.conditions'
----
+
.Example output
[source,terminal]
----
conditions:
- lastTransitionTime: "2022-07-05T07:33:54Z"
  message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
    ready: true'
  reason: Ready
  status: "True"
  type: Ready
----
+
The `NodeObservability` CR run is complete when the reason is `Ready` and the status is `True`.
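You can also confirm that the agent DaemonSet reported in the status is running. This optional check is a sketch that assumes the `node-observability-ds` DaemonSet name taken from the example output above and the `node-observability-operator` namespace used during installation:

[source,terminal]
----
# Check that the DaemonSet has one agent pod ready per worker node
$ oc get daemonset node-observability-ds -n node-observability-operator

# List the pods in the namespace, including the agent pods, and the nodes they run on
$ oc get pods -n node-observability-operator -o wide
----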
Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: CONCEPT
[id="workflow-node-observability-operator_{context}"]
= High-level workflow of the Node Observability Operator

After you install the Node Observability Operator in the {product-title} cluster, you must create a `NodeObservability` custom resource, which creates a DaemonSet to deploy a Node Observability agent on each worker node.

To request a profiling query, you must create a `NodeObservabilityRun` resource, which requests the deployed Node Observability agent to trigger the CRI-O and Kubelet profiling. After the profiling is completed, the Node Observability agent stores the profiling data in the `/run/node-observability` directory of the container file system, where it is available for query.
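A minimal command sketch of that workflow, using the resource file names from the procedures in this assembly (the file names and the `<agent_pod>` placeholder are illustrative):

[source,terminal]
----
# 1. Enable profiling: creates the agent DaemonSet and reboots the worker nodes
$ oc apply -f nodeobservability.yaml

# 2. Request a profiling run: the agents collect CRI-O and Kubelet pprof data for 30 seconds
$ oc apply -f nodeobservabilityrun.yaml

# 3. Inspect the collected data inside an agent pod (replace <agent_pod> with a pod name)
$ oc exec <agent_pod> -c node-observability-agent -- ls /run/node-observability
----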
Lines changed: 119 additions & 0 deletions

@@ -0,0 +1,119 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="install-node-observability-using-cli_{context}"]
= Installing the Node Observability Operator using the CLI

You can install the Node Observability Operator by using the OpenShift CLI (`oc`).

.Prerequisites

* You have installed the OpenShift CLI (`oc`).
* You have access to the cluster with `cluster-admin` privileges.

.Procedure

. Confirm that the Node Observability Operator is available by running the following command:
+
[source,terminal]
----
$ oc get packagemanifests -n openshift-marketplace node-observability-operator
----
+
.Example output
[source,terminal]
----
NAME                           CATALOG             AGE
node-observability-operator   Red Hat Operators   9h
----
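If you want to confirm which update channel to subscribe to, you can also inspect the package manifest. This optional check is a sketch; the `.status.defaultChannel` field is part of the standard `PackageManifest` API, and the `Subscription` object later in this procedure uses the `alpha` channel:

[source,terminal]
----
# Print the default channel that the catalog publishes for the Operator
$ oc get packagemanifest node-observability-operator -n openshift-marketplace -o yaml | yq '.status.defaultChannel'
----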
. Create the `node-observability-operator` namespace by running the following command:
+
[source,terminal]
----
$ oc new-project node-observability-operator
----

. Create an `OperatorGroup` object by running the following command:
+
[source,yaml]
----
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  targetNamespaces:
  - node-observability-operator
EOF
----

. Create a `Subscription` object to subscribe the namespace to the Node Observability Operator by running the following command:
+
[source,yaml]
----
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  channel: alpha
  name: node-observability-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
----
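After the subscription is created, the Operator Lifecycle Manager installs the Operator. As an optional check that is not part of the original steps, you can confirm that a `ClusterServiceVersion` was created for the subscription and wait for it to report the `Succeeded` phase:

[source,terminal]
----
# List the ClusterServiceVersion created by the subscription and check its PHASE column
$ oc get csv -n node-observability-operator
----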
.Verification

. View the install plan name by running the following command:
+
[source,terminal]
----
$ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'
----
+
.Example output
[source,terminal]
----
install-dt54w
----

. Verify the install plan status by running the following command:
+
[source,terminal]
----
$ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
----
+
`<install_plan_name>` is the install plan name that you obtained from the output of the previous command.
+
.Example output
[source,terminal]
----
COMPLETE
----

. Verify that the Node Observability Operator is up and running by running the following command:
+
[source,terminal]
----
$ oc get deploy -n node-observability-operator
----
+
.Example output
[source,terminal]
----
NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
node-observability-operator-controller-manager    1/1     1            1           40h
----
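Before you move on to creating the `NodeObservability` custom resource, you can optionally confirm that the Operator's custom resource definitions are registered. This is a sketch that is not part of the original verification steps; it only assumes that the CRD names contain the `nodeobservability` API group used elsewhere in this section:

[source,terminal]
----
# List the CustomResourceDefinitions installed by the Operator
$ oc get crd -o name | grep nodeobservability
----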
Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="install-node-observability-using-web-console_{context}"]
= Installing the Node Observability Operator using the web console

You can install the Node Observability Operator from the {product-title} web console.

.Prerequisites

* You have access to the cluster with `cluster-admin` privileges.
* You have access to the {product-title} web console.

.Procedure

. Log in to the {product-title} web console.
. In the Administrator's navigation panel, expand *Operators* -> *OperatorHub*.
. In the *All items* field, enter *Node Observability Operator* and select the *Node Observability Operator* tile.
. Click *Install*.
. On the *Install Operator* page, configure the following settings:
.. In the *Update channel* area, click *alpha*.
.. In the *Installation mode* area, click *A specific namespace on the cluster*.
.. From the *Installed Namespace* list, select *node-observability-operator*.
.. In the *Update approval* area, select *Automatic*.
.. Click *Install*.

.Verification

. In the Administrator's navigation panel, expand *Operators* -> *Installed Operators*.
. Verify that the Node Observability Operator is listed in the Operators list.
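If you also have the OpenShift CLI (`oc`) available, you can cross-check the web console installation from a terminal. This optional check is a sketch that assumes you selected the `node-observability-operator` namespace during installation:

[source,terminal]
----
# Confirm that the Operator deployment is available in the selected namespace
$ oc get deploy -n node-observability-operator
----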
Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: CONCEPT
[id="install-node-observability-operator_{context}"]
= Installing the Node Observability Operator

The Node Observability Operator is not installed in {product-title} by default. You can install the Node Observability Operator by using the {product-title} CLI or the web console.
Lines changed: 83 additions & 0 deletions

@@ -0,0 +1,83 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/understanding-node-observability-operator.adoc

:_content-type: PROCEDURE
[id="running-profiling-query_{context}"]
= Running the profiling query

A profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. The Node Observability Operator stores the profiling data in the `/run/node-observability` directory of the container file system. To request a profiling query, you must create a `NodeObservabilityRun` resource.

[IMPORTANT]
====
You can run only one profiling query at any point in time.
====

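Because only one query can run at a time, you might want to confirm that no `NodeObservabilityRun` is still in progress before you create a new one. This optional check is a sketch that reuses the resource kind introduced in this procedure:

[source,terminal]
----
# List existing runs and inspect their status conditions before starting a new query
$ oc get nodeobservabilityrun -o yaml | yq '.items[].status.conditions'
----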
.Prerequisites

* You have installed the Node Observability Operator.
* You have created the `NodeObservability` custom resource (CR).
* You have access to the cluster with `cluster-admin` privileges.

.Procedure

. Create a `NodeObservabilityRun` resource file named `nodeobservabilityrun.yaml` that contains the following text:
+
[source,yaml]
----
apiVersion: nodeobservability.olm.openshift.io/v1alpha1
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun
spec:
  nodeObservabilityRef:
    name: cluster
----

. Apply the `NodeObservabilityRun` resource to trigger the profiling by running the following command:
+
[source,terminal]
----
$ oc apply -f nodeobservabilityrun.yaml
----

. Review the status of the `NodeObservabilityRun` by running the following command:
+
[source,terminal]
----
$ oc get nodeobservabilityrun -o yaml | yq '.status.conditions'
----
+
.Example output
[source,terminal]
----
conditions:
- lastTransitionTime: "2022-07-07T14:57:34Z"
  message: Ready to start profiling
  reason: Ready
  status: "True"
  type: Ready
- lastTransitionTime: "2022-07-07T14:58:10Z"
  message: Profiling query done
  reason: Finished
  status: "True"
  type: Finished
----
+
The profiling query is complete when the status is `True` and the type is `Finished`.

. Run the following bash script to retrieve the profiling data from the container's `/run/node-observability` path:
+
[source,bash]
----
for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq .status.agents[].name); do
  echo "agent ${a}"
  mkdir -p "/tmp/${a}"
  for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
    f="$(basename ${p})"
    echo "copying ${f} to /tmp/${a}/${f}"
    oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
  done
done
----
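The retrieved files are standard pprof profiles. As a follow-up sketch that is not part of the original procedure, you can inspect them locally with `go tool pprof`, assuming the Go toolchain is installed and substituting one of the copied file names for the placeholders:

[source,terminal]
----
# Show the top entries recorded in a copied CRI-O or Kubelet profile
$ go tool pprof -top /tmp/<agent_pod_name>/<profile_file>.pprof
----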
