Commit cc833e9

rwsu and Andrea Fasano committed
AGENT-863: Script to run monitor-add-nodes in cluster
Derived from a similar script by Andrea Fasano to generate the add-nodes ISO.
#8242

This script tweaks it and creates a node-joiner-monitor pod to monitor adding nodes to a cluster.

Co-authored-by: Andrea Fasano <[email protected]>
1 parent 1a3d6c2 commit cc833e9

File tree

2 files changed

+173
-4
lines changed


docs/user/agent/add-node/add-nodes.md

Lines changed: 58 additions & 4 deletions
@@ -64,7 +64,7 @@ hosts:
 macAddress: 00:02:46:e3:9e:9c

 ## ISO generation
-Run the [node-joiner.sh](./node-joiner.sh):
+Run [node-joiner.sh](./node-joiner.sh):
 ```bash
 $ ./node-joiner.sh
 ```
@@ -84,11 +84,12 @@ $ ./node-joiner.sh config.yaml
 Use the iso image to boot all the nodes listed in the configuration file, and wait for the related
 certificate signing requests (CSRs) to appear. When adding a new node to the cluster, two pending CSRs will
 be generated, and they must be manually approved by the user.
-Use the following command to monitor the pending certificates:
+
+Use the following command or [node-joiner-monitor.sh](./node-joiner-monitor.sh) described below to monitor the pending certificates:
 ```
 $ oc get csr
 ```
-User the `oc` `approve` command to approve them:
+Use the `oc` `approve` command to approve them:
 ```
 $ oc adm certificate approve <csr_name>
 ```
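
Review note: the two manual steps in this hunk (`oc get csr`, then `oc adm certificate approve <csr_name>`) can be chained. The following is a minimal sketch and not part of this commit. `pending_csr_names` is a hypothetical helper, and it assumes the default `oc get csr` table layout: a header row, the CSR name in the first column, and the condition in the last column.

```bash
#!/bin/bash
# Hypothetical helper (not part of this commit): read `oc get csr` table
# output on stdin and print the names of CSRs whose last column
# (CONDITION) is exactly "Pending".
# Assumption: default table output with a header row on line 1.
pending_csr_names() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Possible usage against a live cluster (left commented out on purpose;
# `-r` makes GNU xargs skip the command when there is no input):
#   oc get csr | pending_csr_names | xargs -r oc adm certificate approve
```

Because a joining node produces two CSRs at different times, a pipeline like this would need to be run once per CSR, after each one appears.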
@@ -100,4 +101,57 @@ extra-worker-0 Ready worker 1h v1.29.3+8628c3c
 master-0 Ready control-plane,master 31h v1.29.3+8628c3c
 master-1 Ready control-plane,master 32h v1.29.3+8628c3c
 master-2 Ready control-plane,master 32h v1.29.3+8628c3c
-```
+```
+
+# Monitoring
+After a node is booted using the ISO image, progress can be monitored using the node-joiner-monitor.sh script.
+
+Download the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script to a local directory.
+
+The script requires the IP address of the node to monitor.
+
+Run [node-joiner-monitor.sh](./node-joiner-monitor.sh):
+```bash
+$ ./node-joiner-monitor.sh 192.168.111.90
+```
+
+The script will execute a command to monitor the node using a temporary namespace with
+prefix `openshift-node-joiner-monitor` in the target cluster. The output of this command
+is printed out to stdout.
+
+The script shows useful information about the node as it joins the cluster.
+* Pre-flight validations. In case the node does not pass one or more validations, the installation will not start. The output of the failed validations are reported to allow users to fix the problem(s) when required.
+* Installation progress indicating the current stage is shown. For example, writing of the image to disk, and initial reboot are reported.
+* CSRs requiring the user's approval are shown.
+
+The script exits either after the node has joined the cluster and is in ready state or after 90 minutes have elapsed.
+
+Sample monitoring output:
+```
+INFO[2024-04-29T22:45:39-04:00] Monitoring IPs: [192.168.111.90]
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Assisted Service API is available
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Cluster is adding hosts
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Updated image information (Image type is "full-iso", SSH public key is set)
+INFO[2024-04-29T22:48:22-04:00] Node 192.168.111.90: Host ca241aa5-4f86-42bf-95a3-6b7ab7d4d66a: Successfully registered
+WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host couldn't synchronize with any NTP server
+WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host extraworker-0: updated status from discovering to insufficient (Host does not meet the minimum hardware requirements: Host couldn't synchronize with any NTP server)
+INFO[2024-04-29T22:49:28-04:00] Node 192.168.111.90: Host extraworker-0: updated status from known to installing (Installation is in progress)
+INFO[2024-04-29T22:50:28-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 5%
+INFO[2024-04-29T22:50:33-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 16%
+INFO[2024-04-29T22:50:38-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 28%
+INFO[2024-04-29T22:50:43-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 40%
+INFO[2024-04-29T22:50:48-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 51%
+INFO[2024-04-29T22:50:53-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 67%
+INFO[2024-04-29T22:50:58-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 77%
+INFO[2024-04-29T22:51:03-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 88%
+INFO[2024-04-29T22:51:08-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 93%
+INFO[2024-04-29T22:51:13-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Rebooting
+INFO[2024-04-29T22:56:35-04:00] Node 192.168.111.90: Kubelet is running
+INFO[2024-04-29T22:56:45-04:00] Node 192.168.111.90: First CSR Pending approval
+INFO[2024-04-29T22:56:45-04:00] CSR csr-257ms with signerName kubernetes.io/kube-apiserver-client-kubelet and username system:serviceaccount:openshift-machine-config-operator:node-bootstrapper is Pending and awaiting approval
+INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Second CSR Pending approval
+INFO[2024-04-29T22:58:50-04:00] CSR csr-tc8xt with signerName kubernetes.io/kubelet-serving and username system:node:extraworker-0 is Pending and awaiting approval
+INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Node joined cluster
+INFO[2024-04-29T23:00:00-04:00] Node 192.168.111.90: Node is Ready
+```
+
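
Review note: the added documentation says the script exits either when the node is Ready or after 90 minutes, so a caller may want to tell the two outcomes apart. The following is a minimal sketch and not part of this commit. `node_is_ready` is a hypothetical helper that assumes the log line format shown in the sample monitoring output (`Node <ip>: Node is Ready`).

```bash
#!/bin/bash
# Hypothetical helper (not part of this commit): read monitor output on
# stdin and succeed only if it reports the given IP as Ready.
# The trailing colon in the pattern keeps 192.168.111.9 from matching
# a line about 192.168.111.90.
node_is_ready() {
    local ip="$1"
    grep -qF "Node ${ip}: Node is Ready"
}

# Possible usage (left commented out; monitor.log is an arbitrary file name):
#   ./node-joiner-monitor.sh 192.168.111.90 | tee monitor.log
#   node_is_ready 192.168.111.90 < monitor.log && echo "node joined"
```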
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+#!/bin/bash
+
+set -eu
+
+if [ $# -eq 0 ]; then
+    echo "At least one IP address must be provided"
+    exit 1
+fi
+
+ipAddresses=$@
+
+# Setup a cleanup function to ensure to remove the temporary
+# file when the script will be completed.
+cleanup() {
+    if [ -f "$pullSecretFile" ]; then
+        echo "Removing temporary file $pullSecretFile"
+        rm "$pullSecretFile"
+    fi
+}
+trap cleanup EXIT TERM
+
+# Retrieve the pullsecret and store it in a temporary file.
+pullSecretFile=$(mktemp -p "/tmp" -t "nodejoiner-XXXXXXXXXX")
+oc get secret -n openshift-config pull-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > "$pullSecretFile"
+
+# Extract the baremetal-installer image pullspec from the current cluster.
+nodeJoinerPullspec=$(oc adm release info --image-for=baremetal-installer --registry-config="$pullSecretFile")
+
+# Use the same random temp file suffix for the namespace.
+namespace=$(echo "openshift-node-joiner-${pullSecretFile#/tmp/nodejoiner-}" | tr '[:upper:]' '[:lower:]')
+
+# Create the namespace to run the node-joiner-monitor, along with the required roles and bindings.
+staticResources=$(cat <<EOF
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: ${namespace}
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: node-joiner-monitor
+  namespace: ${namespace}
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: node-joiner-monitor
+rules:
+- apiGroups:
+  - certificates.k8s.io
+  resources:
+  - certificatesigningrequests
+  verbs:
+  - get
+  - list
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  - nodes
+  verbs:
+  - get
+  - list
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: node-joiner-monitor
+subjects:
+- kind: ServiceAccount
+  name: node-joiner-monitor
+  namespace: ${namespace}
+roleRef:
+  kind: ClusterRole
+  name: node-joiner-monitor
+  apiGroup: rbac.authorization.k8s.io
+EOF
+)
+echo "$staticResources" | oc apply -f -
+
+# Run the node-joiner-monitor to monitor node joining cluster
+nodeJoinerPod=$(cat <<EOF
+apiVersion: v1
+kind: Pod
+metadata:
+  name: node-joiner-monitor
+  namespace: ${namespace}
+  annotations:
+    openshift.io/scc: anyuid
+  labels:
+    app: node-joiner-monitor
+spec:
+  restartPolicy: Never
+  serviceAccountName: node-joiner-monitor
+  securityContext:
+    seccompProfile:
+      type: RuntimeDefault
+  containers:
+  - name: node-joiner-monitor
+    imagePullPolicy: IfNotPresent
+    image: $nodeJoinerPullspec
+    command: ["/bin/sh", "-c", "node-joiner monitor-add-nodes $ipAddresses --log-level=info; sleep 5"]
+EOF
+)
+echo "$nodeJoinerPod" | oc apply -f -
+
+oc project "${namespace}"
+
+oc wait --for=condition=Ready=true --timeout=300s pod/node-joiner-monitor
+
+oc logs -f -n "${namespace}" node-joiner-monitor
+
+echo "Cleaning up"
+oc delete namespace "${namespace}" --grace-period=0 >/dev/null 2>&1 &
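
Review note: the namespace name in the script is derived from the random suffix of the `mktemp` file and then lowercased, since Kubernetes namespace names must be lowercase DNS labels while `mktemp` suffixes can contain uppercase letters. The following is a minimal sketch that isolates that one step so it can be tried without a cluster. `namespace_for_tmpfile` is a hypothetical name, not a function in the committed script, and the sample path is made up.

```bash
#!/bin/bash
# Hypothetical helper (not part of this commit): reproduce the script's
# namespace derivation for a given temporary file path.
namespace_for_tmpfile() {
    local tmpfile="$1"
    # Strip the fixed "/tmp/nodejoiner-" prefix, keeping the random suffix.
    local suffix="${tmpfile#/tmp/nodejoiner-}"
    # Lowercase the result so it is a valid namespace name.
    printf 'openshift-node-joiner-%s\n' "$suffix" | tr '[:upper:]' '[:lower:]'
}

# Illustrative call with a made-up mktemp result.
namespace_for_tmpfile "/tmp/nodejoiner-aB3xY9kLm2"
```

Note that the prefix strip only works when the file actually lives under `/tmp`, which the script guarantees by passing `-p "/tmp"` to `mktemp`.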
