
Commit bbca50f

Merge pull request #8294 from rwsu/AGENT-863
AGENT-906: Script to run monitor-add-nodes in cluster
2 parents: d9a10f0 + cc833e9


2 files changed: +173 / -4 lines


docs/user/agent/add-node/add-nodes.md

Lines changed: 58 additions & 4 deletions
````diff
@@ -67,7 +67,7 @@ hosts:
 ```
 
 ## ISO generation
-Run the [node-joiner.sh](./node-joiner.sh):
+Run [node-joiner.sh](./node-joiner.sh):
 ```bash
 $ ./node-joiner.sh
 ```
````
````diff
@@ -87,11 +87,12 @@ $ ./node-joiner.sh config.yaml
 Use the iso image to boot all the nodes listed in the configuration file, and wait for the related
 certificate signing requests (CSRs) to appear. When adding a new node to the cluster, two pending CSRs will
 be generated, and they must be manually approved by the user.
-Use the following command to monitor the pending certificates:
+
+Use the following command, or the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script described below, to monitor the pending certificates:
 ```
 $ oc get csr
 ```
-User the `oc` `approve` command to approve them:
+Use the `oc adm certificate approve` command to approve them:
 ```
 $ oc adm certificate approve <csr_name>
 ```
````
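When several nodes join at once, the `oc get csr` listing can contain many CSRs, most of them already approved. A minimal sketch for narrowing it down to the pending ones is below. It assumes the default table output of `oc get csr`, with a header row, the CSR name in the first column and the condition (for example `Pending` or `Approved,Issued`) in the last column; the function name `pending_csr_names` is hypothetical.

```bash
# Hypothetical helper: print the names of the CSRs whose condition is "Pending".
# Assumes the default `oc get csr` table on stdin: one header row, then one
# row per CSR with the name in the first column and the condition in the last.
pending_csr_names() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

For example, `oc get csr | pending_csr_names` prints one name per line. Because approving a CSR issues a certificate to its requester, check the REQUESTOR column of each pending CSR before passing its name to `oc adm certificate approve <csr_name>`.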
````diff
@@ -103,4 +104,57 @@ extra-worker-0 Ready worker 1h v1.29.3+8628c3c
 master-0 Ready control-plane,master 31h v1.29.3+8628c3c
 master-1 Ready control-plane,master 32h v1.29.3+8628c3c
 master-2 Ready control-plane,master 32h v1.29.3+8628c3c
-```
+```
+
+## Monitoring
+After a node boots from the ISO image, you can monitor its progress with the `node-joiner-monitor.sh` script.
+
+Download the [node-joiner-monitor.sh](./node-joiner-monitor.sh) script to a local directory.
+
+The script requires the IP address of the node to monitor. To monitor several nodes, pass each IP address as a separate argument.
+
+Run [node-joiner-monitor.sh](./node-joiner-monitor.sh):
+```bash
+$ ./node-joiner-monitor.sh 192.168.111.90
+```
+
+The script runs `node-joiner monitor-add-nodes` in a pod created in a temporary namespace with the
+prefix `openshift-node-joiner-` in the target cluster, and prints the pod's output to stdout.
+
+As the node joins the cluster, the script reports:
+* Pre-flight validations. If the node fails one or more validations, the installation does not start. The failed validations are reported so that the user can fix the problems.
+* Installation progress, including the current stage, such as writing the image to disk and the initial reboot.
+* CSRs that require the user's approval.
+
+The script exits after the node has joined the cluster and is in the Ready state, or after 90 minutes have elapsed, whichever comes first.
+
+Sample monitoring output:
+```
+INFO[2024-04-29T22:45:39-04:00] Monitoring IPs: [192.168.111.90]
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Assisted Service API is available
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Cluster is adding hosts
+INFO[2024-04-29T22:48:17-04:00] Node 192.168.111.90: Updated image information (Image type is "full-iso", SSH public key is set)
+INFO[2024-04-29T22:48:22-04:00] Node 192.168.111.90: Host ca241aa5-4f86-42bf-95a3-6b7ab7d4d66a: Successfully registered
+WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host couldn't synchronize with any NTP server
+WARNING[2024-04-29T22:48:32-04:00] Node 192.168.111.90: Host extraworker-0: updated status from discovering to insufficient (Host does not meet the minimum hardware requirements: Host couldn't synchronize with any NTP server)
+INFO[2024-04-29T22:49:28-04:00] Node 192.168.111.90: Host extraworker-0: updated status from known to installing (Installation is in progress)
+INFO[2024-04-29T22:50:28-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 5%
+INFO[2024-04-29T22:50:33-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 16%
+INFO[2024-04-29T22:50:38-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 28%
+INFO[2024-04-29T22:50:43-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 40%
+INFO[2024-04-29T22:50:48-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 51%
+INFO[2024-04-29T22:50:53-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 67%
+INFO[2024-04-29T22:50:58-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 77%
+INFO[2024-04-29T22:51:03-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 88%
+INFO[2024-04-29T22:51:08-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Writing image to disk: 93%
+INFO[2024-04-29T22:51:13-04:00] Node 192.168.111.90: Host: extraworker-0, reached installation stage Rebooting
+INFO[2024-04-29T22:56:35-04:00] Node 192.168.111.90: Kubelet is running
+INFO[2024-04-29T22:56:45-04:00] Node 192.168.111.90: First CSR Pending approval
+INFO[2024-04-29T22:56:45-04:00] CSR csr-257ms with signerName kubernetes.io/kube-apiserver-client-kubelet and username system:serviceaccount:openshift-machine-config-operator:node-bootstrapper is Pending and awaiting approval
+INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Second CSR Pending approval
+INFO[2024-04-29T22:58:50-04:00] CSR csr-tc8xt with signerName kubernetes.io/kubelet-serving and username system:node:extraworker-0 is Pending and awaiting approval
+INFO[2024-04-29T22:58:50-04:00] Node 192.168.111.90: Node joined cluster
+INFO[2024-04-29T23:00:00-04:00] Node 192.168.111.90: Node is Ready
+```
+
````
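The CSR messages in the sample monitoring output follow a recognizable pattern: `CSR <name> with signerName ... is Pending and awaiting approval`. Assuming the monitor keeps printing them in this form (the format is taken from the sample output, not from a documented interface), a saved copy of the output can be scanned for the CSR names to review. The helper name `csrs_awaiting_approval` is hypothetical.

```bash
# Hypothetical helper: read node-joiner-monitor.sh output on stdin and print
# each CSR name reported as awaiting approval, once, in order of appearance.
# Relies on the "CSR <name> with signerName ... is Pending and awaiting
# approval" message format shown in the sample output.
csrs_awaiting_approval() {
  awk '/is Pending and awaiting approval/ {
    for (i = 1; i < NF; i++) {
      if ($i == "CSR") {
        if (!seen[$(i + 1)]++) print $(i + 1)
        break
      }
    }
  }'
}
```

For example, run `./node-joiner-monitor.sh 192.168.111.90 | tee monitor.log`, then run `csrs_awaiting_approval < monitor.log` to get the CSR names to inspect with `oc get csr` before approving them.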
Lines changed: 115 additions & 0 deletions
```bash
#!/bin/bash

set -euo pipefail

if [ $# -eq 0 ]; then
  echo "At least one IP address must be provided"
  exit 1
fi

# Space-separated list of all the IP addresses passed as arguments.
ipAddresses="$*"

# Set up a cleanup function that removes the temporary pull secret file and the
# temporary namespace when the script exits, whether it succeeds or fails.
pullSecretFile=""
namespace=""
cleanup() {
  if [ -n "$pullSecretFile" ] && [ -f "$pullSecretFile" ]; then
    echo "Removing temporary file $pullSecretFile"
    rm "$pullSecretFile"
  fi
  if [ -n "$namespace" ]; then
    echo "Cleaning up"
    oc delete namespace "${namespace}" --grace-period=0 --wait=false >/dev/null 2>&1 || true
  fi
}
trap cleanup EXIT
# Turn INT and TERM into a regular exit so that the EXIT trap runs exactly once.
trap 'exit 1' INT TERM

# Retrieve the pull secret and store it in a temporary file.
pullSecretFile=$(mktemp -p "/tmp" -t "nodejoiner-XXXXXXXXXX")
oc get secret -n openshift-config pull-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > "$pullSecretFile"

# Extract the baremetal-installer image pullspec from the current cluster.
nodeJoinerPullspec=$(oc adm release info --image-for=baremetal-installer --registry-config="$pullSecretFile")

# Use the same random temp file suffix for the namespace. Everything up to
# "nodejoiner-" is stripped, rather than a fixed "/tmp/" prefix, because
# mktemp -t may create the file under $TMPDIR instead of /tmp.
namespace=$(echo "openshift-node-joiner-${pullSecretFile##*/nodejoiner-}" | tr '[:upper:]' '[:lower:]')

# Create the namespace to run the node-joiner-monitor, along with the required roles and bindings.
staticResources=$(cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${namespace}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-joiner-monitor
  namespace: ${namespace}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-joiner-monitor
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-joiner-monitor
subjects:
- kind: ServiceAccount
  name: node-joiner-monitor
  namespace: ${namespace}
roleRef:
  kind: ClusterRole
  name: node-joiner-monitor
  apiGroup: rbac.authorization.k8s.io
EOF
)
echo "$staticResources" | oc apply -f -

# Run node-joiner monitor-add-nodes in a pod to follow the nodes as they join the cluster.
nodeJoinerPod=$(cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-joiner-monitor
  namespace: ${namespace}
  annotations:
    openshift.io/scc: anyuid
  labels:
    app: node-joiner-monitor
spec:
  restartPolicy: Never
  serviceAccountName: node-joiner-monitor
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: node-joiner-monitor
    imagePullPolicy: IfNotPresent
    image: $nodeJoinerPullspec
    command: ["/bin/sh", "-c", "node-joiner monitor-add-nodes $ipAddresses --log-level=info; sleep 5"]
EOF
)
echo "$nodeJoinerPod" | oc apply -f -

# Wait for the pod. Passing -n, instead of switching projects with "oc project",
# leaves the user's current project unchanged.
oc wait -n "${namespace}" --for=condition=Ready=true --timeout=300s pod/node-joiner-monitor

oc logs -f -n "${namespace}" node-joiner-monitor
```
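The `node-joiner-monitor.sh` script names its temporary namespace after the random suffix of the pull secret file created by `mktemp`, and lowercases it with `tr` because `mktemp` can generate uppercase letters while Kubernetes namespace names must be RFC 1123 labels (at most 63 characters of lowercase alphanumerics or `-`, starting and ending with an alphanumeric). A minimal sketch of that derivation with an explicit validity check follows. `namespace_for_tmpfile` is a hypothetical name, and stripping everything up to `nodejoiner-` (rather than a fixed `/tmp/` prefix) is a choice made here so the result does not depend on the directory `mktemp` used.

```bash
# Hypothetical helper: derive the temporary namespace name from a mktemp path
# such as /tmp/nodejoiner-AbC123xyZ9 and check that it is a valid RFC 1123 label.
namespace_for_tmpfile() {
  # Keep only the random suffix after the last "/nodejoiner-".
  local suffix=${1##*/nodejoiner-}
  local ns
  ns=$(printf 'openshift-node-joiner-%s' "$suffix" | tr '[:upper:]' '[:lower:]')
  # At most 63 characters of [a-z0-9-], starting and ending with [a-z0-9].
  if [ "${#ns}" -gt 63 ] ||
     ! printf '%s\n' "$ns" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'; then
    echo "invalid namespace name: $ns" >&2
    return 1
  fi
  printf '%s\n' "$ns"
}
```

For example, `namespace_for_tmpfile /tmp/nodejoiner-AbC123xyZ9` prints `openshift-node-joiner-abc123xyz9`.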
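The `node-joiner-monitor.sh` script interpolates its IP address arguments into the pod's `/bin/sh -c` command, so an argument that is not an address (for example one containing `;`) would be run by the shell inside the pod. A minimal pre-check sketch is below. The function name `is_ipv4` is hypothetical, and it accepts only dotted-quad IPv4 addresses; an IPv6 address would be rejected and would need a separate check.

```bash
# Hypothetical pre-check: succeed only for a dotted-quad IPv4 address whose
# four octets are decimal numbers in the range 0-255.
is_ipv4() {
  # Reject any character other than digits and dots (this also rejects
  # ';', spaces and embedded newlines), as well as the empty string.
  case $1 in
    '' | *[!0-9.]*) return 1 ;;
  esac
  # Require exactly four dot-separated groups of 1-3 digits.
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
  # Split on dots and check that every octet is at most 255.
  local IFS=. octet
  set -- $1
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}
```

For example, the script could loop over `"$@"` right after its argument-count check and exit with an error as soon as `is_ipv4` fails for one of the arguments.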
