
Commit 067c229

TELCODOCS-643: Addition of worker nodes to SNO
1 parent 8a85dd3 commit 067c229

8 files changed: +448 -0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -2426,6 +2426,8 @@ Topics:
       File: ztp-talm-updating-managed-policies
     - Name: Updating GitOps ZTP
       File: ztp-updating-gitops
+    - Name: Adding worker nodes to single-node OpenShift cluster
+      File: ztp-sno-additional-worker-node
 ---
 Name: Specialized hardware and driver enablement
 Dir: hardware_enablement

modules/ztp-adding-worker-nodes.adoc

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: PROCEDURE
[id="ztp-additional-worker-sno-proc_{context}"]
= Adding worker nodes to {sno} clusters
include::../_attributes/common-attributes.adoc[]

You can add one or more worker nodes to existing {sno} clusters to increase available CPU resources.

.Prerequisites

* Install and configure {rh-rhacm} 2.6 or later running on {product-title} 4.11 or later on a bare-metal cluster
* Install {cgu-operator-full}
* Install the OpenShift GitOps Operator
* Run {product-title} 4.12 or later in the zero touch provisioning (ZTP) container
* Deploy an {sno} cluster through ZTP
* Configure Central Infrastructure Management as described in the {rh-rhacm} documentation
* Configure the DNS serving the cluster to resolve the internal API endpoint `api-int.<cluster_name>.<base_domain>`
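+
A minimal sketch of one way to check this prerequisite follows. The cluster name `example-sno`, the base domain `example.com`, and the IPv6 (`AAAA`) record type are illustrative values taken from the examples later in this procedure; substitute your own. The query must return the IP address that serves the cluster internal API, typically the address of the original {sno} node.
+
[source,terminal]
----
$ dig +short api-int.example-sno.example.com AAAA
----
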
.Procedure

. If you deployed your cluster using the `example-sno.yaml` `SiteConfig` manifest, add your new worker node to the `spec.clusters['example-sno'].nodes` list:
+
[source,yaml]
----
nodes:
- hostName: "example-node2.example.com"
  role: "worker"
  bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1"
  bmcCredentialsName:
    name: "example-node2-bmh-secret"
  bootMACAddress: "AA:BB:CC:DD:EE:11"
  bootMode: "UEFI"
  nodeNetwork:
    interfaces:
      - name: eno1
        macAddress: "AA:BB:CC:DD:EE:11"
    config:
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          macAddress: "AA:BB:CC:DD:EE:11"
          ipv4:
            enabled: false
          ipv6:
            enabled: true
            address:
            - ip: 1111:2222:3333:4444::1
              prefix-length: 64
      dns-resolver:
        config:
          search:
          - example.com
          server:
          - 1111:2222:3333:4444::2
      routes:
        config:
        - destination: ::/0
          next-hop-interface: eno1
          next-hop-address: 1111:2222:3333:4444::1
          table-id: 254
----

. Create a BMC authentication secret for the new host, as referenced by the `bmcCredentialsName` field in the `spec.nodes` section of your `SiteConfig` file:
+
[source,yaml]
----
apiVersion: v1
data:
  password: "password"
  username: "username"
kind: Secret
metadata:
  name: "example-node2-bmh-secret"
  namespace: example-sno
type: Opaque
----

. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
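+
A minimal sketch of the commands, assuming that you work in a clone of the monitored repository, that the `SiteConfig` manifest is stored at the illustrative path `site-configs/example-sno.yaml`, and that the monitored branch is `main`; adjust the path and branch to match your repository:
+
[source,terminal]
----
$ git add site-configs/example-sno.yaml
$ git commit -m "Add worker node example-node2 to example-sno"
$ git push origin main
----
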
When the ArgoCD `cluster` application synchronizes, two new manifests, generated by the ZTP plugin, appear on the hub cluster:

* `BareMetalHost`
* `NMStateConfig`

[IMPORTANT]
====
Do not configure the `cpuset` field for the worker node. Workload partitioning for worker nodes is added through management policies after the node installation is complete.
====

.Verification

You can monitor the installation process in several ways.

. Check if the preprovisioning images are created by running the following command:
+
[source,terminal]
----
$ oc get ppimg -n example-sno
----
+
.Example output
+
[source,terminal]
----
NAMESPACE     NAME            READY   REASON
example-sno   example-sno     True    ImageCreated
example-sno   example-node2   True    ImageCreated
----

. Check the state of the bare-metal hosts:
+
[source,terminal]
----
$ oc get bmh -n example-sno
----
+
.Example output
+
[source,terminal]
----
NAME            STATE          CONSUMER   ONLINE   ERROR   AGE
example-sno     provisioned               true             69m
example-node2   provisioning              true             4m50s <1>
----
<1> The `provisioning` state indicates that node booting from the installation media is in progress.

. Continuously monitor the installation process:
+
[source,terminal]
----
$ oc get agent -n example-sno --watch
----
+
.Example output
+
[source,terminal]
----
NAME                                   CLUSTER       APPROVED   ROLE     STAGE
671bc05d-5358-8940-ec12-d9ad22804faa   example-sno   true       master   Done
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Starting installation
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Installing
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Writing image to disk
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Waiting for control plane
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Rebooting
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Done
----

. When the worker node installation completes, its certificates are approved automatically. At this point, the worker appears in the `ManagedClusterInfo` status:
+
[source,terminal]
----
$ oc get managedclusterinfo/example-sno -n example-sno -o \
  jsonpath='{range .status.nodeList[*]}{.name}{"\t"}{.conditions}{"\t"}{.labels}{"\n"}{end}'
----
+
.Example output
+
[source,terminal]
----
example-sno     [{"status":"True","type":"Ready"}]   {"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""}
example-node2   [{"status":"True","type":"Ready"}]   {"node-role.kubernetes.io/worker":""}
----
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: CONCEPT
[id="ztp-additional-worker-sno_{context}"]
= {sno-caps} cluster expansion with worker nodes
include::../_attributes/common-attributes.adoc[]

When you add worker nodes to increase available CPU resources, the original {sno} cluster retains the control plane node role.

[NOTE]
====
Although there is no specified limit on the number of worker nodes that you can add, you must re-evaluate the reserved CPU allocation on the control plane node for the additional worker nodes.
====

If you require workload partitioning on the worker node, you must deploy and remediate the policies that configure the worker node before you install the node. This way, the workload partitioning `MachineConfig` objects are rendered and associated with the `worker` machine config pool before the `MachineConfig` ignition file is downloaded by the installing worker node.

The recommended order is to remediate the policies first, and then to install the worker node.
If you create the workload partitioning manifests after node installation, you must manually drain the node and delete all the pods managed by daemon sets. When the managing daemon sets create the new pods, the new pods undergo the workload partitioning process.

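The following is a minimal sketch of that manual recovery, assuming the worker node is named `example-node2`. Which pods you must delete depends on the daemon sets that manage your partitioned workloads, so list the pods on the node first and delete only the relevant ones:

[source,terminal]
----
$ oc adm drain example-node2 --ignore-daemonsets --delete-emptydir-data <1>
$ oc get pods --all-namespaces --field-selector spec.nodeName=example-node2 <2>
$ oc delete pod <pod_name> -n <namespace> <3>
$ oc adm uncordon example-node2 <4>
----
<1> Drain the node. Pods managed by daemon sets are not evicted by the drain.
<2> List the pods that are still running on the node, including the daemon set managed pods.
<3> Delete each relevant daemon set managed pod so that its replacement is created under workload partitioning.
<4> Return the node to service.
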
:FeatureName: Adding worker nodes to {sno} clusters

include::snippets/technology-preview.adoc[]
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: CONCEPT
[id="ztp-additional-worker-apply-du-profile_{context}"]
= Applying profiles to the worker node
include::../_attributes/common-attributes.adoc[]

You can configure the additional worker node with a DU profile.

You can apply a RAN distributed unit (DU) profile to the worker node by using the GitOps ZTP common, group, and site-specific `PolicyGenTemplate` resources. The GitOps ZTP pipeline that is linked to the ArgoCD `policies` application includes the following CRs that you can find in the `out/argocd/example/policygentemplates` folder when you extract the `ztp-site-generate` container:

* `common-ranGen.yaml`
* `group-du-sno-ranGen.yaml`
* `example-sno-site.yaml`
* `ns.yaml`
* `kustomization.yaml`

Configuring the DU profile on the worker node is considered an upgrade. To initiate the upgrade flow, you must update the existing policies or create additional ones. Then, you must create a `ClusterGroupUpgrade` CR to reconcile the policies in the group of clusters.
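
The following is a minimal sketch of such a `ClusterGroupUpgrade` CR, assuming the cluster is named `example-sno`. The CR name, namespace, and policy names are illustrative and must match the policies that you updated or created for the worker node:

[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-example-sno-worker <1>
  namespace: default
spec:
  clusters:
  - example-sno
  enable: true
  managedPolicies: <2>
  - group-du-sno-config-policy
  - common-config-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
----
<1> The name and namespace are illustrative.
<2> List the policies that configure the worker node. The names shown here are placeholders.

When this CR is applied, {cgu-operator-full} remediates the listed policies on the `example-sno` cluster, which rolls the DU configuration out to the new worker node.
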
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: PROCEDURE
[id="ztp-additional-worker-daemon-selector-comp_{context}"]
= (Optional) Ensuring PTP and SR-IOV daemon selector compatibility

If the DU profile was deployed by using the GitOps ZTP plugin version 4.11 or earlier, the PTP and SR-IOV Operators might be configured to place the daemons only on nodes labeled as `master`. This configuration prevents the PTP and SR-IOV daemons from operating on the worker node. If the PTP and SR-IOV daemon node selectors are incorrectly configured on your system, you must change the selectors before proceeding with the worker DU profile configuration.

.Procedure

. Check the daemon node selector settings of the PTP Operator on one of the spoke clusters:
+
[source,terminal]
----
$ oc get ptpoperatorconfig/default -n openshift-ptp -ojsonpath='{.spec}' | jq
----
+
.Example output for PTP Operator
+
[source,json]
----
{"daemonNodeSelector":{"node-role.kubernetes.io/master":""}} <1>
----
<1> If the node selector is set to `master`, the spoke was deployed with the version of the ZTP plugin that requires changes.

. Check the daemon node selector settings of the SR-IOV Operator on one of the spoke clusters:
+
[source,terminal]
----
$ oc get sriovoperatorconfig/default -n \
  openshift-sriov-network-operator -ojsonpath='{.spec}' | jq
----
+
.Example output for SR-IOV Operator
+
[source,json]
----
{"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true} <1>
----
<1> If the node selector is set to `master`, the spoke was deployed with the version of the ZTP plugin that requires changes.

. In the group policy, add the following `complianceType` and `spec` entries:
+
[source,yaml]
----
spec:
    - fileName: PtpOperatorConfig.yaml
      policyName: "config-policy"
      complianceType: mustonlyhave
      spec:
        daemonNodeSelector:
          node-role.kubernetes.io/worker: ""
    - fileName: SriovOperatorConfig.yaml
      policyName: "config-policy"
      complianceType: mustonlyhave
      spec:
        configDaemonNodeSelector:
          node-role.kubernetes.io/worker: ""
----
+
[IMPORTANT]
====
Changing the `daemonNodeSelector` field causes temporary PTP synchronization loss and SR-IOV connectivity loss.
====

. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: CONCEPT
[id="ztp-additional-worker-node-selector-comp_{context}"]
= PTP and SR-IOV node selector compatibility

The PTP configuration resources and SR-IOV network node policies use `node-role.kubernetes.io/master: ""` as the node selector. If the additional worker nodes have the same NIC configuration as the control plane node, the policies used to configure the control plane node can be reused for the worker nodes. However, the node selector must be changed to select both node types, for example with the `node-role.kubernetes.io/worker` label.
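
For example, the following is a minimal sketch of a `PtpConfig` CR that uses the `node-role.kubernetes.io/worker` label in its `recommend` match rule, so that the profile applies to the control plane node and to the added worker nodes. The profile name, interface, and `ptp4l`/`phc2sys` options are illustrative and must match your hardware and DU profile:

[source,yaml]
----
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: du-ptp-slave
  namespace: openshift-ptp
spec:
  profile:
  - name: slave
    interface: ens5f0 <1>
    ptp4lOpts: "-2 -s"
    phc2sysOpts: "-a -r"
  recommend:
  - profile: slave
    priority: 4
    match:
    - nodeLabel: node-role.kubernetes.io/worker <2>
----
<1> The interface name and the `ptp4lOpts` and `phc2sysOpts` values are placeholders.
<2> Selecting on the `worker` label matches the original {sno} control plane node, which also carries the worker role, as well as any added worker nodes.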
