
Commit 067c229

TELCODOCS-643: Addition of worker nodes to SNO
1 parent 8a85dd3 commit 067c229

8 files changed: +448 -0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -2426,6 +2426,8 @@ Topics:
       File: ztp-talm-updating-managed-policies
     - Name: Updating GitOps ZTP
       File: ztp-updating-gitops
+    - Name: Adding worker nodes to single-node OpenShift cluster
+      File: ztp-sno-additional-worker-node
 ---
 Name: Specialized hardware and driver enablement
 Dir: hardware_enablement

modules/ztp-adding-worker-nodes.adoc

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: PROCEDURE
[id="ztp-additional-worker-sno-proc_{context}"]
= Adding worker nodes to {sno} clusters
include::../_attributes/common-attributes.adoc[]

You can add one or more worker nodes to existing {sno} clusters to increase available CPU resources.

.Prerequisites

* Install and configure {rh-rhacm} 2.6 or later running on {product-title} 4.11 or later on a bare-metal cluster
* Install {cgu-operator-full}
* Install the OpenShift GitOps Operator
* Run {product-title} 4.12 or later in the zero touch provisioning (ZTP) container
* Deploy an {sno} cluster through ZTP
* Configure Central Infrastructure Management as described in the {rh-rhacm} documentation
* Configure the DNS serving the cluster to resolve the internal API endpoint `api-int.<cluster_name>.<base_domain>`
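+
A minimal sketch of one way to check this prerequisite follows. The cluster name `example-sno`, the base domain `example.com`, and the IPv6 (`AAAA`) record type are illustrative values taken from the examples later in this procedure; substitute your own. The query must return the IP address that serves the cluster internal API, typically the address of the original {sno} node.
+
[source,terminal]
----
$ dig +short api-int.example-sno.example.com AAAA
----
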
.Procedure

. If you deployed your cluster using the `example-sno.yaml` `SiteConfig` manifest, add your new worker node to the `spec.clusters['example-sno'].nodes` list:
+
[source,yaml]
----
nodes:
- hostName: "example-node2.example.com"
  role: "worker"
  bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1"
  bmcCredentialsName:
    name: "example-node2-bmh-secret"
  bootMACAddress: "AA:BB:CC:DD:EE:11"
  bootMode: "UEFI"
  nodeNetwork:
    interfaces:
      - name: eno1
        macAddress: "AA:BB:CC:DD:EE:11"
    config:
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          macAddress: "AA:BB:CC:DD:EE:11"
          ipv4:
            enabled: false
          ipv6:
            enabled: true
            address:
            - ip: 1111:2222:3333:4444::1
              prefix-length: 64
      dns-resolver:
        config:
          search:
          - example.com
          server:
          - 1111:2222:3333:4444::2
      routes:
        config:
        - destination: ::/0
          next-hop-interface: eno1
          next-hop-address: 1111:2222:3333:4444::1
          table-id: 254
----

. Create a BMC authentication secret for the new host, as referenced by the `bmcCredentialsName` field in the `spec.nodes` section of your `SiteConfig` file:
+
[source,yaml]
----
apiVersion: v1
data:
  password: "password"
  username: "username"
kind: Secret
metadata:
  name: "example-node2-bmh-secret"
  namespace: example-sno
type: Opaque
----

. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
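+
A minimal sketch of the commands, assuming that you work in a clone of the monitored repository, that the `SiteConfig` manifest is stored at the illustrative path `site-configs/example-sno.yaml`, and that the monitored branch is `main`; adjust the path and branch to match your repository:
+
[source,terminal]
----
$ git add site-configs/example-sno.yaml
$ git commit -m "Add worker node example-node2 to example-sno"
$ git push origin main
----
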
When the ArgoCD `cluster` application synchronizes, two new manifests, generated by the ZTP plugin, appear on the hub cluster:

* `BareMetalHost`
* `NMStateConfig`

[IMPORTANT]
====
Do not configure the `cpuset` field for the worker node. Workload partitioning for worker nodes is added through management policies after the node installation is complete.
====

.Verification

You can monitor the installation process in several ways.

. Check if the preprovisioning images are created by running the following command:
+
[source,terminal]
----
$ oc get ppimg -n example-sno
----
+
.Example output
+
[source,terminal]
----
NAMESPACE     NAME            READY   REASON
example-sno   example-sno     True    ImageCreated
example-sno   example-node2   True    ImageCreated
----

. Check the state of the bare-metal hosts:
+
[source,terminal]
----
$ oc get bmh -n example-sno
----
+
.Example output
+
[source,terminal]
----
NAME            STATE          CONSUMER   ONLINE   ERROR   AGE
example-sno     provisioned               true             69m
example-node2   provisioning              true             4m50s <1>
----
<1> The `provisioning` state indicates that node booting from the installation media is in progress.

. Continuously monitor the installation process:
+
[source,terminal]
----
$ oc get agent -n example-sno --watch
----
+
.Example output
+
[source,terminal]
----
NAME                                   CLUSTER       APPROVED   ROLE     STAGE
671bc05d-5358-8940-ec12-d9ad22804faa   example-sno   true       master   Done
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Starting installation
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Installing
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Writing image to disk
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Waiting for control plane
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Rebooting
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Done
----

. When the worker node installation completes, its certificates are approved automatically. At this point, the worker appears in the `ManagedClusterInfo` status:
+
[source,terminal]
----
$ oc get managedclusterinfo/example-sno -n example-sno -o \
  jsonpath='{range .status.nodeList[*]}{.name}{"\t"}{.conditions}{"\t"}{.labels}{"\n"}{end}'
----
+
.Example output
+
[source,terminal]
----
example-sno     [{"status":"True","type":"Ready"}]   {"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""}
example-node2   [{"status":"True","type":"Ready"}]   {"node-role.kubernetes.io/worker":""}
----
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: CONCEPT
[id="ztp-additional-worker-sno_{context}"]
= {sno-caps} cluster expansion with worker nodes
include::../_attributes/common-attributes.adoc[]

When you add worker nodes to increase available CPU resources, the original {sno} cluster retains the control plane node role.

[NOTE]
====
Although there is no specified limit on the number of worker nodes that you can add, you must re-evaluate the reserved CPU allocation on the control plane node for the additional worker nodes.
====

If you require workload partitioning on the worker node, you must deploy and remediate the policies that configure the worker node before you install the node. This way, the workload partitioning `MachineConfig` objects are rendered and associated with the `worker` machine config pool before the `MachineConfig` ignition file is downloaded by the installing worker node.

The recommended order is to remediate the policies first, and then to install the worker node.
If you create the workload partitioning manifests after node installation, you must manually drain the node and delete all the pods managed by daemon sets. When the managing daemon sets create the new pods, the new pods undergo the workload partitioning process.

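The following is a minimal sketch of that manual recovery, assuming the worker node is named `example-node2`. Which pods you must delete depends on the daemon sets that manage your partitioned workloads, so list the pods on the node first and delete only the relevant ones:

[source,terminal]
----
$ oc adm drain example-node2 --ignore-daemonsets --delete-emptydir-data <1>
$ oc get pods --all-namespaces --field-selector spec.nodeName=example-node2 <2>
$ oc delete pod <pod_name> -n <namespace> <3>
$ oc adm uncordon example-node2 <4>
----
<1> Drain the node. Pods managed by daemon sets are not evicted by the drain.
<2> List the pods that are still running on the node, including the daemon set managed pods.
<3> Delete each relevant daemon set managed pod so that its replacement is created under workload partitioning.
<4> Return the node to service.
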
:FeatureName: Adding worker nodes to {sno} clusters

include::snippets/technology-preview.adoc[]
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: CONCEPT
[id="ztp-additional-worker-apply-du-profile_{context}"]
= Applying profiles to the worker node
include::../_attributes/common-attributes.adoc[]

You can configure the additional worker node with a DU profile.

You can apply a RAN distributed unit (DU) profile to the worker node by using the GitOps ZTP common, group, and site-specific `PolicyGenTemplate` resources. The GitOps ZTP pipeline that is linked to the ArgoCD `policies` application includes the following CRs that you can find in the `out/argocd/example/policygentemplates` folder when you extract the `ztp-site-generate` container:

* `common-ranGen.yaml`
* `group-du-sno-ranGen.yaml`
* `example-sno-site.yaml`
* `ns.yaml`
* `kustomization.yaml`

Configuring the DU profile on the worker node is considered an upgrade. To initiate the upgrade flow, you must update the existing policies or create additional ones. Then, you must create a `ClusterGroupUpgrade` CR to reconcile the policies in the group of clusters.
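
The following is a minimal sketch of such a `ClusterGroupUpgrade` CR, assuming the cluster is named `example-sno`. The CR name, namespace, and policy names are illustrative and must match the policies that you updated or created for the worker node:

[source,yaml]
----
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-example-sno-worker <1>
  namespace: default
spec:
  clusters:
  - example-sno
  enable: true
  managedPolicies: <2>
  - group-du-sno-config-policy
  - common-config-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
----
<1> The name and namespace are illustrative.
<2> List the policies that configure the worker node. The names shown here are placeholders.

When this CR is applied, {cgu-operator-full} remediates the listed policies on the `example-sno` cluster, which rolls the DU configuration out to the new worker node.
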
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: PROCEDURE
[id="ztp-additional-worker-daemon-selector-comp_{context}"]
= (Optional) Ensuring PTP and SR-IOV daemon selector compatibility

If the DU profile was deployed by using the GitOps ZTP plugin version 4.11 or earlier, the PTP and SR-IOV Operators might be configured to place the daemons only on nodes labeled as `master`. This configuration prevents the PTP and SR-IOV daemons from operating on the worker node. If the PTP and SR-IOV daemon node selectors are incorrectly configured on your system, you must change the selectors before proceeding with the worker DU profile configuration.

.Procedure

. Check the daemon node selector settings of the PTP Operator on one of the spoke clusters:
+
[source,terminal]
----
$ oc get ptpoperatorconfig/default -n openshift-ptp -ojsonpath='{.spec}' | jq
----
+
.Example output for PTP Operator
+
[source,json]
----
{"daemonNodeSelector":{"node-role.kubernetes.io/master":""}} <1>
----
<1> If the node selector is set to `master`, the spoke was deployed with the version of the ZTP plugin that requires changes.

. Check the daemon node selector settings of the SR-IOV Operator on one of the spoke clusters:
+
[source,terminal]
----
$ oc get sriovoperatorconfig/default -n \
  openshift-sriov-network-operator -ojsonpath='{.spec}' | jq
----
+
.Example output for SR-IOV Operator
+
[source,json]
----
{"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true} <1>
----
<1> If the node selector is set to `master`, the spoke was deployed with the version of the ZTP plugin that requires changes.

. In the group policy, add the following `complianceType` and `spec` entries:
+
[source,yaml]
----
spec:
    - fileName: PtpOperatorConfig.yaml
      policyName: "config-policy"
      complianceType: mustonlyhave
      spec:
        daemonNodeSelector:
          node-role.kubernetes.io/worker: ""
    - fileName: SriovOperatorConfig.yaml
      policyName: "config-policy"
      complianceType: mustonlyhave
      spec:
        configDaemonNodeSelector:
          node-role.kubernetes.io/worker: ""
----
+
[IMPORTANT]
====
Changing the `daemonNodeSelector` field causes temporary PTP synchronization loss and SR-IOV connectivity loss.
====

. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
// Module included in the following assemblies:
// Epic CNF-5335 (4.11), Story TELCODOCS-643
// scalability_and_performance/ztp-deploying-disconnected.adoc

:_content-type: CONCEPT
[id="ztp-additional-worker-node-selector-comp_{context}"]
= PTP and SR-IOV node selector compatibility

The PTP configuration resources and SR-IOV network node policies use `node-role.kubernetes.io/master: ""` as the node selector. If the additional worker nodes have the same NIC configuration as the control plane node, the policies used to configure the control plane node can be reused for the worker nodes. However, the node selector must be changed to select both node types, for example with the `node-role.kubernetes.io/worker` label.
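
For example, the following is a minimal sketch of a `PtpConfig` CR that uses the `node-role.kubernetes.io/worker` label in its `recommend` match rule, so that the profile applies to the control plane node and to the added worker nodes. The profile name, interface, and `ptp4l`/`phc2sys` options are illustrative and must match your hardware and DU profile:

[source,yaml]
----
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: du-ptp-slave
  namespace: openshift-ptp
spec:
  profile:
  - name: slave
    interface: ens5f0 <1>
    ptp4lOpts: "-2 -s"
    phc2sysOpts: "-a -r"
  recommend:
  - profile: slave
    priority: 4
    match:
    - nodeLabel: node-role.kubernetes.io/worker <2>
----
<1> The interface name and the `ptp4lOpts` and `phc2sysOpts` values are placeholders.
<2> Selecting on the `worker` label matches the original {sno} control plane node, which also carries the worker role, as well as any added worker nodes.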
