Merge pull request #70056 from jeana-redhat/OSDOCS-9244-Nutanix-failure-domain-MAPI

jeana-redhat · web-flow · commit 09f0301945c3 · 2024-02-21T09:16:14.000-05:00
OSDOCS-9244: Machine API updates for Nutanix failure domain support
diff --git a/machine_management/control_plane_machine_management/cpmso-configuration.adoc b/machine_management/control_plane_machine_management/cpmso-configuration.adoc
@@ -89,6 +89,12 @@ Some sections of the control plane machine set CR are provider-specific. The fol
 //Sample Nutanix provider specification
 include::modules/cpmso-yaml-provider-spec-nutanix.adoc[leveloffset=+2]
 
+//Failure domains for Nutanix clusters
+include::modules/mapi-failure-domain-nutanix.adoc[leveloffset=+2]
+[role="_additional-resources"]
+.Additional resources
+* xref:../../post_installation_configuration/adding-nutanix-failure-domains.adoc#adding-failure-domains-to-an-existing-nutanix-cluster[Adding failure domains to an existing Nutanix cluster]
+
 [id="cpmso-sample-yaml-vsphere_{context}"]
 == Sample YAML for configuring VMware vSphere clusters
 
diff --git a/machine_management/control_plane_machine_management/cpmso-resiliency.adoc b/machine_management/control_plane_machine_management/cpmso-resiliency.adoc
@@ -23,6 +23,8 @@ include::modules/cpmso-failure-domains-provider.adoc[leveloffset=+2]
 
 * xref:../../machine_management/control_plane_machine_management/cpmso-configuration.adoc#cpmso-yaml-failure-domain-azure_cpmso-configuration[Sample Microsoft Azure failure domain configuration]
 
+* xref:../../post_installation_configuration/adding-nutanix-failure-domains.adoc#adding-failure-domains-to-an-existing-nutanix-cluster[Adding failure domains to an existing Nutanix cluster]
+
 * xref:../../machine_management/control_plane_machine_management/cpmso-configuration.adoc#cpmso-yaml-failure-domain-openstack_cpmso-configuration[Sample {rh-openstack-first} failure domain configuration]
 
 //Balancing control plane machines
diff --git a/machine_management/creating_machinesets/creating-machineset-nutanix.adoc b/machine_management/creating_machinesets/creating-machineset-nutanix.adoc
@@ -16,3 +16,9 @@ include::modules/machineset-yaml-nutanix.adoc[leveloffset=+1]
 
 //Creating a compute machine set
 include::modules/machineset-creating.adoc[leveloffset=+1]
+
+//Failure domains for Nutanix clusters
+include::modules/mapi-failure-domain-nutanix.adoc[leveloffset=+1]
+[role="_additional-resources"]
+.Additional resources
+* xref:../../post_installation_configuration/adding-nutanix-failure-domains.adoc#adding-failure-domains-to-an-existing-nutanix-cluster[Adding failure domains to an existing Nutanix cluster]
diff --git a/modules/cpmso-failure-domains-provider.adoc b/modules/cpmso-failure-domains-provider.adoc
@@ -23,15 +23,14 @@ The control plane machine set concept of a failure domain is analogous to existi
 |X
 |link:https://cloud.google.com/compute/docs/regions-zones[zone]
 
-|Nutanix
-//link:https://portal.nutanix.com/page/documents/details?targetId=Web-Console-Guide-Prism-v6_1:arc-failure-modes-c.html[Availability domain]
-|
-|Not applicable ^[1]^
-
 |Microsoft Azure
 |X
 |link:https://learn.microsoft.com/en-us/azure/azure-web-pubsub/concept-availability-zones[Azure availability zone]
 
+|Nutanix
+|X
+|link:https://portal.nutanix.com/page/documents/solutions/details?targetId=RA-2147-Nutanix-for-Enterprise-Edge:failure-domain-considerations.html[failure domain]
+
 |VMware vSphere
 |
 |Not applicable
@@ -40,9 +39,5 @@ The control plane machine set concept of a failure domain is analogous to existi
 |X
 |link:https://docs.openstack.org/nova/2023.2/admin/availability-zones.html[OpenStack Nova availability zones] and link:https://docs.openstack.org/cinder/2023.2/admin/availability-zone-type.html[OpenStack Cinder availability zones]
 |====
-[.small]
---
-1. Nutanix has a failure domain concept, but {product-title} {product-version} does not include support for this feature.
---
 
 The failure domain configuration in the control plane machine set custom resource (CR) is platform-specific. For more information about failure domain parameters in the CR, see the sample failure domain configuration for your provider.
diff --git a/modules/machineset-modifying.adoc b/modules/machineset-modifying.adoc
@@ -19,6 +19,8 @@ By default, the {product-title} router pods are deployed on compute machines.
 Because the router is required to access some cluster resources, including the web console, do not scale the compute machine set to `0` unless you first relocate the router pods.
 ====
 
+The output examples in this procedure use the values for an AWS cluster.
+
 .Prerequisites
 
 * Your {product-title} cluster uses the Machine API.
@@ -34,7 +36,7 @@ Because the router is required to access some cluster resources, including the w
 $ oc edit machineset <machine_set_name> -n openshift-machine-api
 ----
 
-. Note the value of the `spec.replicas` field, as you need it when scaling the machine set to apply the changes.
+. Note the value of the `spec.replicas` field, because you need it when scaling the machine set to apply the changes.
 +
 [source,yaml]
 ----
@@ -58,7 +60,7 @@ spec:
 $ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
 ----
 +
-.Example output
+.Example output for an AWS cluster
 [source,text]
 ----
 NAME                        PHASE     TYPE         REGION      ZONE         AGE
@@ -75,7 +77,7 @@ $ oc annotate machine/<machine_name_original_1> \
   machine.openshift.io/delete-machine="true"
 ----
 
-. Scale the compute machine set to twice the number of replicas by running the following command:
+. To create replacement machines with the new configuration, scale the compute machine set to twice the number of replicas by running the following command:
 +
 [source,terminal]
 ----
@@ -92,7 +94,7 @@ $ oc scale --replicas=4 \// <1>
 $ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
 ----
 +
-.Example output
+.Example output for an AWS cluster
 [source,text]
 ----
 NAME                        PHASE          TYPE         REGION      ZONE         AGE
@@ -104,7 +106,7 @@ NAME                        PHASE          TYPE         REGION      ZONE
 +
 When the new machines are in the `Running` phase, you can scale the compute machine set to the original number of replicas.
 
-. Scale the compute machine set to the original number of replicas by running the following command:
+. To remove the machines that were created with the old configuration, scale the compute machine set to the original number of replicas by running the following command:
 +
 [source,terminal]
 ----
@@ -116,14 +118,21 @@ $ oc scale --replicas=2 \// <1>
 
 .Verification
 
+* To verify that a machine created by the updated machine set has the correct configuration, examine the relevant fields in the CR for one of the new machines by running the following command:
++
+[source,terminal]
+----
+$ oc describe machine <machine_name_updated_1> -n openshift-machine-api
+----
+
 * To verify that the compute machines without the updated configuration are deleted, list the machines that are managed by the updated compute machine set by running the following command:
 +
 [source,terminal]
 ----
 $ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
 ----
 +
-.Example output while deletion is in progress
+.Example output while deletion is in progress for an AWS cluster
 [source,text]
 ----
 NAME                        PHASE           TYPE         REGION      ZONE         AGE
@@ -133,17 +142,10 @@ NAME                        PHASE           TYPE         REGION      ZONE
 <machine_name_updated_2>    Running         m6i.xlarge   us-west-1   us-west-1a   5m41s
 ----
 +
-.Example output when deletion is complete
+.Example output when deletion is complete for an AWS cluster
 [source,text]
 ----
 NAME                        PHASE           TYPE         REGION      ZONE         AGE
 <machine_name_updated_1>    Running         m6i.xlarge   us-west-1   us-west-1a   6m30s
 <machine_name_updated_2>    Running         m6i.xlarge   us-west-1   us-west-1a   6m30s
-----
-
-* To verify that a machine created by the updated machine set has the correct configuration, examine the relevant fields in the CR for one of the new machines by running the following command:
-+
-[source,terminal]
-----
-$ oc describe machine <machine_name_updated_1> -n openshift-machine-api
 ----
diff --git a/modules/mapi-failure-domain-nutanix.adoc b/modules/mapi-failure-domain-nutanix.adoc
@@ -0,0 +1,19 @@
+// Module included in the following assemblies:
+//
+// * machine_management/control_plane_machine_management/cpmso-configuration.adoc
+// * machine_management/creating_machinesets/creating-machineset-nutanix.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="mapi-failure-domain-nutanix_{context}"]
+= Failure domains for Nutanix clusters
+
+To add or update the failure domain configuration on a Nutanix cluster, you must make coordinated changes to several resources.
+The following actions are required:
+
+. Modify the cluster infrastructure custom resource (CR).
+
+. Modify the cluster control plane machine set CR.
+
+. Modify or replace the compute machine set CRs.
+
+For more information, see "Adding failure domains to an existing Nutanix cluster" in the _Post-installation configuration_ content.
diff --git a/modules/post-installation-adding-nutanix-failure-domains-compute-machines-edit.adoc b/modules/post-installation-adding-nutanix-failure-domains-compute-machines-edit.adoc
@@ -3,15 +3,10 @@
 // * post_installation_configuration/adding-nutanix-failure-domains.adoc
 
 :_mod-docs-content-type: PROCEDURE
-[id="post-installation-adding-nutanix-failure-domains-compute-machines_{context}"]
-= Distributing compute machines across failure domains
+[id="post-installation-adding-nutanix-failure-domains-compute-machines-edit_{context}"]
+= Editing compute machine sets to implement failure domains
 
-You can distribute compute machines across Nutanix failure domains by performing either of the following tasks:
-
-* Modifying existing compute machine sets.
-* Creating new compute machine sets.
-
-The following procedure details how to distribute compute machines across failure domains by modifying existing compute machine sets. For more information on creating a compute machine set, see "Additional resources".
+To distribute compute machines across Nutanix failure domains by using an existing compute machine set, you update the compute machine set with your configuration and then use scaling to replace the existing compute machines.
 
 .Prerequisites
 
@@ -25,20 +20,32 @@ The following procedure details how to distribute compute machines across failur
 ----
 $ oc describe infrastructures.config.openshift.io cluster
 ----
+
 . For each failure domain (`platformSpec.nutanix.failureDomains`), note the cluster's UUID, name, and subnet object UUID. These values are required to add a failure domain to a compute machine set.
+
 . List the compute machine sets in your cluster by running the following command:
 +
 [source,terminal]
 ----
 $ oc get machinesets -n openshift-machine-api
 ----
++
+.Example output
+[source,text]
+----
+NAME                   DESIRED   CURRENT   READY   AVAILABLE   AGE
+<machine_set_name_1>   1         1         1       1           55m
+<machine_set_name_2>   1         1         1       1           55m
+----
+
 . Edit the first compute machine set by running the following command:
 +
 [source,terminal]
 ----
-$ oc edit machineset <machineset_name> -n openshift-machine-api
+$ oc edit machineset <machine_set_name_1> -n openshift-machine-api
 ----
-. Configure the compute machine set to use the first failure domain by adding the following to the `spec.template.spec.providerSpec.value` stanza:
+
+. Configure the compute machine set to use the first failure domain by updating the following fields in the `spec.template.spec.providerSpec.value` stanza.
 +
 [NOTE]
 ====
@@ -54,7 +61,7 @@ metadata:
   creationTimestamp: null
   labels:
     machine.openshift.io/cluster-api-cluster: <cluster_name>
-  name: <machineset_name>
+  name: <machine_set_name_1>
   namespace: openshift-machine-api
 spec:
   replicas: 2
@@ -75,14 +82,27 @@ spec:
             uuid: <prism_element_network_uuid_1>
 # ...
 ----
-. Note the value of `spec.replicas`, as you need it when scaling the machine set to apply the changes.
+
+. Note the value of `spec.replicas`, because you need it when scaling the compute machine set to apply the changes.
+
 . Save your changes.
+
 . List the machines that are managed by the updated compute machine set by running the following command:
 +
 [source,terminal]
 ----
-$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
+$ oc get -n openshift-machine-api machines \
+  -l machine.openshift.io/cluster-api-machineset=<machine_set_name_1>
+----
++
+.Example output
+[source,text]
+----
+NAME                        PHASE     TYPE   REGION    ZONE              AGE
+<machine_name_original_1>   Running   AHV    Unnamed   Development-STS   4h
+<machine_name_original_2>   Running   AHV    Unnamed   Development-STS   4h
 ----
+
 . For each machine that is managed by the updated compute machine set, set the `delete` annotation by running the following command:
 +
 [source,terminal]
@@ -91,30 +111,34 @@ $ oc annotate machine/<machine_name_original_1> \
   -n openshift-machine-api \
   machine.openshift.io/delete-machine="true"
 ----
-. Scale the compute machine set to twice the number of replicas by running the following command:
+
+. To create replacement machines with the new configuration, scale the compute machine set to twice the number of replicas by running the following command:
 +
 [source,terminal]
 ----
 $ oc scale --replicas=<twice_the_number_of_replicas> \// <1>
-  machineset <machine_set_name> \
+  machineset <machine_set_name_1> \
   -n openshift-machine-api
 ----
 <1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `4`.
+
 . List the machines that are managed by the updated compute machine set by running the following command:
 +
 [source,terminal]
 ----
-$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
+$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name_1>
 ----
 +
 When the new machines are in the `Running` phase, you can scale the compute machine set to the original number of replicas.
-. Scale the compute machine set to the original number of replicas by running the following command:
+
+. To remove the machines that were created with the old configuration, scale the compute machine set to the original number of replicas by running the following command:
 +
 [source,terminal]
 ----
 $ oc scale --replicas=<original_number_of_replicas> \// <1>
-  machineset <machine_set_name> \
+  machineset <machine_set_name_1> \
   -n openshift-machine-api
 ----
-<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `2`.
+<1> For example, if the original number of replicas in the compute machine set was `2`, scale the replicas to `2`.
+
 . As required, continue to modify machine sets to reference the additional failure domains that are available to the deployment.
diff --git a/modules/post-installation-adding-nutanix-failure-domains-compute-machines-replace.adoc b/modules/post-installation-adding-nutanix-failure-domains-compute-machines-replace.adoc
diff --git a/post_installation_configuration/adding-nutanix-failure-domains.adoc b/post_installation_configuration/adding-nutanix-failure-domains.adoc