Commit e29f386

OSDOCS#8867: Adding failure domains to Nutanix docs

1 parent 96ac9a4 commit e29f386

11 files changed: +473 −2 lines changed

_topic_maps/_topic_map.yml

Lines changed: 5 additions & 0 deletions

@@ -307,6 +307,8 @@ Topics:
   Topics:
   - Name: Preparing to install on Nutanix
     File: preparing-to-install-on-nutanix
+  - Name: Fault tolerant deployments
+    File: nutanix-failure-domains
   - Name: Installing a cluster on Nutanix
     File: installing-nutanix-installer-provisioned
   - Name: Installing a cluster on Nutanix in a restricted network
@@ -606,6 +608,9 @@ Topics:
   - Name: AWS Local Zone tasks
     File: aws-compute-edge-tasks
     Distros: openshift-enterprise
+  - Name: Adding failure domains to an existing Nutanix cluster
+    File: adding-nutanix-failure-domains
+    Distros: openshift-origin,openshift-enterprise
 ---
 Name: Updating clusters
 Dir: updating

installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc

Lines changed: 1 addition & 0 deletions

@@ -47,6 +47,7 @@ include::modules/installation-initializing.adoc[leveloffset=+1]
 * xref:../../installing/installing_nutanix/installation-config-parameters-nutanix.adoc#installation-config-parameters-nutanix[Installation configuration parameters for Nutanix]

 include::modules/installation-nutanix-config-yaml.adoc[leveloffset=+2]
+include::modules/installation-configuring-nutanix-failure-domains.adoc[leveloffset=+2]
 include::modules/installation-configure-proxy.adoc[leveloffset=+2]

 include::modules/cli-installing-cli.adoc[leveloffset=+1]

installing/installing_nutanix/installing-restricted-networks-nutanix-installer-provisioned.adoc

Lines changed: 1 addition & 0 deletions

@@ -46,6 +46,7 @@ include::modules/installation-initializing.adoc[leveloffset=+1]
 * xref:../../installing/installing_nutanix/installation-config-parameters-nutanix.adoc#installation-config-parameters-nutanix[Installation configuration parameters for Nutanix]

 include::modules/installation-nutanix-config-yaml.adoc[leveloffset=+2]
+include::modules/installation-configuring-nutanix-failure-domains.adoc[leveloffset=+2]
 include::modules/installation-configure-proxy.adoc[leveloffset=+2]

 include::modules/cli-installing-cli.adoc[leveloffset=+1]
installing/installing_nutanix/nutanix-failure-domains.adoc (new file)

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+:_mod-docs-content-type: ASSEMBLY
+[id="nutanix-failure-domains"]
+= Fault tolerant deployments using multiple Prism Elements
+include::_attributes/common-attributes.adoc[]
+:context: nutanix-failure-domains
+
+toc::[]
+
+By default, the installation program installs control plane and compute machines into a single Nutanix Prism Element (cluster). To improve the fault tolerance of your {product-title} cluster, you can specify that these machines be distributed across multiple Nutanix clusters by configuring failure domains.
+
+A failure domain represents an additional Prism Element instance that is available to {product-title} machine pools during and after installation.
+
+include::modules/installation-nutanix-failure-domains-req.adoc[leveloffset=+1]
+
+== Installation method and failure domain configuration
+
+The {product-title} installation method determines how and when you configure failure domains:
+
+* If you deploy using installer-provisioned infrastructure, you can configure failure domains in the installation configuration file before deploying the cluster. For more information, see xref:../../installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc#installation-configuring-nutanix-failure-domains_installing-nutanix-installer-provisioned[Configuring failure domains].
++
+You can also configure failure domains after the cluster is deployed.
+* If you deploy using the {ai-full}, you configure failure domains after the cluster is deployed.
++
+For more information about configuring failure domains post-installation, see xref:../../post_installation_configuration/adding-nutanix-failure-domains.adoc#adding-failure-domains-to-an-existing-nutanix-cluster[Adding failure domains to an existing Nutanix cluster].
+
+* If you deploy using infrastructure that you manage (user-provisioned infrastructure), no additional configuration is required. After the cluster is deployed, you can manually distribute control plane and compute machines across failure domains.

modules/installation-configuration-parameters.adoc

Lines changed: 52 additions & 2 deletions

@@ -2956,6 +2956,17 @@ Additional Nutanix configuration parameters are described in the following table
 |The value of a prism category key-value pair to apply to compute VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
 |String

+|compute:
+  platform:
+    nutanix:
+      failureDomains:
+d|The failure domains that apply to only compute machines.
+
+Failure domains are specified in `platform.nutanix.failureDomains`.
+d|List.
+
+The name of one or more failure domains.
+
 |compute:
   platform:
     nutanix:
@@ -2995,6 +3006,17 @@ Additional Nutanix configuration parameters are described in the following table
 |The value of a prism category key-value pair to apply to control plane VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
 |String

+|controlPlane:
+  platform:
+    nutanix:
+      failureDomains:
+d|The failure domains that apply to only control plane machines.
+
+Failure domains are specified in `platform.nutanix.failureDomains`.
+d|List.
+
+The name of one or more failure domains.
+
 |controlPlane:
   platform:
     nutanix:
@@ -3027,6 +3049,17 @@ Additional Nutanix configuration parameters are described in the following table
 |The value of a prism category key-value pair to apply to all VMs. This parameter must be accompanied by the `key` parameter, and both `key` and `value` parameters must exist in Prism Central.
 |String

+|platform:
+  nutanix:
+    defaultMachinePlatform:
+      failureDomains:
+d|The failure domains that apply to both control plane and compute machines.
+
+Failure domains are specified in `platform.nutanix.failureDomains`.
+d|List.
+
+The name of one or more failure domains.
+
 |platform:
   nutanix:
     defaultMachinePlatform:
@@ -3056,6 +3089,23 @@ Additional Nutanix configuration parameters are described in the following table
 |The virtual IP (VIP) address that you configured for control plane API access.
 |IP address

+|platform:
+  nutanix:
+    failureDomains:
+    - name:
+      prismElement:
+        name:
+        uuid:
+      subnetUUIDs:
+      -
+a|By default, the installation program installs cluster machines to a single Prism Element instance. You can specify additional Prism Element instances for fault tolerance, and then apply them to:
+
+* The cluster's default machine configuration
+* Only control plane or compute machine pools
+d|A list of configured failure domains.
+
+For more information on usage, see "Configuring failure domains" in "Installing a cluster on Nutanix".
+
 |platform:
   nutanix:
     ingressVIP:
@@ -3129,8 +3179,8 @@ Additional Nutanix configuration parameters are described in the following table
 |====
 [.small]
 --
-1. The `prismElements` section holds a list of Prism Elements (clusters). A Prism Element encompasses all of the Nutanix resources, for example virtual machines and subnets, that are used to host the {product-title} cluster. Only a single Prism Element is supported.
-2. Only one subnet per {product-title} cluster is supported.
+1. The `prismElements` section holds a list of Prism Elements (clusters). A Prism Element encompasses all of the Nutanix resources, for example virtual machines and subnets, that are used to host the {product-title} cluster.
+2. Only one subnet per Prism Element in an {product-title} cluster is supported.
 --
 endif::nutanix[]
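The three `failureDomains` parameters above reference, by name, domains that are defined once under `platform.nutanix.failureDomains`. A minimal sketch of how the pieces fit together (the domain names, Prism Element names, and UUID placeholders are illustrative, not taken from the commit):

```yaml
platform:
  nutanix:
    defaultMachinePlatform:
      failureDomains:          # referenced by name
      - failure-domain-1
      - failure-domain-2
    failureDomains:            # defined once here
    - name: failure-domain-1
      prismElement:
        name: pe-1
        uuid: <prism_element_uuid_1>
      subnetUUIDs:
      - <network_uuid_1>
    - name: failure-domain-2
      prismElement:
        name: pe-2
        uuid: <prism_element_uuid_2>
      subnetUUIDs:
      - <network_uuid_2>
```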

modules/installation-configuring-nutanix-failure-domains.adoc (new file)

Lines changed: 99 additions & 0 deletions

@@ -0,0 +1,99 @@
+// Module included in the following assemblies:
+//
+// * installing/installing_nutanix/installing-nutanix-installer-provisioned.adoc
+// * installing/installing_nutanix/installing-restricted-networks-nutanix-installer-provisioned.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="installation-configuring-nutanix-failure-domains_{context}"]
+= Configuring failure domains
+
+Failure domains improve the fault tolerance of an {product-title} cluster by distributing control plane and compute machines across multiple Nutanix Prism Elements (clusters).
+
+[TIP]
+====
+It is recommended that you configure three failure domains to ensure high availability.
+====
+
+.Prerequisites
+
+* You have an installation configuration file (`install-config.yaml`).
+
+.Procedure
+
+. Edit the `install-config.yaml` file and add the following stanza to configure the first failure domain:
++
+[source,yaml]
+----
+apiVersion: v1
+baseDomain: example.com
+compute:
+# ...
+platform:
+  nutanix:
+    failureDomains:
+    - name: <failure_domain_name>
+      prismElement:
+        name: <prism_element_name>
+        uuid: <prism_element_uuid>
+      subnetUUIDs:
+      - <network_uuid>
+# ...
+----
++
+where:
+
+`<failure_domain_name>`:: Specifies a unique name for the failure domain. The name must be 64 characters or fewer, and can include lowercase letters, digits, and dashes (`-`). A dash cannot be the first or last character of the name.
+`<prism_element_name>`:: Optional. Specifies the name of the Prism Element.
+`<prism_element_uuid>`:: Specifies the UUID of the Prism Element.
+`<network_uuid>`:: Specifies the UUID of the Prism Element subnet object. The subnet's IP address prefix (CIDR) should contain the virtual IP addresses that the {product-title} cluster uses. Only one subnet per failure domain (Prism Element) in an {product-title} cluster is supported.
+
+. As required, configure additional failure domains.
+. To distribute control plane and compute machines across the failure domains, do one of the following:
+
+** If compute and control plane machines can share the same set of failure domains, add the failure domain names under the cluster's default machine configuration.
++
+.Example of control plane and compute machines sharing a set of failure domains
++
+[source,yaml]
+----
+apiVersion: v1
+baseDomain: example.com
+compute:
+# ...
+platform:
+  nutanix:
+    defaultMachinePlatform:
+      failureDomains:
+      - failure-domain-1
+      - failure-domain-2
+      - failure-domain-3
+# ...
+----
+** If compute and control plane machines must use different failure domains, add the failure domain names under the respective machine pools.
++
+.Example of control plane and compute machines using different failure domains
++
+[source,yaml]
+----
+apiVersion: v1
+baseDomain: example.com
+controlPlane:
+  platform:
+    nutanix:
+      failureDomains:
+      - failure-domain-1
+      - failure-domain-2
+      - failure-domain-3
+# ...
+compute:
+  platform:
+    nutanix:
+      failureDomains:
+      - failure-domain-1
+      - failure-domain-2
+# ...
+----
+
+. Save the file.
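The `<failure_domain_name>` constraints described in the procedure above can be sketched as a quick validation check. This is a hypothetical helper for illustration only, not part of the installation program:

```python
import re

# Documented constraints: 64 characters or fewer; lowercase letters,
# digits, and dashes only; a dash cannot be the first or last character.
_NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

def is_valid_failure_domain_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    return len(name) <= 64 and _NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_failure_domain_name("failure-domain-1")` is `True`, while a name with a leading dash or an uppercase letter is rejected.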
modules/installation-nutanix-failure-domains-req.adoc (new file)

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+// Module included in the following assemblies:
+//
+// * installing/installing_nutanix/nutanix-failure-domains.adoc
+// * post_installation_configuration/adding-nutanix-failure-domains.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="installation-nutanix-failure-domains-req_{context}"]
+= Failure domain requirements
+
+When planning to use failure domains, consider the following requirements:
+
+* All Nutanix Prism Element instances must be managed by the same instance of Prism Central. A deployment that comprises multiple Prism Central instances is not supported.
+* The machines that make up the Prism Element clusters must reside on the same Ethernet network for failure domains to be able to communicate with each other.
+* A subnet is required in each Prism Element that is used as a failure domain in the {product-title} cluster. When defining these subnets, they must share the same IP address prefix (CIDR) and should contain the virtual IP addresses that the {product-title} cluster uses.
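The subnet requirement in the last bullet can be sketched with Python's standard `ipaddress` module. This is an illustrative check under the stated requirements, not a supported validation tool:

```python
import ipaddress

def subnets_meet_requirements(subnet_cidrs, vips):
    """Check that all failure domain subnets share one CIDR and that
    the CIDR contains the cluster's virtual IP addresses."""
    networks = {ipaddress.ip_network(cidr) for cidr in subnet_cidrs}
    if len(networks) != 1:  # subnets must share the same IP address prefix
        return False
    network = networks.pop()
    return all(ipaddress.ip_address(vip) in network for vip in vips)
```

For example, two failure domain subnets both defined as `10.40.142.0/24` satisfy the check when the API and ingress VIPs fall inside that range; subnets with different prefixes do not.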
(new file)

Lines changed: 120 additions & 0 deletions

@@ -0,0 +1,120 @@
+// Module included in the following assemblies:
+//
+// * post_installation_configuration/adding-nutanix-failure-domains.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="post-installation-adding-nutanix-failure-domains-compute-machines_{context}"]
+= Distributing compute machines across failure domains
+
+You can distribute compute machines across Nutanix failure domains by performing either of the following tasks:
+
+* Modifying existing compute machine sets.
+* Creating new compute machine sets.
+
+The following procedure details how to distribute compute machines across failure domains by modifying existing compute machine sets. For more information on creating a compute machine set, see "Additional resources".
+
+.Prerequisites
+
+* You have configured the failure domains in the cluster's Infrastructure custom resource (CR).
+
+.Procedure
+
+. View the cluster's Infrastructure CR by running the following command:
++
+[source,terminal]
+----
+$ oc describe infrastructures.config.openshift.io cluster
+----
+. For each failure domain (`platformSpec.nutanix.failureDomains`), note the cluster's UUID, name, and subnet object UUID. These values are required to add a failure domain to a compute machine set.
+. List the compute machine sets in your cluster by running the following command:
++
+[source,terminal]
+----
+$ oc get machinesets -n openshift-machine-api
+----
+. Edit the first compute machine set by running the following command:
++
+[source,terminal]
+----
+$ oc edit machineset <machine_set_name> -n openshift-machine-api
+----
+. Configure the compute machine set to use the first failure domain by adding the following to the `spec.template.spec.providerSpec.value` stanza:
++
+[NOTE]
+====
+Be sure that the values you specify for the `cluster` and `subnets` fields match the values that were configured in the `failureDomains` stanza in the cluster's Infrastructure CR.
+====
++
+.Example compute machine set with Nutanix failure domains
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineSet
+metadata:
+  creationTimestamp: null
+  labels:
+    machine.openshift.io/cluster-api-cluster: <cluster_name>
+  name: <machine_set_name>
+  namespace: openshift-machine-api
+spec:
+  replicas: 2
+# ...
+  template:
+    spec:
+# ...
+      providerSpec:
+        value:
+          apiVersion: machine.openshift.io/v1
+          failureDomain:
+            name: <failure_domain_name_1>
+          cluster:
+            type: uuid
+            uuid: <prism_element_uuid_1>
+          subnets:
+          - type: uuid
+            uuid: <prism_element_network_uuid_1>
+# ...
+----
+. Note the value of `spec.replicas`, as you need it when scaling the machine set to apply the changes.
+. Save your changes.
+. List the machines that are managed by the updated compute machine set by running the following command:
++
+[source,terminal]
+----
+$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
+----
+. For each machine that is managed by the updated compute machine set, set the `delete` annotation by running the following command:
++
+[source,terminal]
+----
+$ oc annotate machine/<machine_name_original_1> \
+  -n openshift-machine-api \
+  machine.openshift.io/delete-machine="true"
+----
+. Scale the compute machine set to twice the number of replicas by running the following command:
++
+[source,terminal]
+----
+$ oc scale --replicas=<twice_the_number_of_replicas> \ <1>
+  machineset <machine_set_name> \
+  -n openshift-machine-api
+----
+<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `4`.
+. List the machines that are managed by the updated compute machine set by running the following command:
++
+[source,terminal]
+----
+$ oc get -n openshift-machine-api machines -l machine.openshift.io/cluster-api-machineset=<machine_set_name>
+----
++
+When the new machines are in the `Running` phase, you can scale the compute machine set down to the original number of replicas.
+. Scale the compute machine set to the original number of replicas by running the following command:
++
+[source,terminal]
+----
+$ oc scale --replicas=<original_number_of_replicas> \ <1>
+  machineset <machine_set_name> \
+  -n openshift-machine-api
+----
+<1> For example, if the original number of replicas in the compute machine set is `2`, scale the replicas to `2`.
+. As required, continue to modify machine sets to reference the additional failure domains that are available to the deployment.
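The final step, walking the remaining machine sets through the available failure domains one by one, amounts to a round-robin assignment. A rough sketch of that idea (machine set and domain names are illustrative):

```python
from itertools import cycle

def assign_failure_domains(machine_sets, failure_domains):
    """Map each compute machine set to a failure domain, cycling
    through the domains so machines spread evenly across them."""
    domains = cycle(failure_domains)
    return {machine_set: next(domains) for machine_set in machine_sets}
```

For example, assigning three machine sets across two failure domains places the first and third machine set in `failure-domain-1` and the second in `failure-domain-2`.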
