
Commit 2f9f50b

Merge pull request #2788 from replicatedhq/114634
Create topic for adding and managing nodes with EC
2 parents c74a968 + f9c07ff commit 2f9f50b

File tree

7 files changed

+218
-148
lines changed

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Managing Multi-Node Clusters with Embedded Cluster

This topic describes managing nodes in clusters created with Replicated Embedded Cluster, including how to add nodes and enable high availability for multi-node clusters.

## Limitations

Multi-node clusters with Embedded Cluster have the following limitations:

* Support for multi-node clusters with Embedded Cluster is Beta. Only single-node embedded clusters are Generally Available (GA).

* High availability for Embedded Cluster is an Alpha feature. This feature is subject to change, including breaking changes. To get access to this feature, reach out to Alex Parker at [[email protected]](mailto:[email protected]).
## Add Nodes to a Cluster (Beta) {#add-nodes}

You can add nodes to create a multi-node cluster in online (internet-connected) and air-gapped (limited or no outbound internet access) environments. The Admin Console provides the join command that you use to join nodes to the cluster.

:::note
Multi-node clusters are not highly available by default. For information about enabling high availability, see [Enable High Availability for Multi-Node Clusters (Alpha)](#ha) below.
:::

To add nodes to a cluster:

1. (Optional) In the Embedded Cluster Config, configure the `roles` key to customize node roles. For more information, see [roles](/reference/embedded-config#roles) in _Embedded Cluster Config_. When you are done, create and promote a new release with the updated Config.
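   For example, a Config that renames the `controller` role and adds a `custom` role might look like the following sketch (the role and label names here are illustrative only):

   ```yaml
   apiVersion: embeddedcluster.replicated.com/v1beta1
   kind: Config
   spec:
     roles:
       controller:
         name: management
         labels:
           management: "true" # Label applied to "management" nodes
       custom:
         - name: app
           labels:
             app: "true" # Label applied to "app" nodes
   ```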
1. Do one of the following to get the join command from the Admin Console:

   * To add nodes during the application installation process, follow the steps in [Online Installation with Embedded Cluster](/enterprise/installing-embedded) or [Air Gap Installation with Embedded Cluster](/enterprise/installing-embedded-air-gap). A **Nodes** screen is displayed as part of the installation flow in the Admin Console, where you can choose a node role and copy the relevant join command.

   * Otherwise, if you have already installed the application:

     1. Log in to the Admin Console.

     1. If you promoted a new release that configures the `roles` key in the Embedded Cluster Config, update the instance to the new version. See [Performing Updates in Embedded Clusters](/enterprise/updating-embedded).

     1. Go to **Cluster Management > Add node** at the top of the page.

        <img alt="Add node page in the Admin Console" src="/images/admin-console-add-node.png" width="600px"/>

        [View a larger version of this image](/images/admin-console-add-node.png)

1. Either on the Admin Console **Nodes** screen that is displayed during installation or in the **Add a Node** dialog, select one or more roles for the new node that you will join. Copy the join command.

   Note the following:

   * If the Embedded Cluster Config [roles](/reference/embedded-config#roles) key is not configured, all new nodes joined to the cluster are assigned the `controller` role by default. The `controller` role designates nodes that run the Kubernetes control plane. Controller nodes can also run other workloads, such as application or Replicated KOTS workloads.

   * Roles are not updated or changed after a node is added. If you need to change a node's role, reset the node and add it again with the new role.

   * For multi-node clusters with high availability (HA), at least three `controller` nodes are required. You can assign both the `controller` role and one or more `custom` roles to the same node. For more information about creating HA clusters with Embedded Cluster, see [Enable High Availability for Multi-Node Clusters (Alpha)](#ha) below.

   * To add non-controller or _worker_ nodes that do not run the Kubernetes control plane, select one or more `custom` roles for the node and deselect the `controller` role.
1. Do one of the following to make the Embedded Cluster installation assets available on the machine that you will join to the cluster:

   * **For online (internet-connected) installations**: SSH onto the machine that you will join. Then, use the same commands that you ran during installation to download and untar the Embedded Cluster installation assets on the machine. See [Online Installation with Embedded Cluster](/enterprise/installing-embedded).

   * **For air gap installations with limited or no outbound internet access**: On a machine that has internet access, download the Embedded Cluster installation assets (including the air gap bundle) using the same command that you ran during installation. See [Air Gap Installation with Embedded Cluster](/enterprise/installing-embedded-air-gap). Then, move the downloaded assets to the air-gapped machine that you will join, and untar them.

   :::important
   The Embedded Cluster installation assets on each node must all be the same version. If you use a different version than what is installed elsewhere in the cluster, the cluster will not be stable. To download a specific version of the Embedded Cluster assets, select a version in the **Embedded cluster install instructions** dialog.
   :::

1. On the machine that you will join to the cluster, run the join command that you copied from the Admin Console.

   **Example:**

   ```bash
   sudo ./APP_SLUG join 10.128.0.32:30000 TxXboDstBAamXaPdleSK7Lid
   ```

   **Air Gap Example:**

   ```bash
   sudo ./APP_SLUG join --airgap-bundle APP_SLUG.airgap 10.128.0.32:30000 TxXboDstBAamXaPdleSK7Lid
   ```

1. In the Admin Console, either on the installation **Nodes** screen or on the **Cluster Management** page, verify that the node appears. Wait for the node's status to change to Ready.

1. Repeat these steps for each node you want to add.
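After nodes join with a `custom` role that applies a label (see [roles](/reference/embedded-config#roles)), you can pin application workloads to those nodes with a standard Kubernetes `nodeSelector`. A minimal sketch, assuming a hypothetical custom role that applies an `app: "true"` label:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app # Hypothetical workload name
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-app
    spec:
      nodeSelector:
        app: "true" # Schedule only on nodes joined with the "app" custom role
      containers:
        - name: example-app
          image: nginx # Placeholder image
```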
## Enable High Availability for Multi-Node Clusters (Alpha) {#ha}

Multi-node clusters are not highly available by default. The first node of the cluster is special and holds important data for Kubernetes and KOTS, such that the loss of this node would be catastrophic for the cluster. Enabling high availability (HA) requires that at least three controller nodes are present in the cluster. Users can enable HA when joining the third node.

:::important
High availability for Embedded Cluster is an Alpha feature. This feature is subject to change, including breaking changes. To get access to this feature, reach out to Alex Parker at [[email protected]](mailto:[email protected]).
:::
### Requirements

Enabling high availability has the following requirements:

* High availability is supported with Embedded Cluster 1.4.1 or later.

* High availability is supported only for clusters where at least three nodes with the `controller` role are present.

### Limitations

Enabling high availability has the following limitations:

* High availability for Embedded Cluster is an Alpha feature. This feature is subject to change, including breaking changes. To get access to this feature, reach out to Alex Parker at [[email protected]](mailto:[email protected]).

* The `--enable-ha` flag serves as a feature flag during the Alpha phase. In the future, the prompt about migrating to high availability will display automatically whenever the cluster is not yet HA and you are adding a third or additional controller node.

* HA multi-node clusters use rqlite to store support bundles up to 100 MB in size. Bundles over 100 MB can cause rqlite to crash and restart.
### Best Practices for High Availability

Consider the following best practices and recommendations for creating HA clusters:

* At least three _controller_ nodes that run the Kubernetes control plane are required for HA. This is because clusters use a quorum system, in which more than half the nodes must be up and reachable. In clusters with three controller nodes, the Kubernetes control plane can continue to operate if one node fails because a quorum can still be reached by the remaining two nodes. By default, with Embedded Cluster, all new nodes added to a cluster are controller nodes. For information about customizing the `controller` node role, see [roles](/reference/embedded-config#roles) in _Embedded Cluster Config_.

* Always use an odd number of controller nodes in HA clusters. Using an odd number of controller nodes ensures that the cluster can make decisions efficiently with quorum calculations. Clusters with an odd number of controller nodes also avoid split-brain scenarios, where the cluster runs as two independent groups of nodes, resulting in inconsistencies and conflicts.

* You can have any number of _worker_ nodes in HA clusters. Worker nodes do not run the Kubernetes control plane, but can run workloads such as application or Replicated KOTS workloads.
### Create a Multi-Node HA Cluster

To create a multi-node HA cluster:

1. Set up a cluster with at least two controller nodes. You can do an online (internet-connected) or air gap installation. For more information, see [Online Installation with Embedded Cluster](/enterprise/installing-embedded) or [Air Gap Installation with Embedded Cluster](/enterprise/installing-embedded-air-gap).

1. SSH onto a third node that you want to join to the cluster as a controller.

1. Run the join command provided in the Admin Console **Cluster Management** tab and pass the `--enable-ha` flag. For example:

   ```bash
   sudo ./APP_SLUG join --enable-ha 10.128.0.80:30000 tI13KUWITdIerfdMcWTA4Hpf
   ```

1. After the third node joins the cluster, type `y` in response to the prompt asking if you want to enable high availability.

   ![high availability command line prompt](/images/embedded-cluster-ha-prompt.png)

   [View a larger version of this image](/images/embedded-cluster-ha-prompt.png)

1. Wait for the migration to complete.

docs/enterprise/installing-embedded-air-gap.mdx

Lines changed: 4 additions & 30 deletions
@@ -123,7 +123,9 @@ To install with Embedded Cluster in an air gap environment:
 1. On the login page, enter the Admin Console password that you created during installation and click **Log in**.
-1. On the **Configure the cluster** screen, optionally add nodes to the cluster before deploying the application. Click **Continue**.
+1. On the **Nodes** page, you can view details about the machine where you installed, including its node role, status, CPU, and memory.
+
+   Optionally, add nodes to the cluster before deploying the application. For more information about joining nodes, see [Managing Multi-Node Clusters with Embedded Cluster](/enterprise/embedded-manage-nodes). Click **Continue**.
 1. On the **Configure [App Name]** screen, complete the fields for the application configuration options. Click **Continue**.
@@ -137,32 +139,4 @@ On the Admin Console dashboard, the application status changes from Missing to U
 ![Admin console dashboard showing ready status](/images/gitea-ec-ready.png)
-[View a larger version of this image](/images/gitea-ec-ready.png)
-
-## Add Nodes to Air Gap Clusters
-
-You can add nodes to an air gap cluster. This involves downloading the Embedded Cluster assets to each node, copying a join command from the Admin Console, and running the join command on each node.
-
-### Prerequisites
-
-The Embedded Cluster binary and the air gap bundle must be present on each node you want to join to the cluster. Use the same commands as you did during installation to download and untar these assets on each node. For more information, see [Install](#install) above.
-
-:::note
-The binary and air gap bundles on each additional node must be the same version as what is currently installed. To download a specific version of these assets, you can select a version in the **Embedded Cluster install instructions** dialog. For more information, see [Install](#install) above.
-:::
-
-### Add Nodes
-
-To add nodes to a cluster after successfully installing the first node:
-
-1. Click the link in the install output to access the Admin Console. Proceed through the setup steps until you reach the **Nodes** page.
-
-1. Click **Add node**, choose one or more node roles (if present), and copy the join command.
-
-1. SSH onto another machine you want to join to the cluster. Run the join command on this node. For example:
-
-   ```bash
-   sudo ./APP_SLUG join --airgap-bundle APP_SLUG.airgap 10.128.0.32:30000 TxXboDstBAamXaPdleSK7Lid
-   ```
-
-1. When you have finished adding all nodes, return to the Admin Console and click **Continue**.
+[View a larger version of this image](/images/gitea-ec-ready.png)

docs/enterprise/installing-embedded.mdx

Lines changed: 3 additions & 1 deletion
@@ -85,7 +85,9 @@ To install an application with Embedded Cluster:
 1. On the login page, enter the Admin Console password that you created during installation and click **Log in**.
-1. On the **Configure the cluster** screen, optionally add nodes to the cluster before deploying the application. Click **Continue**.
+1. On the **Nodes** page, you can view details about the machine where you installed, including its node role, status, CPU, and memory.
+
+   Optionally, add nodes to the cluster before deploying the application. For more information about joining nodes, see [Managing Multi-Node Clusters with Embedded Cluster](/enterprise/embedded-manage-nodes). Click **Continue**.
 1. On the **Configure [App Name]** screen, complete the fields for the application configuration options. Click **Continue**.

docs/reference/embedded-config.mdx

Lines changed: 32 additions & 12 deletions
@@ -71,17 +71,28 @@ For a full list of versions, see the Embedded Cluster [releases page](https://gi
 ## roles
-You can define node roles in the Embedded Cluster Config. In multi-node clusters, roles are used to determine which nodes run the Kubernetes control plane, and to assign application workloads to particular nodes. One or more roles can be selected and assigned to a node when it is joined to the cluster.
+You can optionally customize node roles in the Embedded Cluster Config using the `roles` key.
-:::note
-Roles are not updated or changed after a node is added. If you need to change a node’s role, reset the node and add it again with the correct role.
-:::
+If the `roles` key is configured, users select one or more roles to assign to a node when it is joined to the cluster. A single node can be assigned:
+* The `controller` role, which designates nodes that run the Kubernetes control plane
+* One or more `custom` roles
+* Both the `controller` role _and_ one or more `custom` roles
+
+For more information about how to assign node roles in the Admin Console, see [Managing Multi-Node Clusters with Embedded Cluster](/enterprise/embedded-manage-nodes).
+
+If the `roles` key is _not_ configured, all nodes joined to the cluster are assigned the `controller` role. The `controller` role designates nodes that run the Kubernetes control plane. Controller nodes can also run other workloads, such as application or Replicated KOTS workloads.
+
+For more information, see the sections below.

 ### controller
-The controller role is required in any cluster. Nodes with this role are “controller workers” because they run the control plane and can run other workloads too. The first node in a cluster will always have the controller role because a cluster needs a control plane. Any node that doesn't have the controller role is a worker node.
+By default, all nodes joined to a cluster are assigned the `controller` role.
-By default, the controller role is called “controller.” You can customize the name of the controller role with the `spec.roles.controller.name` field, like this:
+You can customize the `controller` role in the following ways:
+* Change the `name` that is assigned to controller nodes. By default, controller nodes are named “controller”. If you plan to create any `custom` roles, Replicated recommends that you change the default name for the `controller` role to a term that is easy to understand, such as "management". This is because, when you add `custom` roles, both the name of the `controller` role and the names of any `custom` roles are displayed to the user when they join a node.
+* Add one or more `labels` to be assigned to all controller nodes. See [labels](#labels).
+
+#### Example

 ```yaml
 apiVersion: embeddedcluster.replicated.com/v1beta1
@@ -90,13 +101,17 @@ spec:
   roles:
     controller:
       name: management
+      labels:
+        management: "true" # Label applied to "management" nodes
 ```

 ### custom
-You can define custom roles for other purposes in the cluster. This is particularly useful when combined with labels.
+You can add `custom` roles that users can assign to one or more nodes in the cluster. Each `custom` role that you add must have a `name` and can also have one or more `labels`. See [labels](#labels).
-Custom roles are defined with the `spec.roles.custom` array, as shown in the example below:
+Adding `custom` node roles is useful if you need to assign application workloads to specific nodes in multi-node clusters. For example, if your application has graphics processing unit (GPU) workloads, you could create a `custom` role that will add a `gpu=true` label to any node that is assigned the role. This allows you to then schedule GPU workloads on nodes labeled `gpu=true`. Or, if your application includes any resource-intensive workloads (such as a database) that must be run on dedicated nodes, you could create a `custom` role that adds a `db=true` label to the node. This way, the database workload could be assigned to a certain node or nodes.
+
+#### Example

 ```yaml
 apiVersion: embeddedcluster.replicated.com/v1beta1
@@ -105,13 +120,15 @@ spec:
   roles:
     custom:
       - name: app
+        labels:
+          app: "true" # Label applied to "app" nodes
 ```

 ### labels
-Roles can have associated Kubernetes labels that are applied to any node in the cluster that is assigned that role. This is useful for things like assigning workloads to nodes.
+You can define Kubernetes labels for the default `controller` role and any `custom` roles that you add. When `labels` are defined, Embedded Cluster applies the label to any node in the cluster that is assigned the given role. Labels are useful for tasks like assigning workloads to nodes.
-Labels are defined for the controller role and custom roles, as shown in the example below:
+#### Example

 ```yaml
 apiVersion: embeddedcluster.replicated.com/v1beta1
@@ -123,9 +140,12 @@ spec:
       labels:
         management: "true" # Label applied to "management" nodes
     custom:
-      - name: app
+      - name: db
         labels:
-          app: "true" # Label applied to "app" nodes
+          db: "true" # Label applied to "db" nodes
+      - name: gpu
+        labels:
+          gpu: "true" # Label applied to "gpu" nodes
 ```

 ## extensions
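The labels applied by these roles are ordinary Kubernetes node labels, so workloads can target them with a `nodeSelector`. A minimal sketch for the `gpu` role above (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload # Hypothetical workload name
spec:
  nodeSelector:
    gpu: "true" # Matches nodes joined with the "gpu" custom role
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest # Placeholder image
```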

docs/vendor/embedded-disaster-recovery.mdx

Lines changed: 1 addition & 1 deletion
@@ -163,7 +163,7 @@ To restore from a backup:
 ![Restore from detected backup prompt on the command line](/images/dr-restore-admin-console-url.png)
 [View a larger version of this image](/images/dr-restore-admin-console-url.png)

-1. (Optional) If the cluster should have multiple nodes, go to the Admin Console to get a join command and join additional nodes to the cluster. For more information, see [Add Nodes](/vendor/embedded-overview#add-nodes).
+1. (Optional) If the cluster should have multiple nodes, go to the Admin Console to get a join command and join additional nodes to the cluster. For more information, see [Managing Multi-Node Clusters with Embedded Cluster](/enterprise/embedded-manage-nodes).

 1. Type `continue` when you are ready to proceed with the restore process.
