 // * nodes/nodes-nodes-rebooting.adoc
 
 [id="nodes-nodes-rebooting-infrastructure_{context}"]
-= Understanding infrastructure node rebooting
+= About rebooting nodes running critical infrastructure
 
-Infrastructure nodes are nodes that are labeled to run pieces of the
-{product-title} environment. Currently, the easiest way to manage node reboots
-is to ensure that there are at least three nodes available to run
-infrastructure. The nodes to run the infrastructure are called *master* nodes.
+When rebooting nodes that host critical {product-title} infrastructure components, such as router pods, registry pods, and monitoring pods, ensure that there are at least three nodes available to run these components.
 
-The scenario below demonstrates a common mistake that can lead
-to service interruptions for the applications running on {product-title} when
-only two nodes are available.
+The following scenario demonstrates how service interruptions can occur with applications running on {product-title} when only two nodes are available:
 
 - Node A is marked unschedulable and all pods are evacuated.
-- The registry pod running on that node is now redeployed on node B. This means
-node B is now running both registry pods.
+- The registry pod running on that node is now redeployed on node B. Node B is now running both registry pods.
 - Node B is now marked unschedulable and is evacuated.
-- The service exposing the two pod endpoints on node B, for a brief period of
-time, loses all endpoints until they are redeployed to node A.
+- The service exposing the two pod endpoints on node B loses all endpoints, for a brief period of time, until they are redeployed to node A.
 
-The same process using three master nodes for infrastructure does not result in a service
-disruption. However, due to pod scheduling, the last node that is evacuated and
-brought back in to rotation is left running zero registries. The other two nodes
-will run two and one registries respectively. The best solution is to rely on
-pod anti-affinity.
+When using three nodes for infrastructure components, this process does not result in a service disruption. However, due to pod scheduling, the last node that is evacuated and brought back into rotation does not have a registry pod. One of the other nodes has two registry pods. To schedule the third registry pod on the last node, use pod anti-affinity to prevent the scheduler from locating two registry pods on the same node.
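The pod anti-affinity approach described above can be sketched as a Kubernetes pod spec fragment. This is a minimal illustration, not taken from the module: the pod name, the `registry: default` label, and the image are hypothetical placeholders, and the field names follow the standard Kubernetes `podAntiAffinity` API.

```yaml
# Hypothetical registry pod spec illustrating required pod anti-affinity.
# The label "registry: default" and the image are placeholder values.
apiVersion: v1
kind: Pod
metadata:
  name: registry
  labels:
    registry: default
spec:
  affinity:
    podAntiAffinity:
      # "required" anti-affinity: the scheduler refuses to place this pod on
      # any node (topologyKey: hostname) that already runs a pod matching
      # the label selector below, so two registry pods never share a node.
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            registry: default
        topologyKey: kubernetes.io/hostname
  containers:
  - name: registry
    image: registry.example.com/registry:latest
```

With this rule in place, in the three-node scenario the third registry pod stays Pending rather than doubling up, and is scheduled onto the last node once it is uncordoned and brought back into rotation.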