Commit 53d1215

Merge pull request openshift#7664 from shiftstack/rootvol-etcd-doc
OSASINFRA-3280: openstack: document etcd on local disk
2 parents 75bca37 + 7e50033 commit 53d1215

2 files changed

+232
-0
lines changed

docs/user/openstack/customization.md

Lines changed: 2 additions & 0 deletions
@@ -61,6 +61,8 @@ Beyond the [platform-agnostic `install-config.yaml` properties](../customization
> **Note**
> Note when deploying with `Kuryr` there is an Octavia API loadbalancer VM that will not fulfill the Availability Zones restrictions due to Octavia's lack of support for it. In addition, if Octavia only has the amphora provider instead of also the OVN-Octavia provider, all the OpenShift services will be backed by Octavia Load Balancer VMs which will not fulfill the Availability Zone restrictions either.

> **Note**
> When deploying the control-plane machines with `rootVolume`, it is highly recommended to use an [additional ephemeral disk dedicated to etcd](./etcd-ephemeral-disk.md).


## Examples

docs/user/openstack/etcd-ephemeral-disk.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
# Moving etcd to an ephemeral local disk

You can move etcd from a root volume (Cinder) to a dedicated ephemeral local disk to prevent or resolve performance issues.

## Prerequisites

* This migration is currently tested and documented as a day 2 operation.
* An OpenStack cloud where Nova is configured to use local storage for ephemeral disks. The `libvirt.images_type` option in `nova.conf` must not be `rbd`.
* An OpenStack cloud with a functional Cinder service and enough available storage to accommodate 3 root volumes for the OpenShift control plane.
* OpenShift is deployed with IPI for now; UPI is technically possible but not yet documented.
* The control-plane machine's auxiliary storage device, such as `/dev/vdb`, must match the `vdb` reference used in the MachineConfig files. If your device name differs, change this reference in all places in those files.

## Procedure

* Create a Nova flavor for the control plane that provides 10 GiB of ephemeral disk:

```bash
openstack flavor create --ephemeral 10 [...]
```

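Before deploying, it can help to confirm that the flavor really exposes ephemeral storage. The following is a minimal sketch and not part of the documented procedure: the helper name `flavor_has_ephemeral` is hypothetical, and it assumes that `openstack flavor show` reports the ephemeral size in the `OS-FLV-EXT-DATA:ephemeral` field.

```bash
# Hypothetical helper: succeeds when FLAVOR offers at least MIN_GIB of ephemeral disk.
# Assumption: the ephemeral size is exposed as the "OS-FLV-EXT-DATA:ephemeral" column.
flavor_has_ephemeral() {
  flavor="$1"
  min_gib="${2:-10}"
  size=$(openstack flavor show "${flavor}" -f value -c "OS-FLV-EXT-DATA:ephemeral") || return 2
  # Reject empty or non-numeric output instead of comparing garbage.
  case "${size}" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "${size}" -ge "${min_gib}" ]
}
```

A possible use is `flavor_has_ephemeral "${CONTROL_PLANE_FLAVOR}" 10 || echo "flavor has no suitable ephemeral disk" >&2`.
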
* Deploy a cluster with root volumes for the control plane. Here is an example of `install-config.yaml`:

```yaml
[...]
controlPlane:
  name: master
  platform:
    openstack:
      type: ${CONTROL_PLANE_FLAVOR}
      rootVolume:
        size: 100
        types:
        - ${CINDER_TYPE}
  replicas: 3
[...]
```

* Run `openshift-install` with the following parameters to create the cluster:

```bash
openshift-install create cluster --dir=install_dir
```

* Once the cluster has been deployed and is healthy, edit the ControlPlaneMachineSet (CPMS) to add the ephemeral block device that will be used by etcd:

```bash
oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json -p '[{"op": "add", "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices", "value": [{"name": "etcd", "sizeGiB": 10, "storage": {"type": "Local"}}]}]'
```

> [!NOTE]
> Putting etcd on a block device of type `Volume` is not supported, for performance reasons and because we don't test it.
> While it's functionally the same as using the root volume, we decided to support local devices only for now.

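The JSON patch above hard-codes the device size. As an illustration only, the sketch below generates the same patch document from a size argument. `etcd_disk_patch` is a hypothetical helper name, and keeping `sizeGiB` equal to the flavor's ephemeral disk size is an assumption based on both values being 10 in this document.

```bash
# Hypothetical helper: print the CPMS JSON patch for an "etcd" local block device.
# The size in GiB defaults to 10, the value used in this document.
etcd_disk_patch() {
  size_gib="${1:-10}"
  printf '[{"op": "add", "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices", "value": [{"name": "etcd", "sizeGiB": %d, "storage": {"type": "Local"}}]}]\n' "${size_gib}"
}
```

The output can then be passed to the documented command, for example `oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json -p "$(etcd_disk_patch 10)"`.
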
* Wait for the control plane to roll out with new Machines. A few commands can be used to check that everything is healthy:

```bash
oc wait --timeout=90m --for=condition=Progressing=false controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.spec.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.updatedReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.readyReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
```

* Check that there are 3 control-plane machines, and that each machine has the additional block device:

```bash
# List the control-plane machines by name.
cp_machines=$(oc get machines -n openshift-machine-api --selector='machine.openshift.io/cluster-api-machine-role=master' --no-headers -o custom-columns=NAME:.metadata.name)
# Fail unless there are exactly 3 control-plane machines.
if [[ $(echo "${cp_machines}" | wc -l) -ne 3 ]]; then
  exit 1
fi
# Fail if any machine does not list the "etcd" additional block device.
for machine in ${cp_machines}; do
  if ! oc get machine -n openshift-machine-api "${machine}" -o jsonpath='{.spec.providerSpec.value.additionalBlockDevices}' | grep -q 'etcd'; then
    exit 1
  fi
done
```

* Use a MachineConfig to handle etcd on the local disk. Create a file named `98-var-lib-etcd.yaml` with this content:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Make File System on /dev/vdb
          DefaultDependencies=no
          BindsTo=dev-vdb.device
          After=dev-vdb.device var.mount
          Before=systemd-fsck@dev-vdb.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/usr/sbin/mkfs.xfs -f /dev/vdb
          TimeoutSec=0

          [Install]
          WantedBy=var-lib-containers.mount
        enabled: true
        name: systemd-mkfs@dev-vdb.service
      - contents: |
          [Unit]
          Description=Mount /dev/vdb to /var/lib/etcd
          Before=local-fs.target
          Requires=systemd-mkfs@dev-vdb.service
          After=systemd-mkfs@dev-vdb.service var.mount

          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
      - contents: |
          [Unit]
          Description=Sync etcd data if new mount is empty
          DefaultDependencies=no
          After=var-lib-etcd.mount var.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecCondition=/usr/bin/test ! -d /var/lib/etcd/member
          ExecStart=/usr/sbin/setenforce 0
          ExecStart=/bin/rsync -ar /sysroot/ostree/deploy/rhcos/var/lib/etcd/ /var/lib/etcd/
          ExecStart=/usr/sbin/setenforce 1
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: sync-var-lib-etcd-to-etcd.service
      - contents: |
          [Unit]
          Description=Restore recursive SELinux security contexts
          DefaultDependencies=no
          After=var-lib-etcd.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/sbin/restorecon -R /var/lib/etcd/
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: restorecon-var-lib-etcd.service
```

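The prerequisites ask that every `vdb` reference be changed when the auxiliary device has a different name. One way to do that consistently is sketched below. `retarget_etcd_device` is a hypothetical helper, and it assumes a simple device name such as `vdc` that needs no systemd escaping and that the string `vdb` appears in the file only as the device name.

```bash
# Hypothetical helper: print FILE with every "vdb" reference rewritten to DEVICE.
# This covers both /dev/vdb paths and unit names such as dev-vdb.device.
# The input file is left untouched; redirect the output to a new file.
retarget_etcd_device() {
  file="$1"
  device="$2"   # short device name without the /dev/ prefix, for example vdc
  sed "s/vdb/${device}/g" "${file}"
}
```

For example, `retarget_etcd_device 98-var-lib-etcd.yaml vdc > 98-var-lib-etcd-vdc.yaml` writes a copy that targets `/dev/vdc`. Review the result before applying it.
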
* Apply this file, which creates the file system on the device and syncs the etcd data, by entering the following command:

```bash
oc create -f 98-var-lib-etcd.yaml
```

* This will take some time to complete, as the etcd data is synced from the root volume to the local disk on the control-plane machines. Run these commands to check whether the cluster is healthy:

```bash
oc wait --timeout=45m --for=condition=Updating=false machineconfigpool/master
oc wait node --selector='node-role.kubernetes.io/master' --for condition=Ready --timeout=30s
oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
```

* Once the cluster is healthy, create a file named `etcd-replace.yaml` with this content:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Mount /dev/vdb to /var/lib/etcd
          Before=local-fs.target
          Requires=systemd-mkfs@dev-vdb.service
          After=systemd-mkfs@dev-vdb.service var.mount

          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
```

* Apply this file, which removes the logic for creating the file system and syncing the data, by entering the following command:

```bash
oc replace -f etcd-replace.yaml
```

* Wait again for the cluster to become healthy. The same commands as above can be used to check that everything is healthy.

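Because the same health checks run after each MachineConfig change, they can be wrapped in a small function. This is only a sketch: `wait_for_cluster_healthy` is a hypothetical name, and the commands and timeouts are the ones already shown in this document.

```bash
# Hypothetical wrapper around the health checks used after each MachineConfig change.
# It stops at the first failing check and returns that check's exit status.
wait_for_cluster_healthy() {
  oc wait --timeout=45m --for=condition=Updating=false machineconfigpool/master &&
  oc wait node --selector='node-role.kubernetes.io/master' --for condition=Ready --timeout=30s &&
  oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
}
```

Calling `wait_for_cluster_healthy` after `oc create` and again after `oc replace` keeps both waits identical.
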
* etcd is now stored on the ephemeral local disk. This can be verified by running `df` on a master node through `oc debug node/<master-node-name>`:

```bash
oc debug node/<master-node-name> -- df -T /host/var/lib/etcd
```
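
To turn that manual check into a scriptable one, the file system type can be extracted from the `df -T` output. The sketch below is an illustration only: `etcd_fs_type` is a hypothetical helper, and it assumes that `df` prints a header line followed by a single, unwrapped data line whose second column is the type.

```bash
# Hypothetical helper: print the file system type backing /var/lib/etcd on NODE.
# Assumption: df -T prints one header line and one data line, with the type in column 2.
etcd_fs_type() {
  node="$1"
  oc debug "node/${node}" -- df -T /host/var/lib/etcd 2>/dev/null |
    awk 'NR > 1 { fstype = $2 } END { print fstype }'
}
```

With the MachineConfig from this document the expected type is `xfs`, so a check could read `[ "$(etcd_fs_type <master-node-name>)" = "xfs" ]`.
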
