From c66f54a41da009ad0b8536af7a06ec26894c74ee Mon Sep 17 00:00:00 2001
From: YuryHrytsuk
Date: Wed, 2 Jul 2025 10:24:06 +0200
Subject: [PATCH 1/4] Update longhorn README

Document how to perform (Kubernetes) node maintenance
---
 charts/longhorn/README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/charts/longhorn/README.md b/charts/longhorn/README.md
index 1bae02be..dcd8f5f9 100644
--- a/charts/longhorn/README.md
+++ b/charts/longhorn/README.md
@@ -54,3 +54,9 @@ Insights into LH's performance:
 
 Resource requirements:
 * https://github.com/longhorn/longhorn/issues/1691
+
+### (Kubernetes) Node maintenance
+
+https://longhorn.io/docs/1.8.1/maintenance/maintenance/
+
+Note: you can use the Longhorn GUI to perform some operations

From e3e871897a49bc4d6165fb1c034af0a4c398c75b Mon Sep 17 00:00:00 2001
From: YuryHrytsuk
Date: Wed, 2 Jul 2025 11:36:17 +0200
Subject: [PATCH 2/4] Update Longhorn README: disk config and maintenance

---
 charts/longhorn/README.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/charts/longhorn/README.md b/charts/longhorn/README.md
index dcd8f5f9..6819a858 100644
--- a/charts/longhorn/README.md
+++ b/charts/longhorn/README.md
@@ -27,7 +27,15 @@ Source:
 
 ### How to configure disks for LH
 
-As of now, we follow the same approach we use for `/docker` folder (via ansible playbook) but we use `/longhorn` folder name
+Manual configuration performed (to be moved to ansible; a possible ansible sketch follows this list):
+1. Create a partition on the disk
+   * e.g. using `fdisk`: https://phoenixnap.com/kb/linux-create-partition
+2. Format the partition as XFS
+   * `sudo mkfs.xfs -f /dev/sda1`
+3. Mount the partition: `sudo mount -t xfs /dev/sda1 /longhorn`
+4. Persist the mount in `/etc/fstab` by adding the line
+   * `UUID=<partition-uuid> /longhorn xfs pquota 0 0`
+   * the UUID can be obtained from `lsblk -f`
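+
+A hypothetical ansible sketch of the same steps, since this is meant to be moved to ansible. Module names assume the `community.general` and `ansible.posix` collections are available; the inventory group, disk (`/dev/sda`) and partition (`/dev/sda1`) are placeholders to adjust per node:
+
+```yaml
+- hosts: longhorn_nodes          # placeholder inventory group
+  become: true
+  tasks:
+    - name: Create the Longhorn data partition
+      community.general.parted:
+        device: /dev/sda         # assumed empty data disk
+        label: gpt
+        number: 1
+        state: present
+
+    - name: Format the partition as XFS
+      community.general.filesystem:
+        dev: /dev/sda1
+        fstype: xfs
+
+    - name: Mount /longhorn and persist it in /etc/fstab
+      ansible.posix.mount:
+        path: /longhorn
+        src: /dev/sda1           # the real playbook should prefer UUID=<partition-uuid>, as in step 4
+        fstype: xfs
+        opts: pquota
+        state: mounted
+```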
 
 Issue asking LH to clearly document requirements:
 https://github.com/longhorn/longhorn/issues/11125
@@ -60,3 +68,14 @@ Resource requirements:
 https://longhorn.io/docs/1.8.1/maintenance/maintenance/
 
 Note: you can use the Longhorn GUI to perform some operations
+
+### Zero downtime updating longhorn disks (procedure)
+
+1. Go to the LH GUI and select a Node
+   1. Disable scheduling
+   2. Request eviction
+2. Remove the disk from the node
+   * If the remove icon is disabled, disable eviction on the disk to enable the remove button
+3. Perform the disk updates on the node
+4. Make sure LH didn't pick up a wrongly configured disk in the meantime and remove the wrong disk if it did
+5. Wait until LH automatically adds the disk to the Node

From ccd5a5187726236c186d231b50b133ea1f1d307d Mon Sep 17 00:00:00 2001
From: YuryHrytsuk
Date: Thu, 3 Jul 2025 15:51:26 +0200
Subject: [PATCH 3/4] Kubernetes: add local storage

Use topolvm as the most mature local storage CSI.
---
 charts/topolvm/README.md          |  43 ++++++++++++
 charts/topolvm/values.yaml.gotmpl | 106 ++++++++++++++++++++++++++++++
 2 files changed, 149 insertions(+)
 create mode 100644 charts/topolvm/README.md
 create mode 100644 charts/topolvm/values.yaml.gotmpl

diff --git a/charts/topolvm/README.md b/charts/topolvm/README.md
new file mode 100644
index 00000000..849df697
--- /dev/null
+++ b/charts/topolvm/README.md
@@ -0,0 +1,43 @@
+## topolvm components and architecture
+See the diagram in https://github.com/topolvm/topolvm/blob/topolvm-chart-v15.5.5/docs/design.md
+
+## Prerequisites
+`topolvm` does not automatically create the Volume Groups specified in its device classes. They have to be set up separately (e.g. manually, via ansible, ...)
+
+Manual example (Ubuntu 22.04):
+1. Create a partition to use later (`sudo fdisk /dev/sda`)
+2. Create a PV (`sudo pvcreate /dev/sda2`)
+   * Prerequisite: `sudo apt install lvm2`
+3. Create a Volume Group (`sudo vgcreate topovg-sdd /dev/sda2`)
+   * Note: the Volume Group's name must match the `volume-group` setting inside `lvmd.deviceClasses`
+4. Check the Volume Group (`sudo vgdisplay`)
+
+Source: https://github.com/topolvm/topolvm/blob/topolvm-chart-v15.5.5/docs/getting-started.md#prerequisites
+
+## Deleting PV(C)s with `Retain` reclaim policy
+1. Delete the release (e.g. `helm uninstall -n test test`)
+2. Find the LogicalVolume CR (`kubectl get logicalvolumes.topolvm.io`)
+3. Delete the LogicalVolume CR (`kubectl delete logicalvolumes.topolvm.io <name>`)
+4. Delete the PV (`kubectl delete pv <name>`)
+
+## Backup / Snapshotting
+1. Only possible when using thin provisioning
+2. We use thick (non-thin-provisioned) volumes --> no snapshot support
+
+   Track this feature request for changes: https://github.com/topolvm/topolvm/issues/1070
+
+Note: there might be alternative, undocumented ways (e.g. via Velero)
+
+## Resizing PVs
+1. Update the storage capacity in the configuration
+2. Deploy the changes
+
+Note: the storage size can only be increased; otherwise one gets a `Forbidden: field can not be less than previous value` error
+
+## Node maintenance
+
+Read https://github.com/topolvm/topolvm/blob/topolvm-chart-v15.5.5/docs/node-maintenance.md
+
+## Using topolvm: notes
+* `topolvm` may not work with pods that define `spec.nodeName`; use node affinity instead:
+  https://github.com/topolvm/topolvm/blob/main/docs/faq.md#the-pod-does-not-start-when-nodename-is-specified-in-the-pod-spec
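+
+## Example: PVC backed by topolvm (sketch)
+
+A minimal usage sketch (not from the upstream docs): a PVC using the topolvm storage class plus a pod consuming it. `<topolvm-storage-class-name>` is a placeholder for whatever `topolvmStorageClassName` resolves to; with `WaitForFirstConsumer` the PVC stays `Pending` until the pod is scheduled.
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: topolvm-example-pvc
+spec:
+  accessModes: ["ReadWriteOnce"]
+  storageClassName: <topolvm-storage-class-name>  # placeholder, see values.yaml.gotmpl
+  resources:
+    requests:
+      storage: 1Gi
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: topolvm-example-pod
+spec:
+  containers:
+    - name: app
+      image: busybox
+      command: ["sleep", "3600"]
+      volumeMounts:
+        - name: data
+          mountPath: /data
+  volumes:
+    - name: data
+      persistentVolumeClaim:
+        claimName: topolvm-example-pvc
+```
+
+Apply it with `kubectl apply -f <file>`; once the pod is scheduled, the PVC should bind and a matching LVM volume should show up in `sudo lvs` on that node.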
diff --git a/charts/topolvm/values.yaml.gotmpl b/charts/topolvm/values.yaml.gotmpl
new file mode 100644
index 00000000..216d54ef
--- /dev/null
+++ b/charts/topolvm/values.yaml.gotmpl
@@ -0,0 +1,106 @@
+lvmd:
+  # set up the lvmd service as a DaemonSet
+  managed: true
+
+  # device classes (VGs) need to be created outside of topolvm (e.g. manually, via ansible, ...)
+  deviceClasses:
+    - name: ssd
+      volume-group: topovg-sdd
+      default: true
+      spare-gb: 5
+
+storageClasses:
+  - name: {{ .Values.topolvmStorageClassName }}
+    storageClass:
+      # Want to use a non-default device class?
+      # See the configuration example in
+      # https://github.com/topolvm/topolvm/blob/topolvm-chart-v15.5.5/docs/snapshot-and-restore.md#set-up-a-storage-class
+
+      fsType: xfs
+      isDefaultClass: false
+      # volumeBindingMode can be either WaitForFirstConsumer or Immediate.
+      # WaitForFirstConsumer is recommended because TopoLVM cannot schedule
+      # pods wisely if volumeBindingMode is Immediate.
+      volumeBindingMode: WaitForFirstConsumer
+      allowVolumeExpansion: true
+      # NOTE: removal requires manual clean-up of PVs, LVM volumes
+      # and Logical Volumes (CR logicalvolumes.topolvm.io).
+      # Removing the Logical Volume (CR) cleans up the LVM volume on the node,
+      # but the PV still has to be removed manually.
+      # Read more: https://github.com/topolvm/topolvm/blob/topolvm-chart-v15.5.5/docs/advanced-setup.md#storageclass
+      reclaimPolicy: Retain
+
+resources:
+  topolvm_node:
+    requests:
+      memory: 100Mi
+      cpu: 100m
+    limits:
+      memory: 500Mi
+      cpu: 500m
+
+  topolvm_controller:
+    requests:
+      memory: 50Mi
+      cpu: 50m
+    limits:
+      memory: 200Mi
+      cpu: 200m
+
+  lvmd:
+    requests:
+      memory: 100Mi
+      cpu: 100m
+    limits:
+      memory: 500Mi
+      cpu: 500m
+
+  csi_registrar:
+    requests:
+      cpu: 25m
+      memory: 10Mi
+    limits:
+      cpu: 200m
+      memory: 200Mi
+
+  csi_provisioner:
+    requests:
+      memory: 50Mi
+      cpu: 50m
+    limits:
+      memory: 200Mi
+      cpu: 200m
+
+  csi_resizer:
+    requests:
+      memory: 50Mi
+      cpu: 50m
+    limits:
+      memory: 200Mi
+      cpu: 200m
+
+  csi_snapshotter:
+    requests:
+      memory: 50Mi
+      cpu: 50m
+    limits:
+      memory: 200Mi
+      cpu: 200m
+
+  liveness_probe:
+    requests:
+      cpu: 25m
+      memory: 10Mi
+    limits:
+      cpu: 200m
+      memory: 200Mi
+
+# https://github.com/topolvm/topolvm/blob/topolvm-chart-v15.5.5/docs/topolvm-scheduler.md
+scheduler:
+  # start simple
+  enabled: false
+
+cert-manager:
+  # start simple
+  enabled: false
+
+snapshot:
+  enabled: true

From 356b6bcdd0c9eab370695184d93f25609d1a3d64 Mon Sep 17 00:00:00 2001
From: YuryHrytsuk
Date: Thu, 3 Jul 2025 15:59:05 +0200
Subject: [PATCH 4/4] Update longhorn README

---
 charts/longhorn/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/charts/longhorn/README.md b/charts/longhorn/README.md
index 6819a858..fb26f649 100644
--- a/charts/longhorn/README.md
+++ b/charts/longhorn/README.md
@@ -70,6 +70,8 @@ https://longhorn.io/docs/1.8.1/maintenance/maintenance/
 Note: you can use the Longhorn GUI to perform some operations
 
 ### Zero downtime updating longhorn disks (procedure)
+Notes:
+* Update one node at a time so that other nodes can still serve data
 
 1. Go to the LH GUI and select a Node
    1. Disable scheduling