---
title: "Robust RKE2 Backups: Syncing etcd Snapshots to NFS for Safety"
date: 2025-04-10T00:00:00-05:00
draft: true
tags: ["rke2", "etcd", "backup", "nfs", "kubernetes"]
categories:
- Kubernetes
- RKE2
- Backup
author: "Matthew Mattox - mmattox@support.tools"
description: "Use inotify-based detection to automatically sync RKE2 etcd snapshots to an NFS share using a Kubernetes DaemonSet."
more_link: "yes"
url: "/rke2-etcd-nfs-sync/"
---

If you're not pushing your RKE2 etcd snapshots to a remote location like S3, you're effectively backing up to the same node you're trying to protect. This creates a critical failure point: if that etcd node dies, your snapshots are gone with it.

This guide shows you how to fix that by:
- Mounting an NFS share as a PVC,
- Watching the etcd snapshot directory for new files using `inotifywait`, and
- Automatically syncing those snapshots off-node via `rsync`.

<!--more-->

# [Backup Automation for RKE2](#backup-automation-for-rke2)

## Section 1: Why You Should Not Rely on Local Snapshots

By default, RKE2 stores its etcd snapshots under `/var/lib/rancher/rke2/server/db/snapshots`.
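
To see what is already there, you can list the directory directly on a server node (a quick check, assuming the default data directory):

```bash
ls -lh /var/lib/rancher/rke2/server/db/snapshots
```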

While these snapshots are crucial for recovery, **storing them locally is risky**. If you were to lose all of your etcd nodes at once, for example by accidentally deleting all three nodes of a three-node control plane, any local snapshots would be lost along with them.

Remote syncing, whether to S3 or a central NFS share, ensures your snapshots are available even if an etcd node is lost. For simple setups and offline clusters, NFS is a solid choice.

So instead of relying on local snapshots, we'll set up a system that automatically syncs new etcd snapshots to an NFS share. This way, your backups stay safe and accessible even if the worst happens.

NOTE: This guide assumes you already have a working NFS server with an exported share, and that you are not using storage classes or dynamic provisioning; the NFS PV and PVC are created manually as static resources.
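
For reference, a matching export on the NFS server might look like the example below; the subnet and options are illustrative, so adjust them for your environment (`no_root_squash` is often needed because the sync container runs as root):

```bash
# /etc/exports on the NFS server (example values)
# /etcd-snapshots  10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)

# confirm the export is visible from a cluster node
showmount -e nfs-server-ip
```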

## Section 2: Architecture Overview

We'll use:
- A **PVC** backed by an NFS share
- A **DaemonSet** that runs only on etcd nodes (see the label check below)
- A shell script that uses `inotifywait` to watch the snapshot directory for new `etcd-snapshot-*` files
- `rsync` to copy new snapshots to the NFS mount
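
Before deploying anything, confirm that your etcd nodes actually carry the label the DaemonSet will select on:

```bash
kubectl get nodes -l node-role.kubernetes.io/etcd=true
```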

## Section 3: Kubernetes Resources

### PVC (NFS)

| 53 | +```yaml |
| 54 | +apiVersion: v1 |
| 55 | +kind: PersistentVolumeClaim |
| 56 | +metadata: |
| 57 | + name: nfs-backup-pvc |
| 58 | +spec: |
| 59 | + accessModes: |
| 60 | + - ReadWriteMany |
| 61 | + capacity: |
| 62 | + storage: 10Gi |
| 63 | + volumeName: nfs-pv |
| 64 | +``` |

### PersistentVolume (NFS)

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    # replace with your NFS server's IP or hostname
    server: nfs-server-ip
    path: /etcd-snapshots
```
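
After applying the PV and PVC, make sure the claim actually binds before moving on (the filenames here are just whatever you saved the manifests as):

```bash
kubectl apply -f nfs-pv.yaml -f nfs-backup-pvc.yaml
kubectl -n kube-system get pvc nfs-backup-pvc   # STATUS should show Bound
```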

### ConfigMap with inotify-based Watch Script

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sync-etcd-script
  namespace: kube-system
data:
  sync.sh: |
    #!/bin/sh
    # alpine ships with neither inotify-tools nor rsync, so install them on startup
    if ! command -v inotifywait >/dev/null 2>&1 || ! command -v rsync >/dev/null 2>&1; then
      echo "Installing inotify-tools and rsync..."
      apk add --no-cache inotify-tools rsync
    fi

    SNAP_DIR="/var/lib/rancher/rke2/server/db/snapshots"
    DEST_DIR="/nfs"

    echo "Watching $SNAP_DIR for new snapshots..."
    inotifywait -m -e close_write --format '%f' "$SNAP_DIR" | while read -r NEWFILE
    do
      if echo "$NEWFILE" | grep -q 'etcd-snapshot-'; then
        echo "Detected new snapshot: $NEWFILE"
        sleep 10 # brief delay to make sure the write has finished
        rsync -avz "$SNAP_DIR/$NEWFILE" "$DEST_DIR/"
        echo "Synced $NEWFILE to NFS."
      fi
    done
```
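
If you want to sanity-check the watch-and-copy logic before deploying it, you can run the same loop against a pair of scratch directories on any Linux box that has `inotify-tools` and `rsync` installed; the paths below are throwaway examples:

```bash
# terminal 1: watch a scratch directory and copy matching files
mkdir -p /tmp/snap-src /tmp/snap-dst
inotifywait -m -e close_write --format '%f' /tmp/snap-src | while read -r f; do
  if echo "$f" | grep -q 'etcd-snapshot-'; then
    rsync -av "/tmp/snap-src/$f" /tmp/snap-dst/
  fi
done

# terminal 2: simulate a snapshot landing in the watched directory
# cp /etc/hostname /tmp/snap-src/etcd-snapshot-test-$(date +%s)
```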

### DaemonSet to Watch Snapshots on etcd Nodes

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: etcd-backup-sync
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: etcd-backup-sync
  template:
    metadata:
      labels:
        app: etcd-backup-sync
    spec:
      nodeSelector:
        node-role.kubernetes.io/etcd: "true"
      containers:
      - name: syncer
        image: alpine:3.19
        command:
        - /bin/sh
        - -c
        - |
          # the ConfigMap mounts sync.sh without the execute bit, so run it through the shell
          sh /sync.sh
        volumeMounts:
        - name: etcd-snapshots
          mountPath: /var/lib/rancher/rke2/server/db/snapshots
        - name: nfs-backup
          mountPath: /nfs
        - name: sync-script
          mountPath: /sync.sh
          subPath: sync.sh
          readOnly: true
      volumes:
      - name: etcd-snapshots
        hostPath:
          path: /var/lib/rancher/rke2/server/db/snapshots
      - name: nfs-backup
        persistentVolumeClaim:
          claimName: nfs-backup-pvc
      - name: sync-script
        configMap:
          name: sync-etcd-script
```
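
One scheduling caveat: dedicated etcd-only nodes are usually tainted (Rancher-provisioned RKE2 clusters use `node-role.kubernetes.io/etcd=true:NoExecute`), in which case the DaemonSet also needs a matching toleration under the pod `spec`. A minimal sketch, assuming that taint key:

```yaml
      tolerations:
      - key: node-role.kubernetes.io/etcd
        operator: Exists
```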

## Section 4: Validation

Create a manual etcd snapshot. Keep `etcd-snapshot` in the name so it matches the watch script's filename filter:

```bash
# run on one of the etcd (server) nodes
rke2 etcd-snapshot save --name etcd-snapshot-test
```

Watch the logs:

```bash
kubectl -n kube-system logs -l app=etcd-backup-sync
```
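
If the watcher picked the snapshot up, the logs should contain the script's own echo output, roughly like this (the exact filename includes the node name and a timestamp, so yours will differ):

```
Detected new snapshot: etcd-snapshot-test-node01-1744300800
Synced etcd-snapshot-test-node01-1744300800 to NFS.
```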

Finally, check the NFS share for the new snapshot file (it will only carry a `.zip` extension if snapshot compression is enabled).
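
You can do that from inside one of the sync pods, since the share is already mounted at `/nfs`:

```bash
POD=$(kubectl -n kube-system get pod -l app=etcd-backup-sync -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec "$POD" -- ls -lh /nfs
```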