Commit 7cafdde

Add guides for RKE2 etcd snapshot synchronization to NFS and Longhorn replica recovery
- Created a draft post detailing the process to sync RKE2 etcd snapshots to an NFS share to enhance backup safety.
- Developed a comprehensive guide on recovering Longhorn volume data from a single replica in RKE2, including step-by-step instructions and troubleshooting tips.

1 parent 83d2aea commit 7cafdde

2 files changed: +448, -0 lines

Lines changed: 179 additions & 0 deletions

@@ -0,0 +1,179 @@
---
title: "Robust RKE2 Backups: Syncing etcd Snapshots to NFS for Safety"
date: 2025-04-10T00:00:00-05:00
draft: true
tags: ["rke2", "etcd", "backup", "nfs", "kubernetes"]
categories:
  - Kubernetes
  - RKE2
  - Backup
author: "Matthew Mattox - mmattox@support.tools"
description: "Use inotify-based detection to automatically sync RKE2 etcd snapshots to an NFS share using a Kubernetes DaemonSet."
more_link: "yes"
url: "/rke2-etcd-nfs-sync/"
---

If you're not pushing your RKE2 etcd snapshots to a remote location like S3, you're effectively backing up to the same node you're trying to protect. This creates a critical failure point: if that etcd node dies, your snapshots are gone with it.

This guide shows you how to fix that by:

- Mounting an NFS share as a PVC,
- Watching the etcd snapshot directory for new files using `inotifywait`, and
- Automatically syncing those snapshots off-node via `rsync`.

<!--more-->

# [Backup Automation for RKE2](#backup-automation-for-rke2)

## Section 1: Why You Should Not Rely on Local Snapshots

By default, RKE2 stores its etcd snapshots under:

`/var/lib/rancher/rke2/server/db/snapshots`

While these snapshots are crucial for recovery, **storing them locally is risky**. For example, if you were to lose all etcd nodes at once, such as by accidentally deleting all three, any local snapshots would be lost along with them.
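
To see exactly what you are relying on today, list the snapshots sitting on one of your etcd nodes. This quick check (run directly on a server node, using the default path above) makes the single point of failure obvious:

```bash
# On an RKE2 server (etcd) node: list the local snapshots that would be
# lost if this node's disk disappeared.
ls -lh /var/lib/rancher/rke2/server/db/snapshots/
```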

Remote syncing, whether to S3 or a central NFS share, ensures your snapshots are available even if an etcd node is lost. For simplicity, and for offline or air-gapped clusters, NFS is a solid choice.

So instead of relying on local snapshots, we'll set up a system that automatically syncs new etcd snapshots to an NFS share. This way, you can be sure your backups are safe and accessible even if the worst happens.

NOTE: This guide assumes you already have a working NFS server with an exported share. It also assumes you are not using storage classes or dynamic provisioning, and are instead using a manually created static NFS PV and PVC.
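
For reference, a minimal export on the NFS server side might look like the following. The export path, network range, and options here are example values; adapt them to your environment:

```bash
# On the NFS server: export a directory for etcd snapshots (example values).
mkdir -p /etcd-snapshots
echo "/etcd-snapshots 10.0.0.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -ra   # reload the export table
```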

## Section 2: Architecture Overview

We'll use:

- A **PVC** backed by an NFS share
- A **DaemonSet** that runs only on etcd nodes
- A shell script that uses `inotifywait` to watch for new snapshot files
- `rsync` to copy new snapshots to the NFS mount

## Section 3: Kubernetes Resources

### PVC (NFS)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-backup-pvc
  namespace: kube-system   # must be in the same namespace as the DaemonSet that mounts it
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ""     # empty string disables dynamic provisioning so the claim binds to the static PV
  volumeName: nfs-pv
```

### PersistentVolume (NFS)

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs-server-ip   # replace with the IP or hostname of your NFS server
    path: /etcd-snapshots   # replace with the exported path on that server
```
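
Before wiring up the DaemonSet, confirm the claim actually bound to the static PV. A minimal check, assuming the manifests above were saved as `nfs-pv.yaml` and `nfs-backup-pvc.yaml`:

```bash
# Apply the static PV and the claim, then confirm they bind to each other.
kubectl apply -f nfs-pv.yaml -f nfs-backup-pvc.yaml

# STATUS should show "Bound" for both objects before the DaemonSet is deployed.
kubectl get pv nfs-pv
kubectl -n kube-system get pvc nfs-backup-pvc
```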

### ConfigMap with inotify-based Watch Script

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sync-etcd-script
  namespace: kube-system
data:
  sync.sh: |
    #!/bin/sh
    # Install the tools this script depends on (the alpine base image ships with neither).
    if ! command -v inotifywait >/dev/null 2>&1 || ! command -v rsync >/dev/null 2>&1; then
      echo "Installing inotify-tools and rsync..."
      apk add --no-cache inotify-tools rsync
    fi

    SNAP_DIR="/var/lib/rancher/rke2/server/db/snapshots"
    DEST_DIR="/nfs"

    echo "Watching $SNAP_DIR for new snapshots..."
    inotifywait -m -e close_write --format '%f' "$SNAP_DIR" | while read NEWFILE
    do
      if echo "$NEWFILE" | grep -q 'etcd-snapshot-'; then
        echo "Detected new snapshot: $NEWFILE"
        sleep 10 # brief delay to ensure the write is complete
        rsync -avz "$SNAP_DIR/$NEWFILE" "$DEST_DIR/"
        echo "Synced $NEWFILE to NFS."
      fi
    done
```
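
If you want to see the detection mechanism in isolation before deploying anything, you can run the same watch by hand on an etcd node (this assumes `inotify-tools` is installed on the host, which is outside the scope of this guide):

```bash
# Watch the snapshot directory and print the name of any file that finishes
# being written; this is exactly the event the DaemonSet script reacts to.
inotifywait -m -e close_write --format '%f' /var/lib/rancher/rke2/server/db/snapshots
```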

### DaemonSet to Watch Snapshots on etcd Nodes

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: etcd-backup-sync
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: etcd-backup-sync
  template:
    metadata:
      labels:
        app: etcd-backup-sync
    spec:
      nodeSelector:
        node-role.kubernetes.io/etcd: "true"
      tolerations:
        # Adjust to match any taints on your etcd nodes; dedicated etcd and
        # control-plane nodes are typically tainted and would otherwise reject the pod.
        - key: node-role.kubernetes.io/etcd
          operator: Exists
          effect: NoExecute
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: syncer
          image: alpine:3.19
          command:
            - /bin/sh
            - -c
            - |
              # Run via "sh" so the ConfigMap-mounted script does not need the executable bit.
              sh /sync.sh
          volumeMounts:
            - name: etcd-snapshots
              mountPath: /var/lib/rancher/rke2/server/db/snapshots
            - name: nfs-backup
              mountPath: /nfs
            - name: sync-script
              mountPath: /sync.sh
              subPath: sync.sh
              readOnly: true
      volumes:
        - name: etcd-snapshots
          hostPath:
            path: /var/lib/rancher/rke2/server/db/snapshots
        - name: nfs-backup
          persistentVolumeClaim:
            claimName: nfs-backup-pvc
        - name: sync-script
          configMap:
            name: sync-etcd-script
```
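
With all four manifests in place, deployment is just a matter of applying them and confirming a syncer pod lands on every etcd node. A quick sanity check (the file names here are simply whatever you saved the manifests as):

```bash
# Deploy the ConfigMap and DaemonSet, then confirm one pod is running per etcd node.
kubectl apply -f sync-etcd-script.yaml -f etcd-backup-sync.yaml
kubectl -n kube-system get pods -l app=etcd-backup-sync -o wide
```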

## Section 4: Validation

Create a manual etcd snapshot on one of the server (etcd) nodes:

```bash
rke2 etcd-snapshot save --name test-backup
```

Watch the logs of the sync pods:

```bash
kubectl -n kube-system logs -l app=etcd-backup-sync
```

You should see a `Detected new snapshot: ...` line followed by `Synced ... to NFS.` Finally, check the NFS share and confirm the new snapshot file is there.
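
If you would rather verify from inside the cluster than on the NFS server itself, you can list the mounted share through one of the sync pods. A minimal sketch, assuming the DaemonSet above is running:

```bash
# Pick one syncer pod and list the NFS-backed destination directory.
POD=$(kubectl -n kube-system get pods -l app=etcd-backup-sync -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec "$POD" -- ls -lh /nfs
```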
