# Design: In-Guest File-Level Backup for OpenShift KubeVirt VMs via OADP

## 1. Abstract

OADP provides robust, snapshot-based backup and restore for OpenShift KubeVirt Virtual Machines (VMs). This is ideal for full disaster recovery but does not address the common need for granular, file-level backups initiated from within the guest OS (e.g., backing up application configurations, user data, or log directories).

This design proposes a Kubernetes-native, client-server architecture to enable file-level backups directly from within VMs.

We will introduce a new Custom Resource Definition (CRD), `BackupStorageLocationServer`, which deploys and manages a Kopia repository server within the OpenShift cluster. This server will use an existing Velero `BackupStorageLocation` (BSL) as its storage backend.

Users inside VMs can then use a standard Kopia client to connect to this managed server and perform self-service, path-level backups and restores.

This approach shifts from an external "pull" model (such as `libguestfs` or `ssh`) to an internal "push" model. VM users run backups on their own terms, the server reuses existing cloud storage infrastructure, and administrators keep centralized governance over storage.

## 2. Background

Current VM protection with OADP focuses on block-level consistency by snapshotting Persistent Volume Claims (PVCs). While this method is effective for full VM recovery, it is inefficient and overly complex when operators or application owners only need to restore a single corrupted configuration file or recover a specific user directory.

This design introduces a persistent, long-running Kopia service that decouples the file-backup lifecycle from the VM snapshot lifecycle, offering greater flexibility, efficiency, and a better user experience for application owners and users within the VM.

## 3. Goals

- **Declarative Deployment:** Provide a Kubernetes-native CRD, `BackupStorageLocationServer`, to automate the deployment and configuration of a multi-tenant Kopia repository server.
- **Storage Re-use:** Leverage existing Velero `BackupStorageLocation` (BSL) resources to avoid credential duplication and simplify storage management.
- **Self-Service for VM Users:** Enable users within a KubeVirt VM to use a standard Kopia client to back up and restore their own files on their own schedule.
- **Centralized User Management:** Manage Kopia user credentials via OpenShift `Secrets` or `ConfigMaps`, allowing for GitOps-style management and easy credential rotation.
- **Secure by Default:** Ensure all communication between the in-guest client and the in-cluster server is secured with TLS.

## 4. Non-Goals

- **Replacing Velero:** This is a *complementary* solution for file-level backup, not a replacement for Velero's full VM snapshot capabilities.
- **Automatic Backup Client Installation:** The installation and configuration of the Kopia client *inside* the guest OS is the responsibility of the VM owner.
- **External File Access:** This design does not use `libguestfs`, `ssh`, or any other mechanism to access the VM from the outside. All connections are initiated by the client inside the VM.
- **Application Consistency:** It is the responsibility of the VM user to ensure that the application’s data is in a consistent state before starting a backup (e.g., by flushing caches or pausing writes).

## 5. High-Level Architecture

The architecture consists of a controller that reconciles `BackupStorageLocationServer` resources (defined as kind `KopiaServer` in Section 6.1) and manages the Kopia server `Deployment` and `Service`. Kopia clients running inside VMs connect to this `Service` to perform backups.

```mermaid
graph TD
    %% User action
    A[Admin] -->|kubectl apply| B(BackupStorageLocationServer CR)

    %% Controller logic
    C[Controller] -->|Watches| B
    C -->|Reads| D[Velero BackupStorageLocation]
    C -->|Reads| E[Storage Secret]
    C -->|Reads| F[Kopia User Config]
    C -->|Manages| G[Kopia Server Deployment]
    C -->|Manages| H[Kubernetes Service]
    C -->|Updates| I[CR Status]

    %% Kopia Server Pod
    G --> J[Kopia Server Process]
    J -->|Uses| K[Repo Password & Storage Credentials]
    J -->|Uses| L[User Config]

    %% VM interaction
    M[KubeVirt VM Kopia Client] -->|Connects| H

    %% Storage backend
    J -->|Reads/Writes| N[S3 Object Storage]
    D --> N
```

**Workflow:**
1. An **Administrator** defines a `KopiaServer` CR, specifying the Velero BSL to use, a repository password secret, and a user management source.
2. The **KopiaServer Controller** reconciles the CR. It reads the BSL and its associated credential secret to configure the storage backend.
3. The controller creates a `Deployment` to run the Kopia server process and a `Service` (`ClusterIP` by default) to provide a stable endpoint (e.g., `my-kopia-server.backup-ns.svc.cluster.local`).
4. The controller updates the `KopiaServer` CR's `status` field with the service URL that clients should connect to.
5. A **VM User** installs the Kopia client in their VM. They obtain the server URL and their credentials (e.g., `user@hostname:password`) from the administrator.
6. The Kopia client inside the VM connects to the Kopia server's `Service` endpoint, authenticates, and can then "push" backups of any desired path (e.g., `kopia snapshot create /etc/app/config`).
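
Steps 2 to 4 can be condensed into a pure planning function, which keeps the controller logic testable without an API server. The Go sketch below is illustrative only. `bslInfo`, `serverPlan`, and `planServer` are hypothetical names, and the structs are simplified stand-ins for the Velero and Kubernetes API types. It also assumes that the Deployment and Service reuse the CR's name and that the server listens on 51515 (Kopia's default server port, also used in the examples below).

```go
package main

import "fmt"

// bslInfo is a hypothetical, simplified view of a Velero BackupStorageLocation.
type bslInfo struct {
	Bucket string
	Phase  string // Velero reports "Available" or "Unavailable"
}

// serverPlan is what the controller would create (step 3) and publish (step 4).
type serverPlan struct {
	DeploymentName string
	ServiceName    string
	ServiceURL     string
}

// kopiaPort is assumed to be Kopia's default server port.
const kopiaPort = 51515

// planServer resolves the referenced BSL (step 2) and derives the object
// names and the in-cluster address for the CR status (steps 3 and 4).
func planServer(crName, crNamespace, bslName string, bsls map[string]bslInfo) (serverPlan, error) {
	bsl, ok := bsls[bslName]
	if !ok {
		return serverPlan{}, fmt.Errorf("BackupStorageLocation %q not found", bslName)
	}
	if bsl.Phase != "Available" {
		return serverPlan{}, fmt.Errorf("BackupStorageLocation %q is %q, not Available", bslName, bsl.Phase)
	}
	return serverPlan{
		DeploymentName: crName,
		ServiceName:    crName,
		ServiceURL:     fmt.Sprintf("%s.%s.svc:%d", crName, crNamespace, kopiaPort),
	}, nil
}

func main() {
	bsls := map[string]bslInfo{"default": {Bucket: "my-bucket", Phase: "Available"}}
	plan, err := planServer("main-kopia-server", "oadp-operator", "default", bsls)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(plan.ServiceURL) // main-kopia-server.oadp-operator.svc:51515
}
```

Refusing to plan against an `Unavailable` BSL lets the controller surface a clear status error instead of starting a server that cannot reach its storage.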

```mermaid
sequenceDiagram
    actor admin as Cluster Admin
    participant k8s as OpenShift API
    participant op as OADP Operator
    participant ks as Kopia Server Pod
    participant s3 as S3 Bucket

    admin->>k8s: 1. Apply BackupStorageLocation CR (BSL)
    note over op: Operator watches BSL and BSLS resources

    admin->>k8s: 2. Apply BackupStorageLocationServer CR (BSLS)
    op->>k8s: 3. Read BSL & BSLS resources

    op->>op: 4. Process reconciliation loop
    op->>k8s: 5. Read BSL config (S3 credentials secret and other config data)
    op->>k8s: 6. Create Kopia server Deployment and Service

    activate ks
    k8s->>ks: 7. Start Pod (Kopia Server)

    ks->>s3: 8. Connect to repository
    s3-->>ks: Connection successful
    op->>op: 9. Process reconciliation loop (watching for Kopia Server pod)
    deactivate ks

    op->>k8s: 10. Update BSLS status to Available
    k8s-->>admin: BSLS status is now Available
```

## 6. Detailed Design

### 6.1. `KopiaServer` CRD Definition

**apiVersion:** `kopia.io/v1alpha1`

**kind:** `KopiaServer`

#### `spec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`storage`** | `StorageSpec` | Defines the backend storage by referencing a Velero BSL. | Yes |
| **`repositoryPasswordSecretRef`** | `corev1.SecretKeySelector` | Reference to the Secret key containing the master password for the Kopia repository. | Yes |
| **`userManagement`** | `UserManagementSpec` | Configuration for Kopia user credentials. | Yes |
| **`service`** | `ServiceSpec` | Defines the Kubernetes Service used to expose the Kopia server. Defaults to `ClusterIP`. | No |
| **`tls`** | `TLSSpec` | TLS configuration for the server endpoint. **Strongly Recommended.** | No |
| **`image`** | `string` | Container image for the Kopia server. Defaults to `kopia/kopia:latest`. | No |
| **`resources`** | `corev1.ResourceRequirements` | Kubernetes resource requests and limits for the Kopia server pod. | No |

#### `StorageSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`velero`** | `VeleroStorageSpec` | Use an existing Velero `BackupStorageLocation` as the backend. | Yes |
| **`prefix`** | `string` | An optional prefix within the bucket for Kopia data, e.g., `kopia-file-backups/`. | No |

#### `VeleroStorageSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`backupStorageLocationName`** | `string` | Name of the `BackupStorageLocation` CR in the Velero namespace. | Yes |
| **`veleroNamespace`** | `string` | The namespace where Velero is installed. Defaults to `velero`. | No |

#### `UserManagementSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`source`** | `UserSource` | Source of the user list file, one `username@hostname:passwordhash` entry per line. Exactly one of `secret` or `configMap` must be set. | Yes |

| `UserSource` Field | Type | Description |
| :--- | :--- | :--- |
| **`secret`** | `corev1.SecretKeySelector` | A Secret key containing the user list. |
| **`configMap`** | `corev1.ConfigMapKeySelector` | A ConfigMap key containing the user list. |
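
The controller could enforce the "exactly one" rule for `UserSource` as sketched below. The sketch uses simplified stand-ins for the Kubernetes key-selector types; `KeySelector` and `validateUserSource` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// KeySelector is a simplified stand-in for corev1.SecretKeySelector and
// corev1.ConfigMapKeySelector (both reference a name and a key).
type KeySelector struct {
	Name string
	Key  string
}

// UserSource mirrors the table above: Secret and ConfigMap are mutually exclusive.
type UserSource struct {
	Secret    *KeySelector
	ConfigMap *KeySelector
}

// validateUserSource enforces "exactly one of secret or configMap" and
// requires both name and key on whichever selector is set.
func validateUserSource(src UserSource) error {
	set := 0
	for _, sel := range []*KeySelector{src.Secret, src.ConfigMap} {
		if sel == nil {
			continue
		}
		if sel.Name == "" || sel.Key == "" {
			return errors.New("userManagement.source: name and key are required")
		}
		set++
	}
	if set != 1 {
		return fmt.Errorf("userManagement.source: exactly one of secret or configMap must be set, got %d", set)
	}
	return nil
}

func main() {
	src := UserSource{Secret: &KeySelector{Name: "kopia-vm-users", Key: "users.txt"}}
	fmt.Println(validateUserSource(src)) // <nil>
}
```

The same rule could instead be declared on the CRD as a CEL validation rule (`x-kubernetes-validations`), for example `has(self.secret) != has(self.configMap)`. The API server would then reject invalid resources at admission, rather than the controller reporting the problem later as a status error.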

#### `TLSSpec`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`secretName`** | `string` | Name of a `kubernetes.io/tls` type secret with `tls.crt` and `tls.key`. If not provided, the server runs without TLS (not recommended). |

#### `status`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`conditions`** | `[]metav1.Condition` | Standard conditions such as `Available` and `Progressing`. |
| **`serviceURL`** | `string` | The internal DNS address and port for clients (e.g., `my-kopia.default.svc:51515`). |
| **`repositoryStatus`** | `string` | The status of the Kopia repository (`Initialized`, `NotInitialized`, `Error`). |

### 6.2. Security Considerations

- **TLS Encryption:** Communication between the in-guest client and the server must be encrypted. The `tls.secretName` field is critical for production use.
- **Controller RBAC:** The controller requires read access to `BackupStorageLocations` and to `Secrets` in the Velero namespace. This access must be tightly scoped. The controller should create a derived, temporary secret for the Kopia pod to consume, rather than mounting Velero's credentials directly.
- **Network Policies:** `NetworkPolicy` resources should be deployed to restrict access to the Kopia `Service`, allowing connections only from pods (including the `virt-launcher` pods that run the VMs) in namespaces designated to host VMs.
- **User Credential Management:** The user list file stores password hashes rather than plaintext passwords. It should still be managed as a `Secret` rather than a `ConfigMap`, because anyone who can read the hashes can attack them offline.
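
The derived-secret approach in the RBAC bullet could look like the sketch below. It assumes the BSL uses the Velero AWS plugin, whose credential file is an INI-style profile (`[default]` with `aws_access_key_id` and `aws_secret_access_key`). It copies only those two values, under the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` names, into the data of a new Secret for the Kopia pod. `deriveKopiaEnv` is a hypothetical helper; other storage providers would need their own mappings.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// deriveKopiaEnv parses an AWS-style credentials file (as stored in the
// Velero credential Secret) and returns only the keys the Kopia pod needs,
// taken from the given profile.
func deriveKopiaEnv(credFile, profile string) (map[string]string, error) {
	wanted := map[string]string{
		"aws_access_key_id":     "AWS_ACCESS_KEY_ID",
		"aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
	}
	out := map[string]string{}
	inProfile := false
	sc := bufio.NewScanner(strings.NewReader(credFile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue // skip blanks and INI comments
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			inProfile = strings.TrimSpace(line[1:len(line)-1]) == profile
			continue
		}
		if !inProfile {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		if env, want := wanted[strings.TrimSpace(k)]; want {
			out[env] = strings.TrimSpace(v)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	for _, env := range wanted {
		if out[env] == "" {
			return nil, fmt.Errorf("profile %q is missing %s", profile, env)
		}
	}
	return out, nil
}

func main() {
	cred := "[default]\naws_access_key_id = AKIAEXAMPLE\naws_secret_access_key = s3cr3t\n"
	env, err := deriveKopiaEnv(cred, "default")
	fmt.Println(env["AWS_ACCESS_KEY_ID"], err) // AKIAEXAMPLE <nil>
}
```

The derived Secret holds only these two keys and can carry an owner reference to the `KopiaServer`. Kubernetes then garbage-collects it together with the CR, and the rest of Velero's credential file is never exposed to the Kopia pod.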

## 7. Example Usage

#### 1. Create Prerequisite Secrets

```bash
# All secrets must live in the same namespace as the KopiaServer CR,
# because the spec references them by name and key only.

# 1. Secret for the repository master password
kubectl create secret generic kopia-main-repo-pass \
  --namespace oadp-operator \
  --from-literal=password='a-very-strong-and-secret-password-for-the-repo'

# 2. Secret for the user list (bcrypt hashes, e.g. generated with
#    `htpasswd -nbB <username>@<hostname> <password>`).
# userfile.txt should contain lines like:
# web-vm-user@*:$2a$10$abcdefghijklmnopqrstuv.abcdefghijklmnopqrstuv.abcde
kubectl create secret generic kopia-vm-users \
  --namespace oadp-operator \
  --from-file=users.txt=userfile.txt

# 3. (Optional but recommended) A TLS secret for the server
kubectl create secret tls kopia-server-tls \
  --namespace oadp-operator \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key
```
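
Before creating the `kopia-vm-users` Secret, an administrator (or the controller, on every reconcile) might sanity-check the user list. The sketch below assumes the format described in `UserManagementSpec`: one `username@hostname:passwordhash` entry per line, with a bcrypt hash (`$2a$`, `$2b$`, or `$2y$` prefix). It also assumes that blank lines and `#` comments are ignored. `parseUserList` is a hypothetical helper, and it does not verify the hashes themselves.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseUserList validates a user list and returns hashes keyed by
// "username@hostname". Comments (#) and blank lines are skipped.
func parseUserList(content string) (map[string]string, error) {
	users := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split on the first ':'; bcrypt hashes use '$' and '.' but never ':'.
		id, hash, ok := strings.Cut(line, ":")
		if !ok {
			return nil, fmt.Errorf("line %d: missing ':' separator", n)
		}
		user, host, ok := strings.Cut(id, "@")
		if !ok || user == "" || host == "" {
			return nil, fmt.Errorf("line %d: %q is not username@hostname", n, id)
		}
		if !(strings.HasPrefix(hash, "$2a$") || strings.HasPrefix(hash, "$2b$") || strings.HasPrefix(hash, "$2y$")) {
			return nil, fmt.Errorf("line %d: hash for %s is not bcrypt", n, id)
		}
		if _, dup := users[id]; dup {
			return nil, fmt.Errorf("line %d: duplicate entry for %s", n, id)
		}
		users[id] = hash
	}
	return users, sc.Err()
}

func main() {
	list := "# web tier\nweb-vm-user@*:$2a$10$abcdefghijklmnopqrstuv.abcdefghijklmnopqrstuv.abcde\n"
	users, err := parseUserList(list)
	fmt.Println(len(users), err) // 1 <nil>
}
```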

#### 2. Define the `KopiaServer` Resource

```yaml
apiVersion: kopia.io/v1alpha1
kind: KopiaServer
metadata:
  name: main-kopia-server
  namespace: oadp-operator # Or another management namespace
spec:
  # Use the 'default' BSL from the 'velero' namespace as the backend
  storage:
    velero:
      backupStorageLocationName: default
      veleroNamespace: velero
    prefix: guest-file-backups/

  # Reference the secret for the repository password
  repositoryPasswordSecretRef:
    name: kopia-main-repo-pass
    key: password

  # Reference the secret containing the Kopia user list
  userManagement:
    source:
      secret:
        name: kopia-vm-users
        key: users.txt

  # Secure the server endpoint with a TLS certificate
  tls:
    secretName: kopia-server-tls

  # Define resource limits for the server pod
  resources:
    requests:
      cpu: "200m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "2Gi"
```

#### 3. In-VM Client Connection

A user inside a VM would then connect with:

```bash
# First-time connection inside the VM.
# The server address is taken from the KopiaServer's status.serviceURL;
# the username and hostname must match an entry in the user list.
$ kopia repository connect server \
    --url https://main-kopia-server.oadp-operator.svc:51515 \
    --server-cert-fingerprint <FINGERPRINT_FROM_SERVER> \
    --override-username web-vm-user \
    --override-hostname my-vm-hostname \
    --password <provided-password>

# Subsequent backups
$ kopia snapshot create /var/www/html --tags app:my-webapp
```

## 8. Other Considered Designs

A previously considered approach using `libguestfs`, while technically feasible, tightly couples file-level operations to the cluster administrator's Velero backup schedule. It also adds significant operational overhead to every file-level operation: taking a snapshot, creating a PVC from it, mounting it, and launching a helper pod.