
Commit 1c1241b

OADP-6141: Use of kopia from downstream fork
Use of downstream fork of the Kopia repository. Updated velero and kopia to use oadp-dev branch. Signed-off-by: Michal Pryc <[email protected]>
1 parent 0895240 commit 1c1241b

8 files changed (+845, -4 lines)

docs/design/CLI-design.md

# `oc backup` and `oc restore` CLI Plugins

## Overview

This document outlines the [human-first](https://clig.dev/) design for two separate OpenShift CLI plugins: `oc backup` and `oc restore`.

This design resolves the ambiguity of handling admin vs. non-admin objects by moving away from implicit, permission-based "magic" and towards an explicit, intent-driven interface.

### Core Design Philosophy

1. **Simple by Default:** For the most common developer task, the command should be as simple as possible. It defaults to operating on the user's current namespace context.
2. **Explicit Overrides:** While the default is simple, users can explicitly override the scope for multi-namespace or cluster-wide operations using clear, descriptive flags. Explicit flags always take precedence over the implicit context.
3. **Safety First:** The plugin includes guardrails to prevent users from accidentally performing dangerous operations, such as backing up the Velero control plane namespace itself.
4. **No Conflicts with `oc`:** The plugin's flags do not conflict with or create ambiguity around the standard global flags of the `oc` command.

---

## Plugin 1: `oc backup`

### Summary

Manages Velero backups using a smart, context-aware interface that aligns with the OADP (OpenShift API for Data Protection) operational model. The plugin **automatically detects the OADP installation namespace** and creates the correct backup resource (`Backup` vs. `NonAdminBackup`) based on where the command is executed.

### How the Backup Object Type is Determined

The plugin's core logic is both intelligent and predictable, combining auto-discovery with the user's current context.

1. **OADP Namespace Discovery (Automatic):**
    * On execution, the plugin first attempts to discover the namespace where the OADP operator is running. It does this by searching the cluster for a `Deployment` with a standard OADP label (e.g., `app.kubernetes.io/name=oadp-operator`).
    * This discovered namespace is considered the **Admin Namespace**.
    * If the user's permissions do not allow the plugin to get/list namespaces, no Admin Namespace can be discovered and the plugin operates in **Developer Mode**.

2. **Execution Mode Selection:**
    * **Administrator Mode (`Backup` object):** If the user's current project (`oc project`) is the same as the **Discovered Admin Namespace**, the plugin operates in "Admin Mode" and creates standard `velero.io/Backup` objects.
    * **Developer Mode (Default, `NonAdminBackup` object):** If the command is executed in **any other namespace**, the plugin operates in "Developer Mode" and creates a `NonAdminBackup` object.
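
The discovery and mode-selection rules above can be sketched in Go. This is a minimal illustration under stated assumptions, not the plugin's implementation: the `Deployment` struct, the `ErrForbidden` sentinel, and the injected `list` function are hypothetical stand-ins for a real client-go call, and any listing error is assumed to mean "fall back to Developer Mode".

```go
package main

import "errors"

// Mode is the execution mode of the plugin (hypothetical type for this sketch).
type Mode string

const (
	AdminMode     Mode = "Admin"     // creates velero.io Backup objects
	DeveloperMode Mode = "Developer" // creates NonAdminBackup objects
)

// Example OADP label from this design, used to locate the operator Deployment.
const (
	oadpLabelKey   = "app.kubernetes.io/name"
	oadpLabelValue = "oadp-operator"
)

// Deployment is a minimal stand-in for a Kubernetes Deployment;
// only the namespace and labels matter for discovery.
type Deployment struct {
	Namespace string
	Labels    map[string]string
}

// ErrForbidden stands in for an RBAC "forbidden" error from the API server.
var ErrForbidden = errors.New("forbidden")

// DiscoverAdminNamespace returns the namespace of the first Deployment that
// carries the OADP label, or "" when none is found or listing fails.
func DiscoverAdminNamespace(list func() ([]Deployment, error)) string {
	deps, err := list()
	if err != nil {
		// Any error, including ErrForbidden, means the Admin Namespace is unknown.
		return ""
	}
	for _, d := range deps {
		if d.Labels[oadpLabelKey] == oadpLabelValue {
			return d.Namespace
		}
	}
	return ""
}

// SelectMode returns AdminMode only when the current namespace is the
// discovered Admin Namespace; everything else is DeveloperMode.
func SelectMode(currentNS, adminNS string) Mode {
	if adminNS != "" && currentNS == adminNS {
		return AdminMode
	}
	return DeveloperMode
}
```

Keeping the API call behind a function parameter lets the decision logic be tested without a cluster.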

### Verbs

| Verb       | Description                                   |
| ---------- | --------------------------------------------- |
| `create`   | Create a new backup.                          |
| `get`      | List all existing backups (both scopes).      |
| `describe` | Show detailed information about a backup.     |
| `delete`   | Delete a backup and its data from storage.    |
| `logs`     | Show the logs of a specific backup operation. |

### Key Command: `oc backup create <backup-name>`

This command creates a new backup. Its behavior depends on the execution mode described above.

| Flag                   | Valid Context(s)  | Description |
| ---------------------- | ----------------- | ----------- |
| `--include-namespaces` | Admin & Developer | Specifies which namespace(s) to include in the backup. For Developers, this defaults to the current namespace if not provided. |
| `--all-namespaces`     | **Admin Only**    | Performs a full, cluster-scoped backup. The command fails with an error if this flag is used outside the dynamically discovered Admin Namespace. |
| `--ttl`                | Admin & Developer | Sets the time-to-live for the backup artifact (e.g., `24h`, `14d`). |
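
A sketch of how `oc backup create` might resolve its namespace flags once the mode is known. Two rules here are assumptions not stated in the table: `--all-namespaces` and `--include-namespaces` are treated as mutually exclusive, and Admin Mode requires an explicit scope, because defaulting to the current (OADP) namespace would back up the control plane itself. Rejecting the Admin Namespace is one possible form of the "Safety First" guardrail. `ResolveNamespaces` and `SplitNamespaces` are hypothetical names.

```go
package main

import (
	"errors"
	"strings"
)

// Mode mirrors the plugin's two execution modes (hypothetical type).
type Mode string

const (
	AdminMode     Mode = "Admin"
	DeveloperMode Mode = "Developer"
)

// CreateFlags holds the parsed namespace flags of `oc backup create`.
type CreateFlags struct {
	IncludeNamespaces []string // --include-namespaces
	AllNamespaces     bool     // --all-namespaces
}

// SplitNamespaces parses a comma-separated flag value such as "ns1,ns2",
// trimming spaces and dropping empty entries.
func SplitNamespaces(v string) []string {
	var out []string
	for _, p := range strings.Split(v, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// ResolveNamespaces applies the flag rules and returns the namespaces to back up.
// A nil slice with a nil error means a cluster-scoped (--all-namespaces) backup.
func ResolveNamespaces(mode Mode, currentNS, adminNS string, f CreateFlags) ([]string, error) {
	if f.AllNamespaces {
		if mode != AdminMode {
			return nil, errors.New("--all-namespaces is only allowed in the Admin Namespace")
		}
		if len(f.IncludeNamespaces) > 0 {
			// Assumption: the two scope flags are mutually exclusive.
			return nil, errors.New("--all-namespaces cannot be combined with --include-namespaces")
		}
		return nil, nil
	}
	namespaces := f.IncludeNamespaces
	if len(namespaces) == 0 {
		if mode == AdminMode {
			// Assumption: Admin Mode must state its scope explicitly.
			return nil, errors.New("in Admin Mode, pass --include-namespaces or --all-namespaces")
		}
		namespaces = []string{currentNS} // Developer default: the current namespace
	}
	for _, ns := range namespaces {
		if adminNS != "" && ns == adminNS {
			// Guardrail: never target the OADP/Velero namespace explicitly.
			return nil, errors.New("refusing to back up the OADP namespace " + ns)
		}
	}
	return namespaces, nil
}
```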

### Example Workflows

*This design requires no changes to a user's typical workflow. The complexity of choosing the correct backup object is handled automatically by the plugin.*

#### Developer Workflow: Backing Up the Current Project

A developer wants to back up their application, which lives in its own project. This is the simplest and most common use case.

```bash
# 1. The developer works within their application's project.
$ oc project my-shopping-app

# 2. They run the backup command without any special flags.
#    The plugin detects it is NOT in the Admin Namespace.
$ oc backup create my-app-snapshot-v3

# Result: A `NonAdminBackup` object is created in the `my-shopping-app` namespace,
#         as this is a non-admin context.
```

#### Administrator Workflow: Full Cluster Backup

An administrator needs to perform a complete, cluster-wide backup for disaster recovery.

```bash
# 1. The admin switches their context to the OADP installation namespace.
#    The plugin detects this is the Admin Namespace.
$ oc project oadp-operator

# 2. They create a backup using the --all-namespaces flag.
$ oc backup create cluster-dr-backup-weekly --all-namespaces --ttl=14d

# Result: A standard Velero `Backup` object is created directly in the `oadp-operator` namespace,
#         triggering a privileged, cluster-wide backup.
```

#### Administrator Workflow: Specific Namespace Backup

An administrator needs to create a privileged backup of specific user projects, not the entire cluster.

```bash
# 1. The admin acts from within the OADP installation namespace by using the global '-n' flag.
#    The plugin detects this is the Admin Namespace and operates in "Admin Mode".
$ oc backup -n oadp-operator create user-projects-backup --include-namespaces=ns1,ns2 --ttl=14d
```

---

## Plugin 2: `oc restore`

# Design: In-Guest File-Level Backup for OpenShift KubeVirt VMs via OADP

## 1. Abstract

OADP provides robust, snapshot-based backup and restore for OpenShift KubeVirt Virtual Machines (VMs). This is ideal for full disaster recovery but does not address the common need for granular, file-level backups initiated from within the guest OS (e.g., backing up application configurations, user data, or log directories).

This design proposes a Kubernetes-native, client-server architecture to enable file-level backups directly from within VMs.

We will introduce a new Custom Resource Definition (CRD), `BackupStorageLocationServer`, which deploys and manages a Kopia repository server within the OpenShift cluster. This server will use an existing Velero `BackupStorageLocation` (BSL) as its storage backend.

Users inside VMs can then use a standard Kopia client to connect to this managed server and perform self-service, path-level backups and restores.

This approach shifts from an external "pull" model (like `libguestfs` or `ssh`) to an internal "push" model. It empowers VM users and integrates with existing cloud storage infrastructure while preserving centralized storage governance by administrators.

## 2. Background

Current VM protection with OADP focuses on block-level consistency by snapshotting Persistent Volume Claims (PVCs). While this method is effective for full VM recovery, it is inefficient and overly complex when operators or application owners need to restore a single corrupted configuration file or recover a specific user directory.

This design introduces a persistent, long-running Kopia service that decouples the file-backup lifecycle from the VM snapshot lifecycle, offering greater flexibility, efficiency, and a better user experience for application owners and users within the VM.

## 3. Goals

- **Declarative Deployment:** Provide a Kubernetes-native Custom Resource Definition (CRD) (`BackupStorageLocationServer`) to automate the deployment and configuration of a multi-tenant Kopia repository server.
- **Storage Re-use:** Leverage existing Velero `BackupStorageLocation` (BSL) resources to avoid credential duplication and simplify storage management.
- **Self-Service for VM Users:** Enable users within a KubeVirt VM to use a standard Kopia client to back up and restore their own files on their own schedule.
- **Centralized User Management:** Manage Kopia user credentials via OpenShift `Secrets` or `ConfigMaps`, allowing for GitOps-style management and easy credential rotation.
- **Secure by Default:** Ensure all communication between the in-guest client and the in-cluster server is secured with TLS.

## 4. Non-Goals

- **Replacing Velero:** This is a *complementary* solution for file-level backup, not a replacement for Velero's full VM snapshot capabilities.
- **Automatic Backup Client Installation:** The installation and configuration of the Kopia client *inside* the guest OS is the responsibility of the VM owner.
- **External File Access:** This design does not use `libguestfs`, `ssh`, or any other mechanism to access the VM from the outside. All connections are initiated by the client inside the VM.
- **Application Consistency:** It is the responsibility of the VM user to ensure that the application’s data is in a consistent state before starting a backup (e.g., by flushing caches or pausing writes).

## 5. High-Level Architecture

The architecture consists of a `KopiaServer` controller that manages the Kopia server `Deployment` and `Service`. VM clients connect to this service to perform backups.

```mermaid
graph TD
    %% User action
    A[Admin] -->|kubectl apply| B(BackupStorageLocationServer CR)

    %% Controller logic
    C[Controller] -->|Watches| B
    C -->|Reads| D[Velero BackupStorageLocation]
    C -->|Reads| E[Storage Secret]
    C -->|Reads| F[Kopia User Config]
    C -->|Manages| G[Kopia Server Deployment]
    C -->|Manages| H[Kubernetes Service]
    C -->|Updates| I[CR Status]

    %% Kopia Server Pod
    G --> J[Kopia Server Process]
    J -->|Uses| K[Repo Password & Storage Credentials]
    J -->|Uses| L[User Config]

    %% VM interaction
    M[KubeVirt VM Kopia Client] -->|Connects| H

    %% Storage backend
    J -->|Reads/Writes| N[S3 Object Storage]
    D --> N
```

**Workflow:**

1. An **Administrator** defines a `KopiaServer` CR, specifying the Velero BSL to use, a repository password secret, and a user management source.
2. The **KopiaServer Controller** reconciles the CR. It reads the BSL and its associated credential secret to configure the storage backend.
3. The controller creates a `Deployment` to run the Kopia server process and a `Service` (`ClusterIP` by default) to provide a stable endpoint (e.g., `my-kopia-server.backup-ns.svc.cluster.local`).
4. The controller updates the `KopiaServer` CR's `status` field with the connectable service URL.
5. A **VM User** installs the Kopia client in their VM. They obtain the server URL and their credentials (e.g., `user@hostname:password`) from the administrator.
6. The Kopia client inside the VM connects to the Kopia server's `Service` endpoint, authenticates, and can then "push" backups of any desired path (e.g., `kopia snapshot create /etc/app/config`).
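
Steps 3 and 4 publish a stable in-cluster endpoint. A sketch of how the controller could format the `serviceURL`, assuming TLS is enabled (hence `https://`) and that the port defaults to 51515, the port used in this design's examples and Kopia's default server port. `ServiceURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// defaultKopiaPort is the port used throughout this design's examples.
const defaultKopiaPort = 51515

// ServiceURL builds the URL a VM client would pass to
// `kopia repository connect server --url`. An empty clusterDomain yields the
// short <svc>.<ns>.svc form; a zero port falls back to defaultKopiaPort.
func ServiceURL(serviceName, namespace, clusterDomain string, port int) string {
	if port == 0 {
		port = defaultKopiaPort
	}
	host := fmt.Sprintf("%s.%s.svc", serviceName, namespace)
	if clusterDomain != "" {
		host += "." + clusterDomain
	}
	return "https://" + net.JoinHostPort(host, strconv.Itoa(port))
}
```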

```mermaid
sequenceDiagram
    actor Cluster Admin
    participant k8s as OpenShift API
    participant op as OADP Operator
    participant ks as Kopia Server Pod
    participant s3 as S3 Bucket

    Cluster Admin->>k8s: 1. Apply BackupStorageLocation CR (BSL)
    note over op: Operator is watching for CRs

    Cluster Admin->>k8s: 2. Apply BackupStorageLocationServer CR (BSLS)
    op->>k8s: 3. Read BSL & BSLS resources

    op->>op: 4. Process reconciliation loop
    op->>k8s: 5. Read BSL config (for S3 secret and other config data)
    op->>k8s: 6. Create Pod (Kopia Server) resource

    activate ks
    k8s->>ks: 7. Start Pod (Kopia Server)

    ks->>s3: 8. Connect to repository
    s3-->>ks: Connection successful
    op->>op: 9. Process reconciliation loop (watching for Kopia Server pod)
    deactivate ks

    op->>k8s: 10. Update BSLS status to 'Ready'
    k8s-->>Cluster Admin: BSLS Status is now 'Ready'
```
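
The reconciliation shown in the sequence diagram can be reduced to a pure function from observed state to status. This is a sketch with a hypothetical `Observed` struct and made-up reason strings; a real controller would express the result as `metav1.Condition` entries such as `Available` and `Progressing`, as listed in the `status` table.

```go
package main

// Observed is a hypothetical snapshot of what one reconcile pass has seen.
type Observed struct {
	BSLFound         bool // the referenced BackupStorageLocation exists
	CredentialsFound bool // its credential Secret exists and is readable
	ServerPodReady   bool // the Kopia server pod reports Ready
	RepoConnected    bool // the server connected to the repository in object storage
}

// Phase returns the BSLS status and a reason, checking prerequisites in the
// same order as the sequence diagram (BSL, credentials, pod, repository).
func Phase(o Observed) (phase, reason string) {
	switch {
	case !o.BSLFound:
		return "Error", "BackupStorageLocationNotFound"
	case !o.CredentialsFound:
		return "Error", "CredentialsNotFound"
	case !o.ServerPodReady:
		return "Progressing", "WaitingForServerPod"
	case !o.RepoConnected:
		return "Progressing", "ConnectingToRepository"
	default:
		return "Ready", "ServerAvailable"
	}
}
```

Because each pass only reads state and computes a status, the controller can re-run it on every watch event without side effects beyond the final status update.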

## 6. Detailed Design

### 6.1. `KopiaServer` CRD Definition

**apiVersion:** `kopia.io/v1alpha1`

**kind:** `KopiaServer`

#### `spec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`storage`** | `StorageSpec` | Defines the backend storage by referencing a Velero BSL. | Yes |
| **`repositoryPasswordSecretRef`** | `corev1.SecretKeySelector` | Reference to the Secret key containing the master password for the Kopia repository. | Yes |
| **`userManagement`** | `UserManagementSpec` | Configuration for Kopia user credentials. | Yes |
| **`service`** | `ServiceSpec` | Defines the Kubernetes Service used to expose the Kopia server. Defaults to `ClusterIP`. | No |
| **`tls`** | `TLSSpec` | TLS configuration for the server endpoint. **Strongly recommended.** | No |
| **`image`** | `string` | Container image for the Kopia server. Defaults to `kopia/kopia:latest`. | No |
| **`resources`** | `corev1.ResourceRequirements` | Kubernetes resource requests and limits for the Kopia server pod. | No |

#### `StorageSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`velero`** | `VeleroStorageSpec` | Use an existing Velero `BackupStorageLocation` as the backend. | Yes |
| **`prefix`** | `string` | An optional prefix within the bucket for Kopia data, e.g., `kopia-file-backups/`. | No |

#### `VeleroStorageSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`backupStorageLocationName`** | `string` | Name of the `BackupStorageLocation` CR in the Velero namespace. | Yes |
| **`veleroNamespace`** | `string` | The namespace where Velero is installed. Defaults to `velero`. | No |

#### `UserManagementSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`source`** | `UserSource` | Source of the user list file (one `username@hostname:passwordhash` entry per line). Exactly one `UserSource` field must be set. | Yes |

| `UserSource` Field | Type | Description |
| :--- | :--- | :--- |
| **`secret`** | `corev1.SecretKeySelector` | A Secret key containing the user list. |
| **`configMap`** | `corev1.ConfigMapKeySelector` | A ConfigMap key containing the user list. |

#### `TLSSpec`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`secretName`** | `string` | Name of a `kubernetes.io/tls` type secret with `tls.crt` and `tls.key`. If not provided, the server runs without TLS (not recommended). |
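
The spec tables map naturally onto Go API types. The sketch below uses a stdlib-only `KeySelector` in place of `corev1.SecretKeySelector` and `corev1.ConfigMapKeySelector`, omits `service` and `resources`, and shows defaulting and validation for the Required columns and the one-of rule on `UserSource`. Type, field, and method names are illustrative, not a published API.

```go
package main

import "errors"

// KeySelector is a stdlib-only stand-in for corev1 Secret/ConfigMap key selectors.
type KeySelector struct {
	Name string `json:"name"`
	Key  string `json:"key"`
}

type VeleroStorageSpec struct {
	BackupStorageLocationName string `json:"backupStorageLocationName"`
	VeleroNamespace           string `json:"veleroNamespace,omitempty"`
}

type StorageSpec struct {
	Velero VeleroStorageSpec `json:"velero"`
	Prefix string            `json:"prefix,omitempty"`
}

// UserSource must have exactly one of its fields set.
type UserSource struct {
	Secret    *KeySelector `json:"secret,omitempty"`
	ConfigMap *KeySelector `json:"configMap,omitempty"`
}

type UserManagementSpec struct {
	Source UserSource `json:"source"`
}

type TLSSpec struct {
	SecretName string `json:"secretName,omitempty"`
}

type KopiaServerSpec struct {
	Storage                     StorageSpec        `json:"storage"`
	RepositoryPasswordSecretRef KeySelector        `json:"repositoryPasswordSecretRef"`
	UserManagement              UserManagementSpec `json:"userManagement"`
	TLS                         *TLSSpec           `json:"tls,omitempty"`
	Image                       string             `json:"image,omitempty"`
}

// Default fills in the defaults listed in the tables.
func (s *KopiaServerSpec) Default() {
	if s.Storage.Velero.VeleroNamespace == "" {
		s.Storage.Velero.VeleroNamespace = "velero"
	}
	if s.Image == "" {
		s.Image = "kopia/kopia:latest"
	}
}

// Validate enforces the Required columns and the one-of rule on UserSource.
func (s *KopiaServerSpec) Validate() error {
	if s.Storage.Velero.BackupStorageLocationName == "" {
		return errors.New("spec.storage.velero.backupStorageLocationName is required")
	}
	if s.RepositoryPasswordSecretRef.Name == "" || s.RepositoryPasswordSecretRef.Key == "" {
		return errors.New("spec.repositoryPasswordSecretRef requires name and key")
	}
	src := s.UserManagement.Source
	if (src.Secret == nil) == (src.ConfigMap == nil) {
		return errors.New("spec.userManagement.source: set exactly one of secret or configMap")
	}
	return nil
}
```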

#### `status`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`conditions`** | `[]metav1.Condition` | Standard conditions such as `Available` and `Progressing`. |
| **`serviceURL`** | `string` | The internal DNS address and port for clients (e.g., `my-kopia.default.svc:51515`). |
| **`repositoryStatus`** | `string` | The status of the Kopia repository (`Initialized`, `NotInitialized`, `Error`). |

### 6.2. Security Considerations

- **TLS Encryption:** Communication between the in-guest client and the server must be encrypted. The `tls.secretName` field is critical for production use.
- **Controller RBAC:** The controller requires read access to `BackupStorageLocations` and to `Secrets` in the Velero namespace. This access must be tightly scoped. The controller should create a derived, temporary secret for the Kopia pod to consume, rather than mounting Velero's credentials directly.
- **Network Policies:** `NetworkPolicy` resources should be deployed to restrict access to the Kopia `Service`, allowing connections only from pods in namespaces designated to host VMs.
- **User Credential Management:** Passwords in the user list file are hashed by Kopia, not stored in plaintext. This file should be managed via a `Secret` for better security.
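
One way to build the derived secret described under Controller RBAC is to copy only the S3 access keys out of Velero's credential Secret. This sketch assumes an AWS or S3-compatible BSL whose Secret holds an AWS shared-credentials file (as in OADP's `cloud-credentials` Secret, key `cloud`), and that the Kopia pod consumes `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` as environment variables. `DeriveS3Env` is hypothetical, and its INI parsing is deliberately minimal.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// DeriveS3Env extracts only the S3 access keys for one profile from an AWS
// shared-credentials file and returns them as env-style keys for a derived
// Secret. Every other setting in the source file is dropped.
func DeriveS3Env(credentialsFile, profile string) (map[string]string, error) {
	fileKeys := map[string]string{
		"aws_access_key_id":     "AWS_ACCESS_KEY_ID",
		"aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
	}
	out := map[string]string{}
	inProfile := false
	sc := bufio.NewScanner(strings.NewReader(credentialsFile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue // skip blanks and INI comments
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			inProfile = strings.TrimSpace(line[1:len(line)-1]) == profile
			continue
		}
		if !inProfile {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		if envKey, wanted := fileKeys[strings.TrimSpace(k)]; wanted {
			out[envKey] = strings.TrimSpace(v)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	for _, envKey := range fileKeys {
		if out[envKey] == "" {
			return nil, fmt.Errorf("profile %q is missing %s", profile, envKey)
		}
	}
	return out, nil
}
```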

## 7. Example Usage

#### 1. Create Prerequisite Secrets

```bash
# The secrets must live in the same namespace as the KopiaServer resource (oadp-operator below),
# because the key selectors that reference them carry no namespace.

# 1. Secret for the repository master password
kubectl create secret generic kopia-main-repo-pass \
  --namespace oadp-operator \
  --from-literal=password='a-very-strong-and-secret-password-for-the-repo'

# 2. Secret for the user list (generate hashes with the 'kopia server user add' command)
# userfile.txt should contain lines like:
# web-vm-user@*:$2a$10$abcdefghijklmnopqrstuv.abcdefghijklmnopqrstuv.abcde
kubectl create secret generic kopia-vm-users \
  --namespace oadp-operator \
  --from-file=users.txt=userfile.txt

# 3. (Optional but Recommended) A TLS secret for the server
kubectl create secret tls kopia-server-tls \
  --namespace oadp-operator \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key
```
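
The user list holds one `username@hostname:passwordhash` entry per line. A sketch of a parser the controller could use to validate the Secret or ConfigMap before handing it to the server. Skipping `#` comments, rejecting duplicates, and treating a `*` hostname (as in `web-vm-user@*` above) as matching any host are assumptions of this sketch; `ParseUserList` and `UserEntry` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// UserEntry is one parsed line of the user list.
type UserEntry struct {
	Username, Hostname, PasswordHash string
}

// ParseUserList parses "username@hostname:passwordhash" lines, skipping blank
// lines and '#' comments, and rejects malformed or duplicate identities.
func ParseUserList(data string) ([]UserEntry, error) {
	var entries []UserEntry
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(data))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		identity, hash, ok := strings.Cut(line, ":")
		user, host, okAt := strings.Cut(identity, "@")
		if !ok || !okAt || user == "" || host == "" || hash == "" {
			return nil, fmt.Errorf("line %d: expected username@hostname:passwordhash", n)
		}
		if seen[identity] {
			return nil, fmt.Errorf("line %d: duplicate entry for %s", n, identity)
		}
		seen[identity] = true
		entries = append(entries, UserEntry{Username: user, Hostname: host, PasswordHash: hash})
	}
	return entries, sc.Err()
}

// Matches reports whether the entry applies to a client identity;
// a "*" hostname matches any host.
func (e UserEntry) Matches(user, host string) bool {
	return e.Username == user && (e.Hostname == "*" || e.Hostname == host)
}
```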

#### 2. Define the `KopiaServer` Resource

```yaml
apiVersion: kopia.io/v1alpha1
kind: KopiaServer
metadata:
  name: main-kopia-server
  namespace: oadp-operator # Or another management namespace
spec:
  # Use the 'default' BSL from the 'velero' namespace as the backend
  storage:
    velero:
      backupStorageLocationName: default
      veleroNamespace: velero
    prefix: guest-file-backups/

  # Reference the secret for the repository password
  repositoryPasswordSecretRef:
    name: kopia-main-repo-pass
    key: password

  # Reference the secret containing the Kopia user list
  userManagement:
    source:
      secret:
        name: kopia-vm-users
        key: users.txt

  # Secure the server endpoint with a TLS certificate
  tls:
    secretName: kopia-server-tls

  # Define resource limits for the server pod
  resources:
    requests:
      cpu: "200m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "2Gi"
```

#### 3. In-VM Client Connection

A user inside a VM would then connect with:

```bash
# First-time connection inside the VM.
# The server URL is published in the KopiaServer status (serviceURL).
$ kopia repository connect server \
    --url https://main-kopia-server.oadp-operator.svc:51515 \
    --server-cert-fingerprint <FINGERPRINT_FROM_SERVER> \
    --override-username web-vm-user \
    --override-hostname my-vm-hostname \
    --password <provided-password>

# Subsequent backups
$ kopia snapshot create /var/www/html --tags app:my-webapp
```

## 8. Other Considered Designs

A previously considered approach using `libguestfs`, while technically feasible, tightly couples file-level operations to the cluster administrator's Velero backup schedule. It also introduces significant operational overhead for each backup operation, such as taking a snapshot, creating a PVC, mounting it, and launching a helper pod.
