Integrate native KubeVirt backup API (backup.kubevirt.io/v1alpha1) #412
aagarwal-apexanalytix wants to merge 3 commits into kubevirt:main
Add support for the native KubeVirt backup API alongside the existing CSI snapshot path. When enabled via Velero Backup labels, the plugin uses VirtualMachineBackup CRs for CBT-based backup with QEMU guest agent quiescing instead of CSI volume snapshots.

Key changes:

Phase 1a - v2 migration:
- Upgrade VMBackupItemAction to BackupItemAction v2 (async Progress/Cancel)
- Upgrade VMRestorePlugin to RestoreItemAction v2 (AreAdditionalItemsReady)
- VM restore now waits for PVCs to be bound before creating the VM

Phase 1b - Push mode full backup:
- New nativebackup package: config, detect, backup, scratch, progress, tracker, agent, volumes
- CRD feature detection with cached discovery
- Scratch PVC provisioning sized to VM disk capacity
- VirtualMachineBackup CR lifecycle (create, progress, cancel, cleanup)
- Graceful CSI fallback on any failure (stopped VM, missing CRD, errors)
- CSI double-snapshot prevention for native-backed PVCs

Phase 1c - Operational polish:
- Idempotent Execute() for Velero retries (AlreadyExists handling)
- Finalize phase guard (no async ops in Finalize)
- QEMU guest agent detection with auto skipQuiesce
- Backup metadata annotations (type, checkpoint, volumes)
- Scratch PVC TTL + garbage collection

Phase 2 - Incremental backup:
- VirtualMachineBackupTracker lifecycle (create once, reuse)
- Checkpoint health check (redefinition after VM restart)
- Source resolution: VM for full, Tracker for incremental
- forceFullEveryN periodic full backup support

Phase 3 - Atomicity + cleanup:
- VMDeleteItemAction: cleanup native artifacts on backup deletion
- VMItemBlockAction: atomic backup of VM + related resources
- RBAC ClusterRole for backup.kubevirt.io permissions

Configuration via Velero Backup labels:
  velero.kubevirt.io/native-backup: "true"
  velero.kubevirt.io/incremental-backup: "true"
  velero.kubevirt.io/skip-quiesce: "true"
  velero.kubevirt.io/scratch-storage-class: "<class>"
  velero.kubevirt.io/native-backup-concurrency: "5"
  velero.kubevirt.io/force-full-every-n: "7"

Or via ConfigMap (velero/kubevirt-velero-plugin-config) for defaults.

Signed-off-by: Akhilesh Agarwal <aagarwal@apexanalytix.com>
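To make the label-driven configuration concrete, here is a minimal sketch of how the labels listed in this commit message could be overlaid on ConfigMap defaults. Only the label keys come from the commit message; the `NativeBackupConfig` struct and the `configFromLabels` helper are hypothetical names for illustration and may not match the PR's `nativebackup` package.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label keys exactly as listed in the commit message.
const (
	labelNativeBackup   = "velero.kubevirt.io/native-backup"
	labelIncremental    = "velero.kubevirt.io/incremental-backup"
	labelSkipQuiesce    = "velero.kubevirt.io/skip-quiesce"
	labelScratchClass   = "velero.kubevirt.io/scratch-storage-class"
	labelConcurrency    = "velero.kubevirt.io/native-backup-concurrency"
	labelForceFullEvery = "velero.kubevirt.io/force-full-every-n"
)

// NativeBackupConfig is a hypothetical struct; the PR's real type may differ.
type NativeBackupConfig struct {
	Enabled             bool
	Incremental         bool
	SkipQuiesce         bool
	ScratchStorageClass string
	Concurrency         int
	ForceFullEveryN     int
}

// configFromLabels starts from defaults (for example, values read from the
// ConfigMap) and overrides each field whose label is present on the Backup.
// A label with an unparseable value is reported instead of silently ignored.
func configFromLabels(defaults NativeBackupConfig, labels map[string]string) (NativeBackupConfig, error) {
	cfg := defaults
	boolFields := map[string]*bool{
		labelNativeBackup: &cfg.Enabled,
		labelIncremental:  &cfg.Incremental,
		labelSkipQuiesce:  &cfg.SkipQuiesce,
	}
	for key, dst := range boolFields {
		if v, ok := labels[key]; ok {
			b, err := strconv.ParseBool(v)
			if err != nil {
				return defaults, fmt.Errorf("label %s: %w", key, err)
			}
			*dst = b
		}
	}
	intFields := map[string]*int{
		labelConcurrency:    &cfg.Concurrency,
		labelForceFullEvery: &cfg.ForceFullEveryN,
	}
	for key, dst := range intFields {
		if v, ok := labels[key]; ok {
			n, err := strconv.Atoi(v)
			if err != nil || n < 0 {
				return defaults, fmt.Errorf("label %s: invalid non-negative integer %q", key, v)
			}
			*dst = n
		}
	}
	if v, ok := labels[labelScratchClass]; ok {
		cfg.ScratchStorageClass = v
	}
	return cfg, nil
}

func main() {
	defaults := NativeBackupConfig{Concurrency: 1}
	labels := map[string]string{labelNativeBackup: "true", labelForceFullEvery: "7"}
	cfg, err := configFromLabels(defaults, labels)
	fmt.Println(cfg, err)
}
```

Returning the untouched defaults together with an error lets the caller decide whether a bad label should fail the backup or fall back to the CSI path.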
- Annotations now set on VM struct (not unstructured item) so they survive ToUnstructured conversion and are included in the backup
- Guard volumeInDVTemplates against nil DataVolume source to prevent panic on volumes without DataVolume (e.g. PVC, CloudInit)
- Log cleanup errors instead of silently discarding them

Signed-off-by: Akhilesh Agarwal <aagarwal@apexanalytix.com>
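The nil guard described in this commit can be illustrated with a small sketch. The types below are simplified stand-ins for the KubeVirt volume types, and the function signature is an assumption; only the name `volumeInDVTemplates` and the nil-DataVolume case come from the commit message.

```go
package main

import "fmt"

// DataVolumeSource and Volume are simplified, hypothetical stand-ins for the
// KubeVirt API types, reduced to the fields this example needs.
type DataVolumeSource struct{ Name string }

type Volume struct {
	Name       string
	DataVolume *DataVolumeSource // nil for PVC, CloudInit, and other volume sources
}

// volumeInDVTemplates reports whether the volume refers to one of the VM's
// DataVolume templates. The nil check avoids dereferencing a missing
// DataVolume source, which is the panic the commit describes.
func volumeInDVTemplates(vol Volume, templateNames []string) bool {
	if vol.DataVolume == nil {
		return false
	}
	for _, name := range templateNames {
		if name == vol.DataVolume.Name {
			return true
		}
	}
	return false
}

func main() {
	templates := []string{"rootdisk-dv"}
	dvVol := Volume{Name: "rootdisk", DataVolume: &DataVolumeSource{Name: "rootdisk-dv"}}
	cloudInit := Volume{Name: "cloudinit"} // no DataVolume source
	fmt.Println(volumeInDVTemplates(dvVol, templates), volumeInDVTemplates(cloudInit, templates))
}
```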
…check

1. Replace context.TODO() with apiContext() (30s timeout) across all Kubernetes API calls to prevent indefinite hangs.
2. Implement annotation-based incremental backup counter on tracker CR. getIncrementalCount/updateIncrementalCount read/write the annotation, making forceFullEveryN work correctly for any N.
3. ParseOperationID now returns an error for malformed input instead of silently returning partial results. All callers updated.
4. isGuestAgentConnected returns (bool, error) so ShouldSkipQuiesce can distinguish "agent not connected" from "API call failed" and log appropriately.

Signed-off-by: Akhilesh Agarwal <aagarwal@apexanalytix.com>
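A rough sketch of three of the hardening items above follows. The function names `apiContext`, `getIncrementalCount`, and `ParseOperationID` appear in the commit message, but everything else here is assumed for illustration: the `namespace/name` operation ID format, the annotation key, the lower-cased `parseOperationID` signature, and the `planNextBackup` helper are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// apiContext bounds every Kubernetes API call to 30 seconds, as the commit
// describes, instead of using an unbounded context.TODO().
func apiContext() (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.Background(), 30*time.Second)
}

// parseOperationID assumes a hypothetical "namespace/name" format. Malformed
// input yields an error rather than partially filled results.
func parseOperationID(id string) (namespace, name string, err error) {
	parts := strings.Split(id, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("malformed operation ID %q", id)
	}
	return parts[0], parts[1], nil
}

// incrementalCountAnnotation is a hypothetical key; the PR's key may differ.
const incrementalCountAnnotation = "velero.kubevirt.io/incremental-count"

// getIncrementalCount reads the number of incrementals taken since the last
// full backup. A missing or invalid annotation counts as zero.
func getIncrementalCount(annotations map[string]string) int {
	n, err := strconv.Atoi(annotations[incrementalCountAnnotation])
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// planNextBackup decides whether the next backup must be full and returns the
// counter value to write back. It assumes forceFullEveryN = N means every Nth
// backup is full; N <= 0 disables the periodic full.
func planNextBackup(annotations map[string]string, forceFullEveryN int) (full bool, newCount int) {
	count := getIncrementalCount(annotations)
	if forceFullEveryN > 0 && count+1 >= forceFullEveryN {
		return true, 0
	}
	return false, count + 1
}

func main() {
	ctx, cancel := apiContext()
	defer cancel()
	deadline, _ := ctx.Deadline()
	fmt.Println("deadline within 30s:", time.Until(deadline) <= 30*time.Second)

	_, _, err := parseOperationID("missing-separator")
	fmt.Println("parse error:", err)

	full, next := planNextBackup(map[string]string{incrementalCountAnnotation: "6"}, 7)
	fmt.Println(full, next)
}
```

Storing the counter as an annotation on the tracker CR, as the commit describes, means the count survives plugin restarts without extra state.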
Hi @aagarwal-apexanalytix. Thanks for your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Hi @ShellyKa13, friendly ping on this PR. Happy to address any feedback or questions about the design. This adds optional support for the native KubeVirt backup API ( Would appreciate a
@aagarwal-apexanalytix please review the following integrations:
weshayutin left a comment:
please review the prior art
@weshayutin: changing LGTM is restricted to collaborators.
weshayutin left a comment:
This would need a design and discussion, plus some delineation from the current OADP / Velero design: https://github.com/openshift/oadp-operator/blob/oadp-dev/docs/design/kubevirt-datamover.md
What this PR does / why we need it:
Adds optional support for the native KubeVirt backup API (backup.kubevirt.io/v1alpha1) alongside the existing CSI snapshot path. When enabled via Velero Backup labels, the plugin uses VirtualMachineBackup CRs for CBT-based, QEMU-aware backup with incremental support via VirtualMachineBackupTracker checkpoints.

Key capabilities:
- Execute→Progress→Cancel)
- forceFullEveryN
- DeleteItemAction removes VirtualMachineBackup CRs and scratch PVCs when Velero backup is deleted
- ItemBlockAction ensures VM + related resources are backed up together

All behavior is opt-in via labels; existing CSI-only workflows are unaffected.
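As an illustration of the Execute, Progress, Cancel flow and the idempotent Execute() mentioned in the commits, here is a simplified sketch. It does not use the real Velero plugin interfaces: `fakeCluster`, the phase values, and the operation ID scheme are hypothetical stand-ins so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("already exists")

type phase string

const (
	phaseRunning   phase = "Running"
	phaseCancelled phase = "Cancelled"
)

// fakeCluster is a hypothetical in-memory stand-in for the Kubernetes API
// that would hold VirtualMachineBackup CRs.
type fakeCluster struct{ backups map[string]phase }

func (c *fakeCluster) create(name string) error {
	if _, ok := c.backups[name]; ok {
		return errAlreadyExists
	}
	c.backups[name] = phaseRunning
	return nil
}

// execute starts a native backup and returns an operation ID. If Velero
// retries and the CR already exists, the existing operation is reused
// instead of failing, which keeps Execute idempotent.
func execute(c *fakeCluster, veleroBackup, vm string) (string, error) {
	name := veleroBackup + "-" + vm
	if err := c.create(name); err != nil && !errors.Is(err, errAlreadyExists) {
		return "", err
	}
	return name, nil
}

// progress reports whether the operation has stopped running.
func progress(c *fakeCluster, opID string) (done bool, err error) {
	p, ok := c.backups[opID]
	if !ok {
		return false, fmt.Errorf("unknown operation %q", opID)
	}
	return p != phaseRunning, nil
}

// cancel marks a known operation as cancelled.
func cancel(c *fakeCluster, opID string) error {
	if _, ok := c.backups[opID]; !ok {
		return fmt.Errorf("unknown operation %q", opID)
	}
	c.backups[opID] = phaseCancelled
	return nil
}

func main() {
	c := &fakeCluster{backups: map[string]phase{}}
	id, _ := execute(c, "nightly", "vm-a")
	retryID, _ := execute(c, "nightly", "vm-a") // simulated Velero retry
	done, _ := progress(c, id)
	fmt.Println(id == retryID, done)
	_ = cancel(c, id)
	done, _ = progress(c, id)
	fmt.Println(done)
}
```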
Which issue(s) this PR fixes:
Fixes #411
Special notes for your reviewer:
The implementation spans three commits for reviewability:
1. feat: integrate native KubeVirt backup API. Core implementation: new pkg/util/nativebackup/ package (8 files), v2 upgrades to VM backup/restore actions, new DeleteItemAction and ItemBlockAction, RBAC, unit tests
2. fix: annotation loss, nil panic, and silent cleanup errors. Bug fixes found during review: annotations set on VM struct instead of unstructured item, nil guard in volumeInDVTemplates, cleanup error logging
3. improve: API timeouts, incremental counter, error propagation, agent check. Hardening: 30s context timeout on all K8s API calls, annotation-based incremental counter (replaces stub), ParseOperationID returns error, guest agent check returns error for caller differentiation

Configuration is via labels on the Velero Backup object:
- velero.kubevirt.io/native-backup
- velero.kubevirt.io/incremental-backup
- velero.kubevirt.io/skip-quiesce
- velero.kubevirt.io/scratch-storage-class
- velero.kubevirt.io/force-full-every-n
- velero.kubevirt.io/native-backup-concurrency

Cluster-wide defaults via ConfigMap kubevirt-velero-plugin-config in the velero namespace.

Verified against KubeVirt v1.8.1 VirtualMachineBackup and VirtualMachineBackupTracker CRD schemas from a live cluster.

Release note: