MTV-3353 | Remove populator finalizers from PVCs during archive #4618

Open
yaacov wants to merge 1 commit into kubev2v:main from yaacov:remove-populator-finilizers

Conversation

@yaacov (Member) commented Feb 5, 2026

Issue:
When a migration fails mid-copy (e.g., due to network disconnection), the target PVCs retain the populator finalizer even after archiving the plan. This prevents users from manually deleting the PVCs later.

Fix:
This PR enhances the plan archive mechanism with a two-tier cleanup strategy
and ensures PVC populator finalizers are properly removed.

Archive Cleanup Strategy:

  • Tier 1: Delete all temporary resources by plan label (pods, secrets,
    configmaps, jobs, populator CRs) and remove populator finalizers from
    all PVCs
  • Tier 2: Loop over migration history - apply VM status logic for current
    migration (preserve successful VMs, delete failed if DeleteVmOnFailMigration),
    delete ALL resources unconditionally for previous migrations
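
A minimal sketch of the two tiers, using the helper names this PR adds to kubevirt.go; the interface, the plain-string history, and the dropped error handling are simplifications for illustration, not the actual implementation:

package plan

// cleaner lists the new cleanup helpers added in this PR; the real methods
// live on *KubeVirt and return errors, which are omitted here for brevity.
type cleaner interface {
    DeleteAllPlanPods()
    DeleteAllPlanSecrets()
    DeleteAllPlanConfigMaps()
    DeleteAllPlanJobs()
    DeleteAllPlanPopulatorCRs()
    RemoveAllPlanPVCFinalizers()
    DeleteMigrationVMs(migrationUID string)
    DeleteMigrationDataVolumes(migrationUID string)
    DeleteMigrationPVCs(migrationUID string)
    DeleteMigrationPods(migrationUID string)
}

// archiveCleanup sketches the two-tier archive flow described above.
func archiveCleanup(c cleaner, history []string, currentUID string) {
    // Tier 1: delete temporary resources selected by the "plan" label and
    // unblock manual PVC deletion after a failed copy.
    c.DeleteAllPlanPods()
    c.DeleteAllPlanSecrets()
    c.DeleteAllPlanConfigMaps()
    c.DeleteAllPlanJobs()
    c.DeleteAllPlanPopulatorCRs()
    c.RemoveAllPlanPVCFinalizers()

    // Tier 2: walk the migration history.
    for _, uid := range history {
        if uid == currentUID {
            // Current migration: per-VM status logic applies instead
            // (preserve successful VMs, delete failed ones only when
            // DeleteVmOnFailMigration is enabled).
            continue
        }
        // Previous migrations: delete everything unconditionally.
        c.DeleteMigrationVMs(uid)
        c.DeleteMigrationDataVolumes(uid)
        c.DeleteMigrationPVCs(uid)
        c.DeleteMigrationPods(uid)
    }
}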

Examples:

Create and start a migration (temporary debug artifacts will be cleaned up when the plan is archived):

❯ oc mtv create plan rhv -S rhv --vms mtv-rhel8-warm-2disks2nics --delete-vm-on-fail-migration
No target provider specified, using default OpenShift provider: ocp-target
No target namespace specified, using plan namespace: demo
plan/rhv created

❯ oc mtv start plan rhv
Migration started for plan 'rhv' in namespace 'demo'

Start the VM, then clean up the plan:

❯ oc virt start mtv-rhel8-warm-2disks2nics
VM mtv-rhel8-warm-2disks2nics was scheduled to start

❯ oc get pods
NAME                                             READY   STATUS      RESTARTS   AGE
populate-775f7930-71ac-4bcf-83d3-2e75b51bedce    0/1     Completed   0          69s
populate-79f5cacf-479b-4a34-858b-5689e1bf69eb    0/1     Completed   0          68s
virt-launcher-mtv-rhel8-warm-2disks2nics-r7225   0/2     Init:0/1    0          6s

❯ oc mtv delete plan rhv
Plan 'rhv' archived
Waiting for plan 'rhv' to be archived...
Plan 'rhv' deleted from namespace 'demo'

❯ oc get pods
NAME                                             READY   STATUS    RESTARTS   AGE
virt-launcher-mtv-rhel8-warm-2disks2nics-r7225   2/2     Running   0          27s

❯ oc get pvc
NAME                                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                                 VOLUMEATTRIBUTESCLASS   AGE
15aa98bf-a49e-42e6-a600-9bb4f20c588c-dchfj   Bound    pvc-4b817411-e728-497f-92bd-c64d48fb150e   10Gi       RWX            ocs-storagecluster-ceph-rbd-virtualization   <unset>                 94s
6eaa8c55-dcc9-4dca-a3bf-773a1579e92b-p6cjl   Bound    pvc-72990c04-b80d-4788-95eb-0e3d73829ecb   15Gi       RWX            ocs-storagecluster-ceph-rbd-virtualization   <unset>                 93s

Create and start a second migration:

❯ oc mtv create plan rhv -S rhv --vms mtv-rhel8-warm-2disks2nics
No target provider specified, using default OpenShift provider: ocp-target
No target namespace specified, using plan namespace: demo
plan/rhv created

❯ oc mtv start plan rhv
Migration started for plan 'rhv' in namespace 'demo'

The plan failed mid-transfer; now we can delete it:

❯ oc mtv delete plan rhv
Plan 'rhv' archived
Waiting for plan 'rhv' to be archived...
Plan 'rhv' deleted from namespace 'demo'

❯ oc get pvc
NAME                                         STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                                 VOLUMEATTRIBUTESCLASS   AGE
15aa98bf-a49e-42e6-a600-9bb4f20c588c-jtrxh   Pending                                                                        ocs-storagecluster-ceph-rbd-virtualization   <unset>                 37s
6eaa8c55-dcc9-4dca-a3bf-773a1579e92b-rmgsx   Pending                                                                        ocs-storagecluster-ceph-rbd-virtualization   <unset>                 37s
prime-9de488d7-d4ce-4c2b-a54c-e447cfde5dcc   Bound     pvc-7c074e99-3cf8-40cd-b03d-ca8e33238109   10Gi       RWX            ocs-storagecluster-ceph-rbd-virtualization   <unset>                 37s
prime-dfd7ccd0-5648-4d65-b8c9-3344f94d0829   Bound     pvc-b5028bbc-31e4-40a8-ae78-e35780ff95a3   15Gi       RWX            ocs-storagecluster-ceph-rbd-virtualization   <unset>                 37s

❯ oc delete pods --all
pod "populate-9de488d7-d4ce-4c2b-a54c-e447cfde5dcc" deleted
pod "populate-dfd7ccd0-5648-4d65-b8c9-3344f94d0829" deleted

❯ oc delete pvc --all
persistentvolumeclaim "15aa98bf-a49e-42e6-a600-9bb4f20c588c-jtrxh" deleted
persistentvolumeclaim "6eaa8c55-dcc9-4dca-a3bf-773a1579e92b-rmgsx" deleted
persistentvolumeclaim "prime-9de488d7-d4ce-4c2b-a54c-e447cfde5dcc" deleted
persistentvolumeclaim "prime-dfd7ccd0-5648-4d65-b8c9-3344f94d0829" deleted

❯ oc get pvc
No resources found in demo namespace.

Ref: https://issues.redhat.com/browse/MTV-3353
Resolves: MTV-3353

@codecov-commenter commented Feb 5, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 18.02817% with 291 lines in your changes missing coverage. Please review.
✅ Project coverage is 10.24%. Comparing base (f1fe5d0) to head (c3da1f0).
⚠️ Report is 1833 commits behind head on main.

Files with missing lines Patch % Lines
pkg/controller/plan/kubevirt.go 21.32% 205 Missing and 9 partials ⚠️
pkg/controller/plan/migration.go 0.00% 45 Missing ⚠️
pkg/controller/plan/adapter/ovirt/builder.go 0.00% 11 Missing ⚠️
pkg/controller/plan/adapter/openstack/builder.go 0.00% 7 Missing ⚠️
pkg/controller/plan/context/labeler.go 0.00% 5 Missing ⚠️
pkg/controller/plan/context/migration.go 0.00% 5 Missing ⚠️
pkg/controller/plan/util/openstack.go 0.00% 2 Missing ⚠️
pkg/controller/plan/util/ovirt.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4618      +/-   ##
==========================================
- Coverage   15.45%   10.24%   -5.21%     
==========================================
  Files         112      469     +357     
  Lines       23377    53977   +30600     
==========================================
+ Hits         3613     5531    +1918     
- Misses      19479    47959   +28480     
- Partials      285      487     +202     
Flag Coverage Δ
unittests 10.24% <18.02%> (-5.21%) ⬇️


@yaacov force-pushed the remove-populator-finilizers branch 3 times, most recently from 9f1dba4 to 20fd9af on February 5, 2026 at 16:02
@yaacov marked this pull request as draft on February 5, 2026 at 18:48
@yaacov force-pushed the remove-populator-finilizers branch 2 times, most recently from e7659dc to 52c87f3 on February 5, 2026 at 20:28
@yaacov marked this pull request as ready for review on February 5, 2026 at 20:32
@yaacov force-pushed the remove-populator-finilizers branch 7 times, most recently from 69fbd6f to c414a47 on February 5, 2026 at 22:55
@yaacov (Member, Author) commented Feb 5, 2026

@coderabbitai please review, please check functionality was not lost, and new cleanup is safe

@coderabbitai bot commented Feb 5, 2026

🧩 Analysis chain

@yaacov, I'll review the PR to verify functionality preservation and cleanup safety.


🏁 Script executed:

#!/bin/bash
# Fetch the PR details and file changes
gh pr view 4618 --json files,additions,deletions --repo kubev2v/forklift | jq -r '.files[] | .path' | head -20

Length of output: 405


🏁 Script executed:

#!/bin/bash
# Get the diff for the main files that likely contain the cleanup logic
gh pr diff 4618 --repo kubev2v/forklift -- pkg/controller/plan/kubevirt.go | head -400

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get the diff for migration.go which likely has cleanup coordination
gh pr diff 4618 --repo kubev2v/forklift -- pkg/controller/plan/migration.go | head -300

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Check for changes in the adapter builders that might affect cleanup
gh pr diff 4618 --repo kubev2v/forklift -- pkg/controller/plan/adapter/ovirt/builder.go pkg/controller/plan/adapter/openstack/builder.go pkg/controller/plan/adapter/vsphere/builder.go | head -200

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Check for changes in util files
gh pr diff 4618 --repo kubev2v/forklift -- pkg/controller/plan/util/ovirt.go pkg/controller/plan/util/openstack.go pkg/controller/plan/util/vsphere.go 2>/dev/null | head -150

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check context/labeler.go for label management changes
gh pr diff 4618 --repo kubev2v/forklift -- pkg/controller/plan/context/labeler.go 2>/dev/null | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get the full diff for kubevirt.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/kubevirt.go

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get the full diff for migration.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/migration.go

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get diff for adapter/ovirt/builder.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/adapter/ovirt/builder.go

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get diff for adapter/openstack/builder.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/adapter/openstack/builder.go

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get diff for adapter/vsphere/builder.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/adapter/vsphere/builder.go

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get diff for context/labeler.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/context/labeler.go

Length of output: 36


🏁 Script executed:

#!/bin/bash
# Get diff for util files
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/util/ovirt.go
gh pr diff 4618 --repo kubev2v/forklift pkg/controller/plan/util/openstack.go

Length of output: 73


🏁 Script executed:

#!/bin/bash
# Clone the repo and get the diff
cd /tmp
git clone --depth 50 https://github.com/kubev2v/forklift.git
cd forklift
gh pr checkout 4618
git diff main...HEAD -- pkg/controller/plan/kubevirt.go | head -500

Length of output: 16873


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
git diff main...HEAD -- pkg/controller/plan/migration.go

Length of output: 8500


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
git diff main...HEAD -- pkg/controller/plan/adapter/ovirt/builder.go

Length of output: 2939


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
git diff main...HEAD -- pkg/controller/plan/adapter/openstack/builder.go

Length of output: 1377


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
git diff main...HEAD -- pkg/controller/plan/adapter/vsphere/builder.go

Length of output: 1011


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
git diff main...HEAD -- pkg/controller/plan/context/labeler.go

Length of output: 830


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
git diff main...HEAD -- pkg/controller/plan/util/ovirt.go pkg/controller/plan/util/openstack.go

Length of output: 2216


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check the base version to see what cleanup() method did
git diff main...HEAD -- pkg/controller/plan/migration.go | grep -A 20 -B 5 "cleanup"

Length of output: 2782


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Search for what the old cleanup method looked like on main branch
git show main:pkg/controller/plan/migration.go | grep -A 50 "^func.*cleanup"

Length of output: 1492


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check for 'resource' label usage in the codebase to understand the filtering
rg -n "kResource" --type go pkg/controller/plan/ | head -20

Length of output: 495


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check what labels are applied to secrets and configmaps
rg -n "Labels.*kResource" --type go -A 3 -B 3 pkg/controller/plan/ | head -40

Length of output: 1205


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check the definition of kResource constant
rg -n "^[[:space:]]*kResource" --type go pkg/controller/plan/

Length of output: 125


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check DeleteImporterPods implementation to see if it's label-based
git show HEAD:pkg/controller/plan/kubevirt.go | grep -A 30 "func.*DeleteImporterPods"

Length of output: 581


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check DeletePVCConsumerPod implementation
git show HEAD:pkg/controller/plan/kubevirt.go | grep -A 20 "func.*DeletePVCConsumerPod"

Length of output: 586


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check DeleteGuestConversionPod implementation
git show HEAD:pkg/controller/plan/kubevirt.go | grep -A 20 "func.*DeleteGuestConversionPod"

Length of output: 674


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check DeleteSecret and DeleteConfigMap implementations
git show HEAD:pkg/controller/plan/kubevirt.go | grep -A 20 "func.*DeleteSecret\|func.*DeleteConfigMap"

Length of output: 1070


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check how secrets and configmaps are created with kResource label
git show HEAD:pkg/controller/plan/kubevirt.go | grep -B 5 -A 15 "ResourceVMConfig\|ResourceVDDKConfig"

Length of output: 1529


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check if importer/consumer/conversion pods have migration labels
git show HEAD:pkg/controller/plan/kubevirt.go | grep -B 3 -A 10 "\"cdi.kubevirt.io/storage.import.importPodName\"\|consumerLabels\|guestConversionPodName"

Length of output: 2062


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check getImporterPods to see how importer pods are identified
git show HEAD:pkg/controller/plan/kubevirt.go | grep -A 30 "func.*getImporterPods"

Length of output: 811


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check how pods are created to understand labeling
git show HEAD:pkg/controller/plan/kubevirt.go | grep -B 10 -A 20 "func.*ensureGuestConversionPod\|func.*ensurePVCConsumerPod"

Length of output: 0


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Look for where DataVolumes are created (importer pods are created by CDI from DataVolumes)
git show HEAD:pkg/controller/plan/kubevirt.go | grep -B 5 -A 20 "DataVolume{" | head -60

Length of output: 1361


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check the old deleteImporterPods to see how it worked
git show main:pkg/controller/plan/migration.go | grep -A 20 "func.*deleteImporterPods"

Length of output: 459


🏁 Script executed:

#!/bin/bash
cd /tmp/forklift
# Check what DeleteMigrationPods does - does it catch all pod types?
git show HEAD:pkg/controller/plan/kubevirt.go | grep -B 3 -A 30 "^func.*DeleteMigrationPods"

Length of output: 1290


@yaacov, I've completed a comprehensive review of the cleanup logic changes. Here's my analysis:

Functionality Preservation - Overall Good

The new three-tier cleanup strategy is well-designed and safer than the old approach:

Tier 1 (Temporary Resources) - ✅ Comprehensive

  • Deletes pods, secrets, configmaps, jobs, and populator CRs by labels
  • Critically important: Removes populator finalizers from all PVCs (solves the core issue!)
  • Secrets/ConfigMaps correctly filtered by kResource label to preserve OCP-to-OCP dependencies

Tier 2 (Previous Migrations) - ✅ Excellent improvement

  • Unconditionally deletes ALL resources from previous migrations (VMs, DataVolumes, PVCs)
  • This is more thorough than the old code which didn't distinguish between migrations

Tier 3 (Current Migration) - ✅ Logic preserved

  • Correctly applies VM status logic
  • Preserves successful VMs, deletes failed VMs only if DeleteVmOnFailMigration enabled
  • Maintains removeLastWarmSnapshot() and migrator.Complete() calls

⚠️ Potential Issue: CDI Importer Pods

The old cleanup() method explicitly called deleteImporterPods(), which:

  • Found importer pods via PVC annotations and CDI app label
  • Importer pods are created by the CDI controller with name pattern importer-{pvc-name}
  • These pods may not have the migration label since they're created by CDI, not Forklift

The new DeleteMigrationPods() only deletes pods with the migration label. If CDI importer pods don't have this label, they won't be cleaned up during archive.

Recommendation: Verify whether CDI importer pods receive the migration label (they might via label propagation from PVCs/DataVolumes). If not, you may need to add explicit importer pod cleanup in Tier 1 or ensure DataVolume deletion cascades to importer pods.
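
For illustration, an explicit importer-pod sweep could be as small as the sketch below. The app: containerized-data-importer label is the CDI pod label referenced later in this review; the function name and the namespace-wide selection are assumptions, not code from this PR, and a real implementation would likely narrow the selection to pods belonging to this plan's PVCs or DataVolumes.

package plan

import (
    "context"

    core "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteCDIImporterPods deletes CDI importer pods in the target namespace by
// the CDI app label, regardless of whether Forklift's migration label was
// ever propagated to them.
func deleteCDIImporterPods(ctx context.Context, c client.Client, namespace string) error {
    pods := &core.PodList{}
    err := c.List(ctx, pods,
        client.InNamespace(namespace),
        client.MatchingLabels{"app": "containerized-data-importer"})
    if err != nil {
        return err
    }
    for i := range pods.Items {
        if err := c.Delete(ctx, &pods.Items[i]); err != nil {
            return err
        }
    }
    return nil
}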

Safety Verification

Labels are correctly applied:

  • Consumer/Conversion/Inspection pods: Have migration labels via vmLabels()
  • Populator pods: Have migration labels (per code comment) ✓
  • Secrets/ConfigMaps with kResource label: Migration temporary resources ✓
  • Secrets/ConfigMaps without kResource label: OCP-to-OCP dependencies, correctly preserved ✓
  • Jobs: Have plan labels, will be deleted ✓
  • Hook jobs: Have plan label + kResource: hook-config, will be deleted ✓

VSphere-specific cleanup replaced:

  • Old deleteValidateVddkJob() → Now handled by DeleteAllPlanJobs()
  • Old deleteConfigMap() for VDDK → Now handled by DeleteAllPlanConfigMaps() (has kResource: vddk-config label) ✓

Provider-specific cleanup preserved:

  • OVA/HyperV storage deletion still handled separately (cluster-scoped PVs) ✓

📋 Testing Recommendations

  1. Verify importer pod cleanup: Start a migration, let it create importer pods, then archive. Check if importer pods are deleted.
  2. Test warm migration: Ensure previous migration resources are fully cleaned up when retrying.
  3. Test OCP-to-OCP: Ensure dependency secrets/configmaps are preserved after archive.
  4. Test failed migration cleanup: Verify that with DeleteVmOnFailMigration enabled, all resources are removed.

Summary

The new cleanup strategy is a significant improvement in terms of safety and comprehensiveness. The three-tier approach properly separates concerns and the finalizer removal solves the core issue. The only concern is whether CDI importer pods are properly labeled for deletion—this should be verified during testing.


🧠 Learnings used
Learnt from: yaacov
Repo: kubev2v/forklift PR: 3204
File: operator/.upstream_manifests:156-181
Timestamp: 2025-10-16T14:11:57.112Z
Learning: Repo: kubev2v/forklift — Maintainer yaacov prefers not to create GitHub follow-up issues (e.g., for re-tightening ForkliftController CRD types after temporary relaxation); such work is tracked in a different system. In future reviews, acknowledge and proceed without proposing GitHub issue creation.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 4509
File: operator/.upstream_manifests:249-268
Timestamp: 2026-02-01T15:17:00.752Z
Learning: Repo: kubev2v/forklift — During reviews, do not request edits or PR-splitting based on diffs in auto-generated manifest files (e.g., operator/.upstream_manifests, operator/.kustomized_manifests, operator/.downstream_manifests). Treat them as outputs; only suggest scope changes if the underlying source templates or code introduced the unrelated changes.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 3204
File: operator/.upstream_manifests:6770-6772
Timestamp: 2025-10-16T18:19:40.853Z
Learning: Repo: kubev2v/forklift — For PRs relaxing ForkliftController CRD types (e.g., accepting booleans/ints/strings), do not modify or expand the example ForkliftController resource in operator/.upstream_manifests within that PR. Keep examples as-is and handle any exposure/consistency adjustments (e.g., listing feature_* flags in samples/CSV) in a separate, dedicated follow-up. Avoid creating GitHub issues for such follow-ups as they’re tracked elsewhere.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 3204
File: operator/.downstream_manifests:138-138
Timestamp: 2025-10-16T13:44:49.727Z
Learning: Repo: kubev2v/forklift — For PRs that relax ForkliftController CRD types (e.g., boolean-like and counting fields), maintainers prefer no field-name changes. Typos like controller_snapshot_removal_timeout_minuts should be tracked in a separate follow-up issue rather than addressed in the same PR.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 2561
File: Makefile:0-0
Timestamp: 2025-08-19T10:02:47.617Z
Learning: User yaacov prefers to defer security hardening improvements like checksum verification to follow-up PRs rather than expanding the scope of cleanup-focused PRs.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 2464
File: Makefile:138-147
Timestamp: 2025-08-13T03:31:49.934Z
Learning: In the kubev2v/forklift project, PRs should maintain focused scope and avoid mixing concerns. For example, adding a new environment variable should not be combined with larger refactoring efforts in the same PR, even if both changes affect the same file.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 2882
File: operator/config/rbac/role.yaml:42-47
Timestamp: 2025-09-18T13:45:04.028Z
Learning: In the yaacov/forklift project, comprehensive fixes (like RBAC permission cleanup) are preferred to be done in dedicated PRs rather than mixed with feature additions, to ensure consistency across the codebase.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 3204
File: operator/.upstream_manifests:156-181
Timestamp: 2025-10-16T13:46:22.273Z
Learning: Repo: kubev2v/forklift — When temporarily relaxing ForkliftController CRD types to accept booleans/ints alongside strings, maintainers prefer not to add runtime normalization or extra checks in roles/templates. Keep changes limited to schema, treat this as temporary compatibility, and plan to re-tighten types in a follow-up release.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 2540
File: pkg/controller/plan/adapter/vsphere/builder.go:694-699
Timestamp: 2025-08-18T13:45:08.364Z
Learning: The user yaacov prefers not to change existing behavior within the scope of a current PR when additional validation or safety checks are identified, and instead prefers to create follow-up issues to address such improvements separately.

Learnt from: yaacov
Repo: kubev2v/forklift PR: 2517
File: pkg/controller/plan/migration_cleanup_test.go:0-0
Timestamp: 2025-08-15T14:03:48.582Z
Learning: For the forklift project, when adding unit tests for complex methods like cleanup that have many dependencies, the team prefers to defer comprehensive refactoring to a separate issue if it's beyond the scope of the current feature being implemented. They favor simpler tests that focus on decision logic rather than requiring significant production code changes.

Learnt from: mrnold
Repo: kubev2v/forklift PR: 2807
File: pkg/controller/plan/adapter/vsphere/builder.go:0-0
Timestamp: 2025-09-19T03:47:22.164Z
Learning: In forklift vSphere builder, the pvcMap population (for warm migration with volume populators) and dvSource type determination (based on useV2vForTransfer) are independent conditions. This means pvcMap can be non-nil while dvSource.VDDK is nil, leading to potential nil dereference when accessing dvSource.VDDK.BackingFile. The backing file key should be computed from disk.File instead.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

This pull request centralizes label management across adapter builders by replacing hard-coded label maps with a shared VMLabelsWithExtra() method, introduces an extensive cleanup and archival infrastructure for plan and migration resources, and refactors the migration archive process to implement a three-tier cleanup strategy.

Changes

Cohort / File(s) Summary
Adapter Label Centralization
pkg/controller/plan/adapter/openstack/builder.go, pkg/controller/plan/adapter/ovirt/builder.go, pkg/controller/plan/adapter/vsphere/builder.go
Replaced inline label maps with centralized calls to r.Labeler.VMLabelsWithExtra() and r.Labeler.VMLabels() to unify label generation across volume populator and PVC/PV creation, removing direct vmID and migration label construction.
Labeler Enhancement
pkg/controller/plan/context/labeler.go
Added new VMLabelsWithExtra() method that merges standard VM labels with provider-specific extra labels.
Volume Populator Signatures
pkg/controller/plan/util/openstack.go, pkg/controller/plan/util/ovirt.go
Updated function signatures to accept planId parameter and added "plan" label to returned populator objects alongside existing vmID and migration labels.
Cleanup and Archival Infrastructure
pkg/controller/plan/kubevirt.go
Added new exported methods (DeleteAllPlanPods, DeleteAllPlanSecrets, DeleteAllPlanConfigMaps, DeleteAllPlanJobs, DeleteAllPlanPopulatorCRs, RemoveAllPlanPVCFinalizers, DeleteMigrationVMs, DeleteMigrationDataVolumes, DeleteMigrationPVCs, DeleteMigrationPods) and new exported constant PopulatorPVCFinalizer to support plan and migration-scoped resource cleanup.
Migration Archive Refactoring
pkg/controller/plan/migration.go
Implemented three-tier cleanup strategy: Tier 1 removes temporary resources and plan-scoped items, Tier 2 cleans up earlier migrations, Tier 3 handles active migration VM cleanup with conditional deletion logic based on DeleteVmOnFailMigration flag.
Test Infrastructure
pkg/controller/plan/kubevirt_test.go
Added comprehensive test suite validating cleanup helpers and plan-scoped resource deletion scenarios.
Builder Test Updates
pkg/controller/plan/adapter/vsphere/builder_test.go
Refactored test helper to explicitly initialize Labeler on context before returning Builder instance.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • #3283: Modifies warm-snapshot removal during migration cleanup in migration.go, directly related to the three-tier cleanup strategy.
  • #3366: Adds pod-deletion logic in kubevirt.go cleanup paths, overlapping with new DeleteAllPlanPods/DeleteMigrationPods methods.
  • #2517: Introduces DeleteVmOnFailMigration API field, which the main PR now implements within the three-tier cleanup logic for failed VM scenarios.

Suggested labels

storage-offload

Suggested reviewers

  • mrnold
  • mnecas
  • Hazanel
  • solenoci


🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and specifically describes the main change: removing populator finalizers from PVCs during archive, which is the core issue being fixed.
Description check ✅ Passed The pull request description clearly relates to the changeset, detailing the issue of retained populator finalizers on PVCs after plan archival and the implemented two-tier cleanup strategy with examples.



@coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/controller/plan/kubevirt.go (1)

1486-1499: ⚠️ Potential issue | 🟠 Major

Avoid dropping all finalizers and treat NotFound as success.

After Delete, the PVC can disappear immediately; the subsequent patch will return NotFound and currently bubbles up as an error. Also, clearing all finalizers can bypass unrelated protections (e.g., PVC protection). Prefer removing only PopulatorPVCFinalizer and ignore NotFound.

🐛 Suggested fix
-	// Remove all finalizers to allow the PVC to be deleted
-	pvcCopy := pvc.DeepCopy()
-	pvc.Finalizers = nil
-	patch := client.MergeFrom(pvcCopy)
-	if err = r.Destination.Client.Patch(context.TODO(), pvc, patch); err != nil {
-		return err
-	}
+	if err := r.removePopulatorFinalizerFromPVC(pvc); err != nil && !k8serr.IsNotFound(err) {
+		return err
+	}
🤖 Fix all issues with AI agents
In `@pkg/controller/plan/context/labeler.go`:
- Around line 38-47: VMLabelsWithExtra currently allows callers to override core
labels; fix it by filtering extraLabels before merging so keys "plan",
"migration", and "vmID" are ignored (or preserved) — in the VMLabelsWithExtra
method, build the base labels via r.VMLabels(vmRef) then iterate extraLabels but
skip any key equal to "plan", "migration", or "vmID" (or their constant names if
such constants exist) so core labels remain unchanged.
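
A standalone version of that guard might look like the following; the free-function shape and the literal key names are assumptions, while in the PR this logic belongs in Labeler.VMLabelsWithExtra:

package context

// mergeVMLabels merges provider-specific extra labels into the base VM labels
// while ensuring the core keys ("plan", "migration", "vmID") can never be
// overridden by a caller.
func mergeVMLabels(base, extra map[string]string) map[string]string {
    merged := make(map[string]string, len(base)+len(extra))
    for k, v := range extra {
        switch k {
        case "plan", "migration", "vmID":
            continue // reserved for core labels
        }
        merged[k] = v
    }
    for k, v := range base {
        merged[k] = v // copied last so core labels always win
    }
    return merged
}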

In `@pkg/controller/plan/kubevirt_test.go`:
- Around line 307-444: Add unit tests that mirror existing cleanup tests using
createKubeVirtWithPlanUIDAndObjects: for DeleteAllPlanConfigMaps and
DeleteAllPlanJobs create configmaps/jobs with matching and non-matching "plan"
labels and assert matching ones are deleted and others remain; for
DeleteAllPlanPopulatorCRs create dummy populator CRs labeled with the plan and
assert deletion; for RemoveAllPlanPVCFinalizers and
removePopulatorFinalizerFromPVC create PVCs with the plan label and with/without
the populator finalizer, call the function(s) and assert the finalizer is
removed while other PVCs are unchanged; for DeleteMigrationPods,
DeleteMigrationVMs, and DeleteMigrationDataVolumes create resources labeled with
"plan" and specific "migration" values, call each delete function with a target
migration UID and assert only resources with that migration UID are deleted. Use
the existing pattern of calling the method on kubevirt, checking error is nil,
listing the resource type from kubevirt.Destination.Client, and asserting
remaining items by name and finalizers.
- Around line 445-474: The two helper functions createKubeVirtWithPlanUID and
createKubeVirtWithPlanUIDAndObjects ignore their planUID parameter and hardcode
"test-plan-uid"; update them to assign the passed planUID to the Plan.UID field
(i.e., set kubevirt.Plan.UID = planUID in createKubeVirtWithPlanUID and plan.UID
= planUID in createKubeVirtWithPlanUIDAndObjects) so callers that pass different
IDs actually get those values; no other behavioral changes needed.

In `@pkg/controller/plan/kubevirt.go`:
- Around line 3766-3815: DeleteAllPlanPopulatorCRs currently swallows errors
from the r.Destination.Client.List calls for ovirtList, openstackList, and
vsphereList; change each List error path to log the error with context (using
r.Log.Error) and accumulate/wrap the error (include which populator list failed)
so the function returns a single aggregated error after attempting all deletions
(deletions remain best‑effort but list failures are surfaced). Locate the three
List calls (r.Destination.Client.List(..., ovirtList, opts),
r.Destination.Client.List(..., openstackList, opts),
r.Destination.Client.List(..., vsphereList, opts)), add error handling that
wraps errors with context (e.g., which list and namespace), append to a slice or
use errors.Join, log each failure, and return the joined/aggregated error at the
end of DeleteAllPlanPopulatorCRs.
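
One way to express that best-effort aggregation is sketched below; the three closures stand in for the ovirt/openstack/vsphere List-and-delete passes inside DeleteAllPlanPopulatorCRs, and the logging call is left as a comment rather than a real signature:

package plan

import (
    "errors"
    "fmt"
)

// deletePopulatorCRsBestEffort attempts every populator pass, wraps each
// failure with which kind failed, and returns a single joined error after all
// deletions were tried. errors.Join returns nil when every pass succeeded.
func deletePopulatorCRsBestEffort(ovirt, openstack, vsphere func() error) error {
    var errs []error
    if err := ovirt(); err != nil {
        // r.Log.Error(err, "failed to list/delete oVirt populator CRs")
        errs = append(errs, fmt.Errorf("ovirt populators: %w", err))
    }
    if err := openstack(); err != nil {
        errs = append(errs, fmt.Errorf("openstack populators: %w", err))
    }
    if err := vsphere(); err != nil {
        errs = append(errs, fmt.Errorf("vsphere populators: %w", err))
    }
    return errors.Join(errs...)
}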

In `@pkg/controller/plan/migration.go`:
- Around line 297-318: The Archive flow currently iterates
r.Plan.Status.Migration.History and calls
r.kubevirt.DeleteMigrationPods(migrationUID) but never removes CDI importer
pods; update Archive to also call the importer cleanup for each VM/migration by
invoking the existing importer cleanup routine (e.g., call
r.kubevirt.DeleteImporterPods or the local deleteImporterPods helper for the
same migration/VM context) inside the same loop or immediately after deleting
migration pods so CDI pods labeled {"app":"containerized-data-importer"} are
removed; ensure you reference the same migrationUID or VM identifier used in the
loop so you remove the correct importer pods and log errors similarly to the
other delete calls.

In `@pkg/controller/plan/util/openstack.go`:
- Around line 13-19: The createVolumePopulatorCR function (in
pkg/controller/plan/adapter/openstack/builder.go) is building
OpenstackVolumePopulator objects without the "plan" label because it lacks the
planId parameter; update createVolumePopulatorCR to accept a planId and add the
label into the ObjectMeta labels, or instead call the existing
util.OpenstackVolumePopulator(image, sourceUrl, transferNetwork,
targetNamespace, secretName, planId, vmId, migrationId) to construct the CR so
the "plan" label is always included; ensure callers of createVolumePopulatorCR
are updated to pass planId if you add the parameter.

Comment on lines +1521 to +1549
func (r *KubeVirt) removePopulatorFinalizerFromPVC(pvc *core.PersistentVolumeClaim) error {
    // Check if finalizer exists
    hasFinalizer := false
    for _, f := range pvc.Finalizers {
        if f == PopulatorPVCFinalizer {
            hasFinalizer = true
            break
        }
    }
    if !hasFinalizer {
        return nil
    }

    // Remove the finalizer
    pvcCopy := pvc.DeepCopy()
    var newFinalizers []string
    for _, f := range pvc.Finalizers {
        if f != PopulatorPVCFinalizer {
            newFinalizers = append(newFinalizers, f)
        }
    }
    pvc.Finalizers = newFinalizers
    patch := client.MergeFrom(pvcCopy)
    if err := r.Destination.Client.Patch(context.TODO(), pvc, patch); err != nil {
        r.Log.Error(err, "Failed to remove populator finalizer from PVC.", "pvc", pvc.Name)
        return err
    }
    return nil
}
Member commented:

We should not remove the finalizers, this is a big no.
This can be potentially dangerous, as the storage might not clean up the volume.
Finalizer removal should be a last, desperate effort when dealing with PVC removal.
I would rather fail the removal and leave it up to the user than leave the environment in an inconsistent state.

@yaacov (Member, Author) commented Feb 13, 2026

leave it up to the user than leave the environment in an inconsistent state.

Removing these populator finalizers is the heart of this change:
https://issues.redhat.com/browse/MTV-3353

The flow is:
a. the populator creates the PVC 🧑‍🏭
b. the populator fails and abandons the PVC, or dies 💀
c. the PVC is orphaned 😢

I can add a verification that the populator is stuck or dead before removing the finalizers - e.g. check that the populator that created them is in a state (or dead) in which it will never delete the finalizer. WDYT?

This can be potentially dangerous, as the storage might not clean up the volume.

Note:
We only delete the populator finalizer; the k8s storage finalizer is not removed, and it will continue to block deletion until storage cleanup.

Finalizer removal should be a last, desperate effort when dealing with PVC removal.

a. we know the populator will never remove it because of its state
b. we are careful not to remove the k8s storage finalizer, to make sure we don't leave storage hanging
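
A possible shape for that verification - strip the finalizer only once no populator pod could still act on the PVC - is sketched below. This is hypothetical, not code from this PR; the populate- name prefix comes from the pods shown in the examples above, and a real check would more likely match the specific populator CR or pod for the PVC rather than scanning the whole namespace.

package plan

import (
    "context"
    "strings"

    core "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// populatorAbandonedPVC reports whether no populator pod in the PVC's
// namespace is still pending or running, i.e. nothing is left that would
// ever remove the populator finalizer by itself.
func populatorAbandonedPVC(ctx context.Context, c client.Client, pvc *core.PersistentVolumeClaim) (bool, error) {
    pods := &core.PodList{}
    if err := c.List(ctx, pods, client.InNamespace(pvc.Namespace)); err != nil {
        return false, err
    }
    for i := range pods.Items {
        pod := &pods.Items[i]
        if !strings.HasPrefix(pod.Name, "populate-") {
            continue
        }
        if pod.Status.Phase == core.PodPending || pod.Status.Phase == core.PodRunning {
            // A populator is still working; leave the finalizer alone.
            return false, nil
        }
    }
    return true, nil
}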

@yaacov (Member, Author) commented:

@mnecas ping ^^ ?

@yaacov force-pushed the remove-populator-finilizers branch 3 times, most recently from 5122023 to 5f3b6d8 on February 19, 2026 at 15:56
This PR enhances the plan archive mechanism with a two-tier cleanup strategy
and ensures PVC populator finalizers are properly removed.

Archive Cleanup Strategy:
- Tier 1: Delete all temporary resources by plan label (pods, secrets,
  configmaps, jobs, populator CRs) and remove populator finalizers from
  all PVCs
- Tier 2: Loop over migration history - apply VM status logic for current
  migration (preserve successful VMs, delete failed if DeleteVmOnFailMigration),
  delete ALL resources unconditionally for previous migrations

Labeler Refactoring:
- Add VMLabelsWithExtra() method to centralized Labeler for provider-specific
  labels while ensuring core labels (plan, migration, vmID) are always present
- Refactor oVirt, OpenStack, and vSphere builders to use centralized Labeler
  instead of manually creating label maps
- Add plan label to OvirtVolumePopulator and OpenstackVolumePopulator utils

Tests:
- Add unit tests for plan and migration cleanup functions
- Fix vSphere builder tests to properly initialize Labeler in Context

Ref: https://issues.redhat.com/browse/MTV-3353
Resolves: MTV-3353

Signed-off-by: yaacov <yzamir@redhat.com>
@yaacov force-pushed the remove-populator-finilizers branch from 5f3b6d8 to c3da1f0 on February 20, 2026 at 09:00
@sonarqubecloud
