MTV-3353 | Remove populator finalizers from PVCs during archive#4618
MTV-3353 | Remove populator finalizers from PVCs during archive#4618yaacov wants to merge 1 commit intokubev2v:mainfrom
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4618 +/- ##
==========================================
- Coverage 15.45% 10.24% -5.21%
==========================================
Files 112 469 +357
Lines 23377 53977 +30600
==========================================
+ Hits 3613 5531 +1918
- Misses 19479 47959 +28480
- Partials 285 487 +202
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
9f1dba4 to
20fd9af
Compare
e7659dc to
52c87f3
Compare
69fbd6f to
c414a47
Compare
|
@coderabbitai please review, please check functionality was not lost, and new cleanup is safe |
🧩 Analysis chain
🏁 Script executed: Length of output: 405 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 36 🏁 Script executed: Length of output: 73 🏁 Script executed: Length of output: 16873 🏁 Script executed: Length of output: 8500 🏁 Script executed: Length of output: 2939 🏁 Script executed: Length of output: 1377 🏁 Script executed: Length of output: 1011 🏁 Script executed: Length of output: 830 🏁 Script executed: Length of output: 2216 🏁 Script executed: Length of output: 2782 🏁 Script executed: Length of output: 1492 🏁 Script executed: Length of output: 495 🏁 Script executed: Length of output: 1205 🏁 Script executed: Length of output: 125 🏁 Script executed: Length of output: 581 🏁 Script executed: Length of output: 586 🏁 Script executed: Length of output: 674 🏁 Script executed: Length of output: 1070 🏁 Script executed: Length of output: 1529 🏁 Script executed: Length of output: 2062 🏁 Script executed: Length of output: 811 🏁 Script executed: Length of output: 0 🏁 Script executed: Length of output: 1361 🏁 Script executed: Length of output: 459 🏁 Script executed: Length of output: 1290
✅ Functionality Preservation - Overall GoodThe new three-tier cleanup strategy is well-designed and safer than the old approach: Tier 1 (Temporary Resources) - ✅ Comprehensive
Tier 2 (Previous Migrations) - ✅ Excellent improvement
Tier 3 (Current Migration) - ✅ Logic preserved
|
📝 WalkthroughWalkthroughThis pull request centralizes label management across adapter builders by replacing hard-coded label maps with a shared Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/controller/plan/kubevirt.go (1)
1486-1499:⚠️ Potential issue | 🟠 MajorAvoid dropping all finalizers and treat NotFound as success.
After
Delete, the PVC can disappear immediately; the subsequent patch will return NotFound and currently bubbles up as an error. Also, clearing all finalizers can bypass unrelated protections (e.g., PVC protection). Prefer removing onlyPopulatorPVCFinalizerand ignore NotFound.🐛 Suggested fix
- // Remove all finalizers to allow the PVC to be deleted - pvcCopy := pvc.DeepCopy() - pvc.Finalizers = nil - patch := client.MergeFrom(pvcCopy) - if err = r.Destination.Client.Patch(context.TODO(), pvc, patch); err != nil { - return err - } + if err := r.removePopulatorFinalizerFromPVC(pvc); err != nil && !k8serr.IsNotFound(err) { + return err + }
🤖 Fix all issues with AI agents
In `@pkg/controller/plan/context/labeler.go`:
- Around line 38-47: VMLabelsWithExtra currently allows callers to override core
labels; fix it by filtering extraLabels before merging so keys "plan",
"migration", and "vmID" are ignored (or preserved) — in the VMLabelsWithExtra
method, build the base labels via r.VMLabels(vmRef) then iterate extraLabels but
skip any key equal to "plan", "migration", or "vmID" (or their constant names if
such constants exist) so core labels remain unchanged.
In `@pkg/controller/plan/kubevirt_test.go`:
- Around line 307-444: Add unit tests that mirror existing cleanup tests using
createKubeVirtWithPlanUIDAndObjects: for DeleteAllPlanConfigMaps and
DeleteAllPlanJobs create configmaps/jobs with matching and non-matching "plan"
labels and assert matching ones are deleted and others remain; for
DeleteAllPlanPopulatorCRs create dummy populator CRs labeled with the plan and
assert deletion; for RemoveAllPlanPVCFinalizers and
removePopulatorFinalizerFromPVC create PVCs with the plan label and with/without
the populator finalizer, call the function(s) and assert the finalizer is
removed while other PVCs are unchanged; for DeleteMigrationPods,
DeleteMigrationVMs, and DeleteMigrationDataVolumes create resources labeled with
"plan" and specific "migration" values, call each delete function with a target
migration UID and assert only resources with that migration UID are deleted. Use
the existing pattern of calling the method on kubevirt, checking error is nil,
listing the resource type from kubevirt.Destination.Client, and asserting
remaining items by name and finalizers.
- Around line 445-474: The two helper functions createKubeVirtWithPlanUID and
createKubeVirtWithPlanUIDAndObjects ignore their planUID parameter and hardcode
"test-plan-uid"; update them to assign the passed planUID to the Plan.UID field
(i.e., set kubevirt.Plan.UID = planUID in createKubeVirtWithPlanUID and plan.UID
= planUID in createKubeVirtWithPlanUIDAndObjects) so callers that pass different
IDs actually get those values; no other behavioral changes needed.
In `@pkg/controller/plan/kubevirt.go`:
- Around line 3766-3815: DeleteAllPlanPopulatorCRs currently swallows errors
from the r.Destination.Client.List calls for ovirtList, openstackList, and
vsphereList; change each List error path to log the error with context (using
r.Log.Error) and accumulate/wrap the error (include which populator list failed)
so the function returns a single aggregated error after attempting all deletions
(deletions remain best‑effort but list failures are surfaced). Locate the three
List calls (r.Destination.Client.List(..., ovirtList, opts),
r.Destination.Client.List(..., openstackList, opts),
r.Destination.Client.List(..., vsphereList, opts)), add error handling that
wraps errors with context (e.g., which list and namespace), append to a slice or
use errors.Join, log each failure, and return the joined/aggregated error at the
end of DeleteAllPlanPopulatorCRs.
In `@pkg/controller/plan/migration.go`:
- Around line 297-318: The Archive flow currently iterates
r.Plan.Status.Migration.History and calls
r.kubevirt.DeleteMigrationPods(migrationUID) but never removes CDI importer
pods; update Archive to also call the importer cleanup for each VM/migration by
invoking the existing importer cleanup routine (e.g., call
r.kubevirt.DeleteImporterPods or the local deleteImporterPods helper for the
same migration/VM context) inside the same loop or immediately after deleting
migration pods so CDI pods labeled {"app":"containerized-data-importer"} are
removed; ensure you reference the same migrationUID or VM identifier used in the
loop so you remove the correct importer pods and log errors similarly to the
other delete calls.
In `@pkg/controller/plan/util/openstack.go`:
- Around line 13-19: The createVolumePopulatorCR function (in
pkg/controller/plan/adapter/openstack/builder.go) is building
OpenstackVolumePopulator objects without the "plan" label because it lacks the
planId parameter; update createVolumePopulatorCR to accept a planId and add the
label into the ObjectMeta labels, or instead call the existing
util.OpenstackVolumePopulator(image, sourceUrl, transferNetwork,
targetNamespace, secretName, planId, vmId, migrationId) to construct the CR so
the "plan" label is always included; ensure callers of createVolumePopulatorCR
are updated to pass planId if you add the parameter.
c414a47 to
a36dd75
Compare
a1a1de7 to
8e80023
Compare
| func (r *KubeVirt) removePopulatorFinalizerFromPVC(pvc *core.PersistentVolumeClaim) error { | ||
| // Check if finalizer exists | ||
| hasFinalizer := false | ||
| for _, f := range pvc.Finalizers { | ||
| if f == PopulatorPVCFinalizer { | ||
| hasFinalizer = true | ||
| break | ||
| } | ||
| } | ||
| if !hasFinalizer { | ||
| return nil | ||
| } | ||
|
|
||
| // Remove the finalizer | ||
| pvcCopy := pvc.DeepCopy() | ||
| var newFinalizers []string | ||
| for _, f := range pvc.Finalizers { | ||
| if f != PopulatorPVCFinalizer { | ||
| newFinalizers = append(newFinalizers, f) | ||
| } | ||
| } | ||
| pvc.Finalizers = newFinalizers | ||
| patch := client.MergeFrom(pvcCopy) | ||
| if err := r.Destination.Client.Patch(context.TODO(), pvc, patch); err != nil { | ||
| r.Log.Error(err, "Failed to remove populator finalizer from PVC.", "pvc", pvc.Name) | ||
| return err | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
We should not remove the finalizers, this is big no.
This can be potentially dangerous as the storage migth not cleanup the volume.
The finalizer removal should be last desparate effort of dealing with PVC removal.
I would rather fail the removal and leave it up to the user than leaving the env in inconsitent state.
There was a problem hiding this comment.
leave it up to the user than leaving the env in inconsitent state.
Removing this propulator finalizers is heart of this change:
https://issues.redhat.com/browse/MTV-3353
The flow is:
a. populator create the PVC 🧑🏭
b. populator fail and abandon the PVC, or die 💀
c. pvc orphan 😢
I can add a verification that the populator is stuck or dead before removing the finalizes - e.g. check the populator that created them is in a state ( or dead ) that is will never delete the finalized, WDYT ?
This can be potentially dangerous as the storage might not cleanup the volume.
Note:
We only delete the populator finalzer, the k8s finilazer is not removed the k8s storage finalizer, this one will continue to block deletion until storage cleanup
The finalizer removal should be last desparate effort of dealing with PVC removal.
a. we know the populator will never remove it because of it's state
b. we are carful to not remove the k8s storage finalized to make sure we don't leave storage hanging
5122023 to
5f3b6d8
Compare
This PR enhances the plan archive mechanism with a two-tier cleanup strategy and ensures PVC populator finalizers are properly removed. Archive Cleanup Strategy: - Tier 1: Delete all temporary resources by plan label (pods, secrets, configmaps, jobs, populator CRs) and remove populator finalizers from all PVCs - Tier 2: Loop over migration history - apply VM status logic for current migration (preserve successful VMs, delete failed if DeleteVmOnFailMigration), delete ALL resources unconditionally for previous migrations Labeler Refactoring: - Add VMLabelsWithExtra() method to centralized Labeler for provider-specific labels while ensuring core labels (plan, migration, vmID) are always present - Refactor oVirt, OpenStack, and vSphere builders to use centralized Labeler instead of manually creating label maps - Add plan label to OvirtVolumePopulator and OpenstackVolumePopulator utils Tests: - Add unit tests for plan and migration cleanup functions - Fix vSphere builder tests to properly initialize Labeler in Context Ref: https://issues.redhat.com/browse/MTV-3353 Resolves: MTV-3353 Signed-off-by: yaacov <yzamir@redhat.com>
5f3b6d8 to
c3da1f0
Compare
|



Issue:
When a migration fails mid-copy (e.g., due to network disconnection), the target PVCs retain the populator finalizer even after archiving the plan. This prevents users from manually deleting the PVCs later.
Fix:
This PR enhances the plan archive mechanism with a two-tier cleanup strategy
and ensures PVC populator finalizers are properly removed.
Archive Cleanup Strategy:
configmaps, jobs, populator CRs) and remove populator finalizers from
all PVCs
migration (preserve successful VMs, delete failed if DeleteVmOnFailMigration),
delete ALL resources unconditionally for previous migrations
Examples:
Create and start migration with clean debug artifacts:
Start the vm, and cleanup the plan:
Create and start a migration.
plan failed mid transfer, now we can delete it.
Ref: https://issues.redhat.com/browse/MTV-3353
Resolves: MTV-3353