Add checks to see rbd image existence along with store entry#840
Add checks to see rbd image existence along with store entry#840kasabe28 wants to merge 2 commits intoironcore-dev:mainfrom
Conversation
WalkthroughAdds RBD image existence checks, improves error messages with contextual fields, changes snapshot/image preparation to operate on opened librbd.Image handles, and adjusts snapshot reconciliation to guard and transition snapshot states based on image/snapshot presence. Changes
Sequence Diagram(s)sequenceDiagram
participant Reconciler
participant IOctx as RADOS IOContext
participant Librbd as librbd.Image
participant Store
Reconciler->>IOctx: isRbdImageExisting(imageID)
IOctx-->>Reconciler: exists? / error
alt image exists
Reconciler->>Librbd: Open(imageName) -> returns Image handle
Librbd-->>Reconciler: Image handle
Reconciler->>Librbd: prepareSnapshotContent(Image handle, rc)
Librbd-->>Reconciler: snapshot exists / protected?
alt snapshot missing or unprotected
Reconciler->>Librbd: populate image & create snapshot
Librbd-->>Reconciler: snapshot created & protected
end
Reconciler->>Store: update snapshot status (Ready/Failed)
else image missing
Reconciler->>Store: mark snapshot Failed (if Ready) to trigger recreation
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/controllers/common.go (1)
179-191: Code duplication with inconsistent parameter conventions.This new function duplicates logic from
(r *ImageReconciler) isImageExistinginimage_controller.go(lines 488-501). The key difference is in parameter expectations:
isRbdImageExisting: expects the RBD ID with prefix (callers passSnapshotIDToRBDID(...))r.isImageExisting: expects the image ID without prefix (appliesImageIDToRBDIDinternally)This inconsistency is error-prone and could lead to bugs if callers mix up conventions.
Consider consolidating into a single function with a clear naming convention. Since this function is shared across controllers, it could either:
- Always expect the full RBD ID (current
isRbdImageExistingbehavior), or- Always apply the prefix internally and rename appropriately.
♻️ Suggested consolidation
Option 1 - Keep
isRbdImageExistingand updateImageReconcilerto use it:-func (r *ImageReconciler) isImageExisting(ioCtx *rados.IOContext, imageID string) (bool, error) { - images, err := librbd.GetImageNames(ioCtx) - if err != nil { - return false, fmt.Errorf("failed to list images: %w", err) - } - - for _, img := range images { - if ImageIDToRBDID(imageID) == img { - return true, nil - } - } - - return false, nil -}Then update callers in
image_controller.goto useisRbdImageExisting(ioCtx, ImageIDToRBDID(imageID)).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controllers/common.go` around lines 179 - 191, The new isRbdImageExisting(ioCtx *rados.IOContext, imageID string) duplicates (r *ImageReconciler) isImageExisting and uses a different caller convention (expects full RBD ID); consolidate to a single canonical function and update call sites: keep isRbdImageExisting as the shared helper that expects the full RBD ID, remove or delegate r.isImageExisting to call isRbdImageExisting, and update ImageReconciler callers to pass ImageIDToRBDID(imageID) (or adjust other callers similarly) so all callers use the same RBD-ID-with-prefix convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controllers/image_controller.go`:
- Around line 753-769: The snapshot state check is inconsistent: you call
isRbdImageExisting for snapshots that may be SnapshotStatePopulated or
SnapshotStateReady but only mark missing RBD images as Failed when
snapshot.Status.State == providerapi.SnapshotStateReady; update the condition so
missing images are marked Failed for both providerapi.SnapshotStateReady and
providerapi.SnapshotStatePopulated (or tighten the earlier check to only call
this block for Ready) — modify the conditional around imageExists and
snapshot.Status.State in the block using image.Spec.Image, SnapshotIDToRBDID and
isRbdImageExisting, and ensure you still call r.snapshots.Update(ctx, snapshot)
when transitioning the state to providerapi.SnapshotStateFailed.
In `@internal/controllers/snapshot_controller.go`:
- Around line 414-424: In prepareSnapshotContent detect the "snapshot exists but
not protected" path when currentSnap.IsProtected() returns (false, nil) and
handle it by protecting the existing snapshot instead of proceeding to
populateImage/createSnapshot; specifically, after obtaining currentSnap from
rbdImg.GetSnapshot(ImageSnapshotVersion) and seeing isProtected == false, call
the appropriate Protect method (e.g., currentSnap.Protect() or the reconciler's
snapshot protect helper), check and wrap any errors (treat
already-protected/exists as OK), log the action, and then return nil so
reconciliation does not try to recreate the snapshot; keep error handling
consistent with the surrounding fmt.Errorf wrapping used elsewhere in
prepareSnapshotContent.
---
Nitpick comments:
In `@internal/controllers/common.go`:
- Around line 179-191: The new isRbdImageExisting(ioCtx *rados.IOContext,
imageID string) duplicates (r *ImageReconciler) isImageExisting and uses a
different caller convention (expects full RBD ID); consolidate to a single
canonical function and update call sites: keep isRbdImageExisting as the shared
helper that expects the full RBD ID, remove or delegate r.isImageExisting to
call isRbdImageExisting, and update ImageReconciler callers to pass
ImageIDToRBDID(imageID) (or adjust other callers similarly) so all callers use
the same RBD-ID-with-prefix convention.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
internal/controllers/common.gointernal/controllers/image_controller.gointernal/controllers/snapshot_controller.go
| func (r *SnapshotReconciler) prepareSnapshotContent(log logr.Logger, ioCtx *rados.IOContext, rbdImg *librbd.Image, rc io.ReadCloser) error { | ||
| rbdImageID := rbdImg.GetName() | ||
| currentSnap := rbdImg.GetSnapshot(ImageSnapshotVersion) | ||
| if isProtected, err := currentSnap.IsProtected(); err != nil { | ||
| if !errors.Is(err, librbd.ErrNotFound) { | ||
| return fmt.Errorf("failed to check if snapshot %s is protected: %w", ImageSnapshotVersion, err) | ||
| } | ||
| } else if isProtected { | ||
| log.V(2).Info("Snapshot already exists and is protected, skipping creation and protection.", "snapshotName", ImageSnapshotVersion) | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Bug: Unhandled case when snapshot exists but is not protected.
The current logic doesn't handle the scenario where a snapshot exists but is NOT protected (e.g., if a previous reconciliation crashed after CreateSnapshot but before Protect). In this case:
IsProtected()returns(false, nil)- Code proceeds to
populateImage(unnecessary, image already has data) createSnapshotfails because snapshot already exists
This would cause the reconciliation to fail repeatedly.
🐛 Proposed fix to handle the "exists but not protected" case
func (r *SnapshotReconciler) prepareSnapshotContent(log logr.Logger, ioCtx *rados.IOContext, rbdImg *librbd.Image, rc io.ReadCloser) error {
rbdImageID := rbdImg.GetName()
currentSnap := rbdImg.GetSnapshot(ImageSnapshotVersion)
if isProtected, err := currentSnap.IsProtected(); err != nil {
if !errors.Is(err, librbd.ErrNotFound) {
return fmt.Errorf("failed to check if snapshot %s is protected: %w", ImageSnapshotVersion, err)
}
+ // Snapshot doesn't exist, continue with population and creation
} else if isProtected {
log.V(2).Info("Snapshot already exists and is protected, skipping creation and protection.", "snapshotName", ImageSnapshotVersion)
return nil
+ } else {
+ // Snapshot exists but not protected - just protect it
+ log.V(2).Info("Snapshot exists but is not protected, protecting it", "snapshotName", ImageSnapshotVersion)
+ if err := currentSnap.Protect(); err != nil {
+ return fmt.Errorf("unable to protect existing snapshot %s: %w", ImageSnapshotVersion, err)
+ }
+ if err := rbdImg.SetSnapshot(ImageSnapshotVersion); err != nil {
+ return fmt.Errorf("failed to set snapshot %s for image %s: %w", ImageSnapshotVersion, rbdImageID, err)
+ }
+ return nil
}
if err := r.populateImage(log, rbdImg, rc); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controllers/snapshot_controller.go` around lines 414 - 424, In
prepareSnapshotContent detect the "snapshot exists but not protected" path when
currentSnap.IsProtected() returns (false, nil) and handle it by protecting the
existing snapshot instead of proceeding to populateImage/createSnapshot;
specifically, after obtaining currentSnap from
rbdImg.GetSnapshot(ImageSnapshotVersion) and seeing isProtected == false, call
the appropriate Protect method (e.g., currentSnap.Protect() or the reconciler's
snapshot protect helper), check and wrap any errors (treat
already-protected/exists as OK), log the action, and then return nil so
reconciliation does not try to recreate the snapshot; keep error handling
consistent with the surrounding fmt.Errorf wrapping used elsewhere in
prepareSnapshotContent.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controllers/image_controller.go`:
- Around line 272-273: The cleanup flow currently ignores errors from
snapshotExists (called with log, ioCtx, ImageIDToRBDID(snapName), snapName) and
proceeds as if the snapshot check succeeded; change the logic to propagate the
error instead of skipping it — i.e., if snapshotExists returns a non-nil err,
return or surface that error from the surrounding function (or handle it
consistently with existing error handling in this function) so snapshot
creation/removal is not attempted on incomplete information; ensure the call
site uses the err value from snapshotExists and does not treat only the boolean
result as authoritative.
- Around line 749-765: The check that gates parent RBD image existence must be
driven by the snapshot source, not by image.Spec.Image; replace the conditional
that uses image.Spec.Image with logic that calls
getSnapshotSourceDetails(snapshot) to obtain parentName and then call
isRbdImageExisting(ioCtx, SnapshotIDToRBDID(parentName)) (or equivalent RBD id
conversion) to determine existence; keep the same error handling and the
snapshot.Status.State update and r.snapshots.Update(...) flow (including
store.IgnoreErrNotFound(err)) when the parent is missing and
snapshot.Status.State == providerapi.SnapshotStateReady so the reconcile will
mark it failed and trigger creation.
In `@internal/controllers/snapshot_controller.go`:
- Around line 420-422: The current code ignores errors from snapshotExists,
which can lead to dangerous reconciliation when the check fails; update the
conditional around snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion)
to explicitly handle a non-nil err (e.g., log the error and return it or wrap
and return) instead of falling through — keep the early-return when
isSnapshotExist is true, but if err != nil return that error (or a wrapped
error) so reconciliation stops on transient failures.
- Around line 376-378: The existence check is using img.GetID() (store ID)
instead of the real RBD image name and silently ignores errors; change the call
to snapshotExists to pass the actual RBD image name (the "img_<id>" name used
for RBD operations) instead of img.GetID(), and if snapshotExists returns an
error, return that error immediately (do not treat it as non-existence) so
createSnapshot is only attempted when existence check truly succeeds; update the
call site where snapshotExists(log, ioCtx, img.GetID(), snapshot.ID) is invoked
to use the RBD image name and to propagate any error from snapshotExists up
instead of ignoring it (refer to snapshotExists, img.GetID()/RBD image name,
snapshot.ID, and createSnapshot).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
internal/controllers/common.gointernal/controllers/image_controller.gointernal/controllers/snapshot_controller.go
| if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(snapName), snapName); err == nil && isSnapshotExist { | ||
| continue |
There was a problem hiding this comment.
Propagate snapshotExists errors in cleanup flow.
At Line 272, errors are ignored and execution continues. If Ceph lookup fails, the code may attempt snapshot creation/removal decisions on incomplete information.
💡 Proposed fix
- if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(snapName), snapName); err == nil && isSnapshotExist {
- continue
- }
+ if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(snapName), snapName); err != nil {
+ return fmt.Errorf("failed to check snapshot %s existence on cloned image: %w", snapName, err)
+ } else if isSnapshotExist {
+ continue
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controllers/image_controller.go` around lines 272 - 273, The cleanup
flow currently ignores errors from snapshotExists (called with log, ioCtx,
ImageIDToRBDID(snapName), snapName) and proceeds as if the snapshot check
succeeded; change the logic to propagate the error instead of skipping it —
i.e., if snapshotExists returns a non-nil err, return or surface that error from
the surrounding function (or handle it consistently with existing error handling
in this function) so snapshot creation/removal is not attempted on incomplete
information; ensure the call site uses the err value from snapshotExists and
does not treat only the boolean result as authoritative.
| if image.Spec.Image != "" { | ||
| log.V(2).Info("Check if snapshot rbd image already exist") | ||
| imageExists, err := isRbdImageExisting(ioCtx, SnapshotIDToRBDID(snapshotRef)) | ||
| if err != nil { | ||
| return false, fmt.Errorf("failed to check snapshot rbd image existence: %w", err) | ||
| } | ||
| log.V(1).Info("Checked snapshot rbd image existence", "imageExists", imageExists) | ||
|
|
||
| if !imageExists && snapshot.Status.State == providerapi.SnapshotStateReady { | ||
| log.V(1).Info("Snapshot rbd image does not exist. Mark snapshot as failed to trigger rbd image creation in next reconciliation loop") | ||
| snapshot.Status.State = providerapi.SnapshotStateFailed | ||
| if _, err := r.snapshots.Update(ctx, snapshot); store.IgnoreErrNotFound(err) != nil { | ||
| return false, fmt.Errorf("failed to update snapshot to trigger rbd image creation: %w", err) | ||
| } | ||
| return false, nil | ||
| } | ||
| } |
There was a problem hiding this comment.
Drive parent-image existence checks from snapshot.Source, not image.Spec.Image.
At Line 749, the check only runs when image.Spec.Image != "". That skips validation for SnapshotRef-driven reconciliations where the snapshot source still requires a parent RBD image. Use getSnapshotSourceDetails(snapshot) and check existence of parentName directly.
💡 Proposed fix
- if image.Spec.Image != "" {
+ parentName, snapName, err := getSnapshotSourceDetails(snapshot)
+ if err != nil {
+ return false, fmt.Errorf("failed to get snapshot source details: %w", err)
+ }
+
+ if snapshot.Source.IronCoreImage != "" || snapshot.Source.VolumeImageID != "" {
log.V(2).Info("Check if snapshot rbd image already exist")
- imageExists, err := isRbdImageExisting(ioCtx, SnapshotIDToRBDID(snapshotRef))
+ imageExists, err := isRbdImageExisting(ioCtx, parentName)
if err != nil {
return false, fmt.Errorf("failed to check snapshot rbd image existence: %w", err)
}
log.V(1).Info("Checked snapshot rbd image existence", "imageExists", imageExists)
if !imageExists && snapshot.Status.State == providerapi.SnapshotStateReady {
log.V(1).Info("Snapshot rbd image does not exist. Mark snapshot as failed to trigger rbd image creation in next reconciliation loop")
snapshot.Status.State = providerapi.SnapshotStateFailed
if _, err := r.snapshots.Update(ctx, snapshot); store.IgnoreErrNotFound(err) != nil {
return false, fmt.Errorf("failed to update snapshot to trigger rbd image creation: %w", err)
}
return false, nil
}
}
-
- parentName, snapName, err := getSnapshotSourceDetails(snapshot)
- if err != nil {
- return false, fmt.Errorf("failed to get snapshot source details: %w", err)
- }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controllers/image_controller.go` around lines 749 - 765, The check
that gates parent RBD image existence must be driven by the snapshot source, not
by image.Spec.Image; replace the conditional that uses image.Spec.Image with
logic that calls getSnapshotSourceDetails(snapshot) to obtain parentName and
then call isRbdImageExisting(ioCtx, SnapshotIDToRBDID(parentName)) (or
equivalent RBD id conversion) to determine existence; keep the same error
handling and the snapshot.Status.State update and r.snapshots.Update(...) flow
(including store.IgnoreErrNotFound(err)) when the parent is missing and
snapshot.Status.State == providerapi.SnapshotStateReady so the reconcile will
mark it failed and trigger creation.
| if isSnapshotExist, err := snapshotExists(log, ioCtx, img.GetID(), snapshot.ID); err == nil && isSnapshotExist { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Use the RBD image name (img_<id>) and fail fast on existence-check errors.
At Line 376, snapshotExists gets img.GetID() (store ID), but this helper expects the real RBD image name. Combined with ignored errors, this can miss existing snapshots and fall through to createSnapshot, which then fails on already-existing snapshots.
💡 Proposed fix
- if isSnapshotExist, err := snapshotExists(log, ioCtx, img.GetID(), snapshot.ID); err == nil && isSnapshotExist {
- return nil
- }
+ if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(img.ID), snapshot.ID); err != nil {
+ return fmt.Errorf("failed to check volume snapshot existence: %w", err)
+ } else if isSnapshotExist {
+ return nil
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controllers/snapshot_controller.go` around lines 376 - 378, The
existence check is using img.GetID() (store ID) instead of the real RBD image
name and silently ignores errors; change the call to snapshotExists to pass the
actual RBD image name (the "img_<id>" name used for RBD operations) instead of
img.GetID(), and if snapshotExists returns an error, return that error
immediately (do not treat it as non-existence) so createSnapshot is only
attempted when existence check truly succeeds; update the call site where
snapshotExists(log, ioCtx, img.GetID(), snapshot.ID) is invoked to use the RBD
image name and to propagate any error from snapshotExists up instead of ignoring
it (refer to snapshotExists, img.GetID()/RBD image name, snapshot.ID, and
createSnapshot).
| if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err == nil && isSnapshotExist { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Do not continue when snapshotExists fails.
At Line 420, a non-nil error is ignored, and reconciliation proceeds to populate/create snapshot. If this check fails transiently, you can perform unnecessary writes and hit avoidable failures.
💡 Proposed fix
- if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err == nil && isSnapshotExist {
- return nil
- }
+ if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err != nil {
+ return fmt.Errorf("failed to check ironcore snapshot existence: %w", err)
+ } else if isSnapshotExist {
+ return nil
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err == nil && isSnapshotExist { | |
| return nil | |
| } | |
| if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err != nil { | |
| return fmt.Errorf("failed to check ironcore snapshot existence: %w", err) | |
| } else if isSnapshotExist { | |
| return nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controllers/snapshot_controller.go` around lines 420 - 422, The
current code ignores errors from snapshotExists, which can lead to dangerous
reconciliation when the check fails; update the conditional around
snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion) to explicitly
handle a non-nil err (e.g., log the error and return it or wrap and return)
instead of falling through — keep the early-return when isSnapshotExist is true,
but if err != nil return that error (or a wrapped error) so reconciliation stops
on transient failures.
Proposed Changes
FailedFixes #842
Summary by CodeRabbit
Bug Fixes
Refactor