Skip to content

Add checks to see rbd image existence along with store entry#840

Open
kasabe28 wants to merge 2 commits intoironcore-dev:mainfrom
opensovereigncloud:osc/fix/handle-manual-deletion-of-rbd-image
Open

Add checks to see rbd image existence along with store entry#840
kasabe28 wants to merge 2 commits intoironcore-dev:mainfrom
opensovereigncloud:osc/fix/handle-manual-deletion-of-rbd-image

Conversation

@kasabe28
Copy link
Contributor

@kasabe28 kasabe28 commented Mar 2, 2026

Proposed Changes

  • Before cloning image, check if rbd image of volume source exists. If rbd image doesn't exist, then mark the snapshot state as Failed
  • Check if rbd snapshot exists during snapshot recociliation. And before checking snapshot state, check for rbd image existence also.
  • Create rbd image for ironcore image only when it's not present.
  • Check rbd snapshot existence before populating image.
  • Improve logs.

Fixes #842

Summary by CodeRabbit

  • Bug Fixes

    • Stronger existence checks for images and snapshots to avoid duplicate work and failed operations
    • Automatic state transitions when referenced images/snapshots are missing, triggering safe retries
    • Improved error messages include contextual names for easier troubleshooting
    • Better handling of already-protected snapshots to prevent operation failures
  • Refactor

    • Snapshot preparation and image workflows now use direct image handles to ensure safer resource management

@kasabe28 kasabe28 self-assigned this Mar 2, 2026
@kasabe28 kasabe28 requested a review from a team as a code owner March 2, 2026 05:35
@github-actions github-actions bot added the size/L label Mar 2, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 2, 2026

Walkthrough

Adds RBD image existence checks, improves error messages with contextual fields, changes snapshot/image preparation to operate on opened librbd.Image handles, and adjusts snapshot reconciliation to guard and transition snapshot states based on image/snapshot presence.

Changes

Cohort / File(s) Summary
Common utilities & error context
internal/controllers/common.go
Added isRbdImageExisting(ioCtx, imageID) using librbd.GetImageNames and slices.Contains. Enhanced error messages (include imageName, pool, snapshotName). Changed createSnapshot signature to accept imageName.
Image controller: clone & pre-checks
internal/controllers/image_controller.go
Short-circuited snapshot existence check in deleteImageSnapshots. In createImageFromSnapshot, added pre-check that verifies parent RBD image exists; if missing while snapshot is Ready, mark snapshot Failed and update store to trigger recreation.
Snapshot controller: image-handle flow & guards
internal/controllers/snapshot_controller.go
Reconciler now verifies image existence before progressing SnapshotStatePopulated→Ready. reconcileIroncoreImageSnapshot opens or creates an image handle and defers close. prepareSnapshotContent signature changed to accept *librbd.Image; it now derives imageID from the handle, checks/creates/protects snapshots using the opened image, and skips work when snapshots already exist. Added guards to avoid recreating existing protected snapshots.

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler
    participant IOctx as RADOS IOContext
    participant Librbd as librbd.Image
    participant Store

    Reconciler->>IOctx: isRbdImageExisting(imageID)
    IOctx-->>Reconciler: exists? / error
    alt image exists
        Reconciler->>Librbd: Open(imageName) -> returns Image handle
        Librbd-->>Reconciler: Image handle
        Reconciler->>Librbd: prepareSnapshotContent(Image handle, rc)
        Librbd-->>Reconciler: snapshot exists / protected?
        alt snapshot missing or unprotected
            Reconciler->>Librbd: populate image & create snapshot
            Librbd-->>Reconciler: snapshot created & protected
        end
        Reconciler->>Store: update snapshot status (Ready/Failed)
    else image missing
        Reconciler->>Store: mark snapshot Failed (if Ready) to trigger recreation
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding checks for RBD image existence alongside store entries.
Description check ✅ Passed The pull request description covers the main changes and aligns with the PR objectives, but lacks detail on the technical implementation and scope of changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/controllers/common.go (1)

179-191: Code duplication with inconsistent parameter conventions.

This new function duplicates logic from (r *ImageReconciler) isImageExisting in image_controller.go (lines 488-501). The key difference is in parameter expectations:

  • isRbdImageExisting: expects the RBD ID with prefix (callers pass SnapshotIDToRBDID(...))
  • r.isImageExisting: expects the image ID without prefix (applies ImageIDToRBDID internally)

This inconsistency is error-prone and could lead to bugs if callers mix up conventions.

Consider consolidating into a single function with a clear naming convention. Since this function is shared across controllers, it could either:

  1. Always expect the full RBD ID (current isRbdImageExisting behavior), or
  2. Always apply the prefix internally and rename appropriately.
♻️ Suggested consolidation

Option 1 - Keep isRbdImageExisting and update ImageReconciler to use it:

-func (r *ImageReconciler) isImageExisting(ioCtx *rados.IOContext, imageID string) (bool, error) {
-	images, err := librbd.GetImageNames(ioCtx)
-	if err != nil {
-		return false, fmt.Errorf("failed to list images: %w", err)
-	}
-
-	for _, img := range images {
-		if ImageIDToRBDID(imageID) == img {
-			return true, nil
-		}
-	}
-
-	return false, nil
-}

Then update callers in image_controller.go to use isRbdImageExisting(ioCtx, ImageIDToRBDID(imageID)).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controllers/common.go` around lines 179 - 191, The new
isRbdImageExisting(ioCtx *rados.IOContext, imageID string) duplicates (r
*ImageReconciler) isImageExisting and uses a different caller convention
(expects full RBD ID); consolidate to a single canonical function and update
call sites: keep isRbdImageExisting as the shared helper that expects the full
RBD ID, remove or delegate r.isImageExisting to call isRbdImageExisting, and
update ImageReconciler callers to pass ImageIDToRBDID(imageID) (or adjust other
callers similarly) so all callers use the same RBD-ID-with-prefix convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controllers/image_controller.go`:
- Around line 753-769: The snapshot state check is inconsistent: you call
isRbdImageExisting for snapshots that may be SnapshotStatePopulated or
SnapshotStateReady but only mark missing RBD images as Failed when
snapshot.Status.State == providerapi.SnapshotStateReady; update the condition so
missing images are marked Failed for both providerapi.SnapshotStateReady and
providerapi.SnapshotStatePopulated (or tighten the earlier check to only call
this block for Ready) — modify the conditional around imageExists and
snapshot.Status.State in the block using image.Spec.Image, SnapshotIDToRBDID and
isRbdImageExisting, and ensure you still call r.snapshots.Update(ctx, snapshot)
when transitioning the state to providerapi.SnapshotStateFailed.

In `@internal/controllers/snapshot_controller.go`:
- Around line 414-424: In prepareSnapshotContent detect the "snapshot exists but
not protected" path when currentSnap.IsProtected() returns (false, nil) and
handle it by protecting the existing snapshot instead of proceeding to
populateImage/createSnapshot; specifically, after obtaining currentSnap from
rbdImg.GetSnapshot(ImageSnapshotVersion) and seeing isProtected == false, call
the appropriate Protect method (e.g., currentSnap.Protect() or the reconciler's
snapshot protect helper), check and wrap any errors (treat
already-protected/exists as OK), log the action, and then return nil so
reconciliation does not try to recreate the snapshot; keep error handling
consistent with the surrounding fmt.Errorf wrapping used elsewhere in
prepareSnapshotContent.

---

Nitpick comments:
In `@internal/controllers/common.go`:
- Around line 179-191: The new isRbdImageExisting(ioCtx *rados.IOContext,
imageID string) duplicates (r *ImageReconciler) isImageExisting and uses a
different caller convention (expects full RBD ID); consolidate to a single
canonical function and update call sites: keep isRbdImageExisting as the shared
helper that expects the full RBD ID, remove or delegate r.isImageExisting to
call isRbdImageExisting, and update ImageReconciler callers to pass
ImageIDToRBDID(imageID) (or adjust other callers similarly) so all callers use
the same RBD-ID-with-prefix convention.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 022122d and 91e6049.

📒 Files selected for processing (3)
  • internal/controllers/common.go
  • internal/controllers/image_controller.go
  • internal/controllers/snapshot_controller.go

Comment on lines +414 to 424
func (r *SnapshotReconciler) prepareSnapshotContent(log logr.Logger, ioCtx *rados.IOContext, rbdImg *librbd.Image, rc io.ReadCloser) error {
rbdImageID := rbdImg.GetName()
currentSnap := rbdImg.GetSnapshot(ImageSnapshotVersion)
if isProtected, err := currentSnap.IsProtected(); err != nil {
if !errors.Is(err, librbd.ErrNotFound) {
return fmt.Errorf("failed to check if snapshot %s is protected: %w", ImageSnapshotVersion, err)
}
} else if isProtected {
log.V(2).Info("Snapshot already exists and is protected, skipping creation and protection.", "snapshotName", ImageSnapshotVersion)
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: Unhandled case when snapshot exists but is not protected.

The current logic doesn't handle the scenario where a snapshot exists but is NOT protected (e.g., if a previous reconciliation crashed after CreateSnapshot but before Protect). In this case:

  1. IsProtected() returns (false, nil)
  2. Code proceeds to populateImage (unnecessary, image already has data)
  3. createSnapshot fails because snapshot already exists

This would cause the reconciliation to fail repeatedly.

🐛 Proposed fix to handle the "exists but not protected" case
 func (r *SnapshotReconciler) prepareSnapshotContent(log logr.Logger, ioCtx *rados.IOContext, rbdImg *librbd.Image, rc io.ReadCloser) error {
 	rbdImageID := rbdImg.GetName()
 	currentSnap := rbdImg.GetSnapshot(ImageSnapshotVersion)
 	if isProtected, err := currentSnap.IsProtected(); err != nil {
 		if !errors.Is(err, librbd.ErrNotFound) {
 			return fmt.Errorf("failed to check if snapshot %s is protected: %w", ImageSnapshotVersion, err)
 		}
+		// Snapshot doesn't exist, continue with population and creation
 	} else if isProtected {
 		log.V(2).Info("Snapshot already exists and is protected, skipping creation and protection.", "snapshotName", ImageSnapshotVersion)
 		return nil
+	} else {
+		// Snapshot exists but not protected - just protect it
+		log.V(2).Info("Snapshot exists but is not protected, protecting it", "snapshotName", ImageSnapshotVersion)
+		if err := currentSnap.Protect(); err != nil {
+			return fmt.Errorf("unable to protect existing snapshot %s: %w", ImageSnapshotVersion, err)
+		}
+		if err := rbdImg.SetSnapshot(ImageSnapshotVersion); err != nil {
+			return fmt.Errorf("failed to set snapshot %s for image %s: %w", ImageSnapshotVersion, rbdImageID, err)
+		}
+		return nil
 	}
 
 	if err := r.populateImage(log, rbdImg, rc); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controllers/snapshot_controller.go` around lines 414 - 424, In
prepareSnapshotContent detect the "snapshot exists but not protected" path when
currentSnap.IsProtected() returns (false, nil) and handle it by protecting the
existing snapshot instead of proceeding to populateImage/createSnapshot;
specifically, after obtaining currentSnap from
rbdImg.GetSnapshot(ImageSnapshotVersion) and seeing isProtected == false, call
the appropriate Protect method (e.g., currentSnap.Protect() or the reconciler's
snapshot protect helper), check and wrap any errors (treat
already-protected/exists as OK), log the action, and then return nil so
reconciliation does not try to recreate the snapshot; keep error handling
consistent with the surrounding fmt.Errorf wrapping used elsewhere in
prepareSnapshotContent.

@hardikdr hardikdr added the area/storage Storage solutions and related concerns. label Mar 2, 2026
@hardikdr hardikdr added this to Roadmap Mar 2, 2026
@kasabe28 kasabe28 changed the title Add checks to see rbd image/snapshot existence along with image/snapshot store entry Add checks to see rbd image existence along with store entry Mar 2, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controllers/image_controller.go`:
- Around line 272-273: The cleanup flow currently ignores errors from
snapshotExists (called with log, ioCtx, ImageIDToRBDID(snapName), snapName) and
proceeds as if the snapshot check succeeded; change the logic to propagate the
error instead of skipping it — i.e., if snapshotExists returns a non-nil err,
return or surface that error from the surrounding function (or handle it
consistently with existing error handling in this function) so snapshot
creation/removal is not attempted on incomplete information; ensure the call
site uses the err value from snapshotExists and does not treat only the boolean
result as authoritative.
- Around line 749-765: The check that gates parent RBD image existence must be
driven by the snapshot source, not by image.Spec.Image; replace the conditional
that uses image.Spec.Image with logic that calls
getSnapshotSourceDetails(snapshot) to obtain parentName and then call
isRbdImageExisting(ioCtx, SnapshotIDToRBDID(parentName)) (or equivalent RBD id
conversion) to determine existence; keep the same error handling and the
snapshot.Status.State update and r.snapshots.Update(...) flow (including
store.IgnoreErrNotFound(err)) when the parent is missing and
snapshot.Status.State == providerapi.SnapshotStateReady so the reconcile will
mark it failed and trigger creation.

In `@internal/controllers/snapshot_controller.go`:
- Around line 420-422: The current code ignores errors from snapshotExists,
which can lead to dangerous reconciliation when the check fails; update the
conditional around snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion)
to explicitly handle a non-nil err (e.g., log the error and return it or wrap
and return) instead of falling through — keep the early-return when
isSnapshotExist is true, but if err != nil return that error (or a wrapped
error) so reconciliation stops on transient failures.
- Around line 376-378: The existence check is using img.GetID() (store ID)
instead of the real RBD image name and silently ignores errors; change the call
to snapshotExists to pass the actual RBD image name (the "img_<id>" name used
for RBD operations) instead of img.GetID(), and if snapshotExists returns an
error, return that error immediately (do not treat it as non-existence) so
createSnapshot is only attempted when existence check truly succeeds; update the
call site where snapshotExists(log, ioCtx, img.GetID(), snapshot.ID) is invoked
to use the RBD image name and to propagate any error from snapshotExists up
instead of ignoring it (refer to snapshotExists, img.GetID()/RBD image name,
snapshot.ID, and createSnapshot).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91e6049 and b6bce86.

📒 Files selected for processing (3)
  • internal/controllers/common.go
  • internal/controllers/image_controller.go
  • internal/controllers/snapshot_controller.go

Comment on lines +272 to 273
if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(snapName), snapName); err == nil && isSnapshotExist {
continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Propagate snapshotExists errors in cleanup flow.

At Line 272, errors are ignored and execution continues. If Ceph lookup fails, the code may attempt snapshot creation/removal decisions on incomplete information.

💡 Proposed fix
-		if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(snapName), snapName); err == nil && isSnapshotExist {
-			continue
-		}
+		if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(snapName), snapName); err != nil {
+			return fmt.Errorf("failed to check snapshot %s existence on cloned image: %w", snapName, err)
+		} else if isSnapshotExist {
+			continue
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controllers/image_controller.go` around lines 272 - 273, The cleanup
flow currently ignores errors from snapshotExists (called with log, ioCtx,
ImageIDToRBDID(snapName), snapName) and proceeds as if the snapshot check
succeeded; change the logic to propagate the error instead of skipping it —
i.e., if snapshotExists returns a non-nil err, return or surface that error from
the surrounding function (or handle it consistently with existing error handling
in this function) so snapshot creation/removal is not attempted on incomplete
information; ensure the call site uses the err value from snapshotExists and
does not treat only the boolean result as authoritative.

Comment on lines +749 to +765
if image.Spec.Image != "" {
log.V(2).Info("Check if snapshot rbd image already exist")
imageExists, err := isRbdImageExisting(ioCtx, SnapshotIDToRBDID(snapshotRef))
if err != nil {
return false, fmt.Errorf("failed to check snapshot rbd image existence: %w", err)
}
log.V(1).Info("Checked snapshot rbd image existence", "imageExists", imageExists)

if !imageExists && snapshot.Status.State == providerapi.SnapshotStateReady {
log.V(1).Info("Snapshot rbd image does not exist. Mark snapshot as failed to trigger rbd image creation in next reconciliation loop")
snapshot.Status.State = providerapi.SnapshotStateFailed
if _, err := r.snapshots.Update(ctx, snapshot); store.IgnoreErrNotFound(err) != nil {
return false, fmt.Errorf("failed to update snapshot to trigger rbd image creation: %w", err)
}
return false, nil
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Drive parent-image existence checks from snapshot.Source, not image.Spec.Image.

At Line 749, the check only runs when image.Spec.Image != "". That skips validation for SnapshotRef-driven reconciliations where the snapshot source still requires a parent RBD image. Use getSnapshotSourceDetails(snapshot) and check existence of parentName directly.

💡 Proposed fix
-	if image.Spec.Image != "" {
+	parentName, snapName, err := getSnapshotSourceDetails(snapshot)
+	if err != nil {
+		return false, fmt.Errorf("failed to get snapshot source details: %w", err)
+	}
+
+	if snapshot.Source.IronCoreImage != "" || snapshot.Source.VolumeImageID != "" {
 		log.V(2).Info("Check if snapshot rbd image already exist")
-		imageExists, err := isRbdImageExisting(ioCtx, SnapshotIDToRBDID(snapshotRef))
+		imageExists, err := isRbdImageExisting(ioCtx, parentName)
 		if err != nil {
 			return false, fmt.Errorf("failed to check snapshot rbd image existence: %w", err)
 		}
 		log.V(1).Info("Checked snapshot rbd image existence", "imageExists", imageExists)
 
 		if !imageExists && snapshot.Status.State == providerapi.SnapshotStateReady {
 			log.V(1).Info("Snapshot rbd image does not exist. Mark snapshot as failed to trigger rbd image creation in next reconciliation loop")
 			snapshot.Status.State = providerapi.SnapshotStateFailed
 			if _, err := r.snapshots.Update(ctx, snapshot); store.IgnoreErrNotFound(err) != nil {
 				return false, fmt.Errorf("failed to update snapshot to trigger rbd image creation: %w", err)
 			}
 			return false, nil
 		}
 	}
-
-	parentName, snapName, err := getSnapshotSourceDetails(snapshot)
-	if err != nil {
-		return false, fmt.Errorf("failed to get snapshot source details: %w", err)
-	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controllers/image_controller.go` around lines 749 - 765, The check
that gates parent RBD image existence must be driven by the snapshot source, not
by image.Spec.Image; replace the conditional that uses image.Spec.Image with
logic that calls getSnapshotSourceDetails(snapshot) to obtain parentName and
then call isRbdImageExisting(ioCtx, SnapshotIDToRBDID(parentName)) (or
equivalent RBD id conversion) to determine existence; keep the same error
handling and the snapshot.Status.State update and r.snapshots.Update(...) flow
(including store.IgnoreErrNotFound(err)) when the parent is missing and
snapshot.Status.State == providerapi.SnapshotStateReady so the reconcile will
mark it failed and trigger creation.

Comment on lines +376 to +378
if isSnapshotExist, err := snapshotExists(log, ioCtx, img.GetID(), snapshot.ID); err == nil && isSnapshotExist {
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use the RBD image name (img_<id>) and fail fast on existence-check errors.

At Line 376, snapshotExists gets img.GetID() (store ID), but this helper expects the real RBD image name. Combined with ignored errors, this can miss existing snapshots and fall through to createSnapshot, which then fails on already-existing snapshots.

💡 Proposed fix
-	if isSnapshotExist, err := snapshotExists(log, ioCtx, img.GetID(), snapshot.ID); err == nil && isSnapshotExist {
-		return nil
-	}
+	if isSnapshotExist, err := snapshotExists(log, ioCtx, ImageIDToRBDID(img.ID), snapshot.ID); err != nil {
+		return fmt.Errorf("failed to check volume snapshot existence: %w", err)
+	} else if isSnapshotExist {
+		return nil
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controllers/snapshot_controller.go` around lines 376 - 378, The
existence check is using img.GetID() (store ID) instead of the real RBD image
name and silently ignores errors; change the call to snapshotExists to pass the
actual RBD image name (the "img_<id>" name used for RBD operations) instead of
img.GetID(), and if snapshotExists returns an error, return that error
immediately (do not treat it as non-existence) so createSnapshot is only
attempted when existence check truly succeeds; update the call site where
snapshotExists(log, ioCtx, img.GetID(), snapshot.ID) is invoked to use the RBD
image name and to propagate any error from snapshotExists up instead of ignoring
it (refer to snapshotExists, img.GetID()/RBD image name, snapshot.ID, and
createSnapshot).

Comment on lines +420 to 422
if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err == nil && isSnapshotExist {
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not continue when snapshotExists fails.

At Line 420, a non-nil error is ignored, and reconciliation proceeds to populate/create snapshot. If this check fails transiently, you can perform unnecessary writes and hit avoidable failures.

💡 Proposed fix
-	if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err == nil && isSnapshotExist {
-		return nil
-	}
+	if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err != nil {
+		return fmt.Errorf("failed to check ironcore snapshot existence: %w", err)
+	} else if isSnapshotExist {
+		return nil
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err == nil && isSnapshotExist {
return nil
}
if isSnapshotExist, err := snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion); err != nil {
return fmt.Errorf("failed to check ironcore snapshot existence: %w", err)
} else if isSnapshotExist {
return nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controllers/snapshot_controller.go` around lines 420 - 422, The
current code ignores errors from snapshotExists, which can lead to dangerous
reconciliation when the check fails; update the conditional around
snapshotExists(log, ioCtx, rbdImageID, ImageSnapshotVersion) to explicitly
handle a non-nil err (e.g., log the error and return it or wrap and return)
instead of falling through — keep the early-return when isSnapshotExist is true,
but if err != nil return that error (or a wrapped error) so reconciliation stops
on transient failures.

@gonzolino gonzolino moved this to In Progress in Storage Mar 5, 2026
@gonzolino gonzolino removed this from Storage Mar 5, 2026
@gonzolino gonzolino added the bug Something isn't working label Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/storage Storage solutions and related concerns. bug Something isn't working size/L

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Reconciliation fails when RBD images or snapshots are missing despite store showing ready state

3 participants