Skip to content

Group snapshot not taken on conflict #1280

@jsafrane

Description

@jsafrane

What happened:

I do not know how exactly it happened, but I got into a situation when the snapshot controller was processing a VolumeGroupSnapshot and corresponding VolumeSnapshot already existed. The controller reacts to 409 Already Exists here:

createdVolumeSnapshot, err := ctrl.clientset.SnapshotV1().VolumeSnapshots(volumeSnapshotNamespace).Create(ctx, volumeSnapshot, metav1.CreateOptions{})
if err != nil && !apierrs.IsAlreadyExists(err) {
return groupSnapshotContent, fmt.Errorf(
"createSnapshotsForGroupSnapshotContent: creating volumesnapshot %w", err)
}

You can see the code continues processing when the snapshot already exists. However, the createdVolumeSnapshot variable has undefined content (it points to an object with empty namespace / name in my case) and later on, when the controller tries to update it, Patch() fails here:

_, err = utils.PatchVolumeSnapshot(createdVolumeSnapshot, []utils.PatchOp{

Error:
E0310 21:18:31.250917 1 groupsnapshot_controller_helper.go:257] could not sync group snapshot "e2e-volumegroupsnapshottable-8755/group-snapshot-z6w49": createSnapshotsForGroupSnapshotContent: binding volumesnapshot to volumesnapshotcontent resource name may not be empty

What you expected to happen:
The controller should perhaps Get() the VolumeSnapshot from its informer before creating it. And if it already exists, then fail + expect the VolumeSnapshot appears in the informer in the next retry.

All IsAlreadyExists should be then handled in a similar way, not just VolumeSnapshot.

How to reproduce it:
I don't know. It happened only once in OpenShift e2e tests. I am stress-testing the test, 1024 runs so far, 0 failures.

Logs from OpenShift e2e tests:

As you can see, it would be helpful to log names of created VolumeSnapshots, VolumeSnapshotContents, and VolumeGroupContents - it's quite hard to map which VolumeSnapshot failed to be patched and what's the corresponding VGSC.

Anything else we need to know?:

Environment:

  • Driver version: csi-driver-hostpath + e2e tests as in Kubernetes 1.32
  • Kubernetes version (use kubectl version): 1.32-ish

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions