@kolluria
Contributor

@kolluria kolluria commented Nov 21, 2025

Summary

This PR introduces a snapshot informer cache to optimize the volume unregistration workflow in the vSphere CSI Driver. The implementation adds an in-memory cache that tracks PVC-to-snapshot relationships using Kubernetes informers, eliminating the need for repeated API server queries during unregistration operations.

Key Changes

1. Snapshot Informer Implementation

  • Added snapshot informer to track VolumeSnapshot resources in real-time
  • Integrated snapshot informer into the informer manager lifecycle
  • Ensures snapshot cache is populated on CSI controller startup

2. PVC-to-Snapshots Mapping (pvcToSnapshotsMap)

  • Implemented in-memory data structure to maintain PVC-to-snapshot relationships
  • Provides O(1) lookup time for snapshot queries
  • Thread-safe operations for concurrent access
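
A minimal sketch of what such a structure could look like, assuming a read-write mutex around a map of sets. The `items` field mirrors the `map[k8stypes.NamespacedName]map[string]struct{}` shape visible in the diff; the local `NamespacedName` type and the `add`, `remove`, and `get` method names are illustrative assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// NamespacedName is a local stand-in for k8s.io/apimachinery's
// types.NamespacedName so the sketch compiles without dependencies.
type NamespacedName struct {
	Namespace string
	Name      string
}

// pvcToSnapshotsMap maps a PVC to the set of snapshot names taken from it.
// Method names below are hypothetical.
type pvcToSnapshotsMap struct {
	mu    sync.RWMutex
	items map[NamespacedName]map[string]struct{}
}

func newPVCToSnapshotsMap() *pvcToSnapshotsMap {
	return &pvcToSnapshotsMap{items: make(map[NamespacedName]map[string]struct{})}
}

// add records that snapshot was taken from pvc.
func (m *pvcToSnapshotsMap) add(pvc NamespacedName, snapshot string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.items[pvc] == nil {
		m.items[pvc] = make(map[string]struct{})
	}
	m.items[pvc][snapshot] = struct{}{}
}

// remove drops snapshot from pvc's set and deletes the PVC entry once empty.
func (m *pvcToSnapshotsMap) remove(pvc NamespacedName, snapshot string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	set, ok := m.items[pvc]
	if !ok {
		return
	}
	delete(set, snapshot)
	if len(set) == 0 {
		delete(m.items, pvc)
	}
}

// get returns a sorted copy of the snapshot names for pvc, so callers
// never hold a reference to the internal set.
func (m *pvcToSnapshotsMap) get(pvc NamespacedName) []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	names := make([]string, 0, len(m.items[pvc]))
	for name := range m.items[pvc] {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	m := newPVCToSnapshotsMap()
	pvc := NamespacedName{Namespace: "snapshots", Name: "pvc-with-multiple-snapshots"}
	m.add(pvc, "snapshot-multiple-2")
	m.add(pvc, "snapshot-multiple-1")
	m.remove(pvc, "snapshot-multiple-2")
	fmt.Println(m.get(pvc)) // [snapshot-multiple-1]
}
```

In this sketch, finding the set for a PVC is a single map access; copying and sorting the names for the caller is proportional to the number of snapshots on that PVC.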

3. Event Handlers

  • Add Handler: Populates cache when snapshots are created
  • Delete Handler: Removes snapshots from cache when deleted
  • No Update Handler: Snapshot updates don't affect PVC relationships, so update events are not processed
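
The add and delete handlers described above could be sketched roughly as follows. The `volumeSnapshot` struct is a hypothetical, trimmed-down stand-in for the real VolumeSnapshot type, `deletedFinalStateUnknown` imitates client-go's `cache.DeletedFinalStateUnknown` tombstone, and locking is omitted for brevity; whether the PR's handlers deal with tombstones this way is an assumption.

```go
package main

import "fmt"

// volumeSnapshot is a hypothetical stand-in for the VolumeSnapshot API
// type; only the fields the handlers need are kept.
type volumeSnapshot struct {
	Namespace string
	Name      string
	SourcePVC *string // nil when the snapshot was not created from a PVC
}

// deletedFinalStateUnknown imitates client-go's
// cache.DeletedFinalStateUnknown, which an informer delivers when it
// missed the actual delete event.
type deletedFinalStateUnknown struct {
	Obj interface{}
}

// cache maps "namespace/pvcName" to a set of snapshot names.
// Locking is left out to keep the sketch focused on the handlers.
type cache map[string]map[string]struct{}

func key(namespace, pvc string) string { return namespace + "/" + pvc }

// onAdd populates the cache when a snapshot with a PVC source appears.
func (c cache) onAdd(obj interface{}) {
	snap, ok := obj.(*volumeSnapshot)
	if !ok || snap.SourcePVC == nil {
		return
	}
	k := key(snap.Namespace, *snap.SourcePVC)
	if c[k] == nil {
		c[k] = make(map[string]struct{})
	}
	c[k][snap.Name] = struct{}{}
}

// onDelete removes the snapshot, unwrapping a tombstone if necessary.
func (c cache) onDelete(obj interface{}) {
	if tombstone, ok := obj.(deletedFinalStateUnknown); ok {
		obj = tombstone.Obj
	}
	snap, ok := obj.(*volumeSnapshot)
	if !ok || snap.SourcePVC == nil {
		return
	}
	k := key(snap.Namespace, *snap.SourcePVC)
	delete(c[k], snap.Name) // deleting from a nil set is a no-op
	if len(c[k]) == 0 {
		delete(c, k)
	}
}

func main() {
	c := cache{}
	pvc := "pvc-for-snapshot-deletion"
	snap := &volumeSnapshot{Namespace: "snapshots", Name: "snapshot-for-deletion-test", SourcePVC: &pvc}

	c.onAdd(snap)
	fmt.Println(len(c[key("snapshots", pvc)])) // 1

	c.onDelete(snap)
	fmt.Println(len(c)) // 0
}
```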

4. Container Orchestrator Interface

  • Added GetSnapshotsForPVC(pvcName string) method to COCommonInterface
  • Implemented in K8sOrchestrator to query the snapshot cache
  • Returns list of snapshot names associated with a given PVC
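
To illustrate how callers might depend on this method rather than on a snapshot client, here is a hedged sketch. `snapshotLister` is a hypothetical one-method slice of COCommonInterface, the `[]string` return type is assumed from the description above, and `fakeOrchestrator` plays the role a test double would; none of these are the PR's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotLister is a hypothetical subset of COCommonInterface holding
// only the method added in this PR. The return type is an assumption.
type snapshotLister interface {
	GetSnapshotsForPVC(pvcName string) []string
}

// fakeOrchestrator is an illustrative test double backed by a plain map.
type fakeOrchestrator struct {
	snapshots map[string][]string
}

func (f *fakeOrchestrator) GetSnapshotsForPVC(pvcName string) []string {
	return f.snapshots[pvcName]
}

// getSnapshotsForPVC delegates to the orchestrator and reports an error
// when no orchestrator has been initialized.
func getSnapshotsForPVC(co snapshotLister, pvcName string) ([]string, error) {
	if co == nil {
		return nil, errors.New("container orchestrator utility is not initialized")
	}
	return co.GetSnapshotsForPVC(pvcName), nil
}

func main() {
	co := &fakeOrchestrator{snapshots: map[string][]string{
		"pvc-with-single-snapshot": {"snapshot-single-1"},
	}}

	names, err := getSnapshotsForPVC(co, "pvc-with-single-snapshot")
	fmt.Println(names, err) // [snapshot-single-1] <nil>

	_, err = getSnapshotsForPVC(nil, "pvc-with-single-snapshot")
	fmt.Println(err) // container orchestrator utility is not initialized
}
```

Because the caller only sees the interface, a unit test can substitute the fake without a running API server.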

5. Unregistration Workflow Enhancement

  • Integrated snapshot cache lookup during volume unregistration
  • Prevents unregistration of volumes with dependent snapshots
  • Provides clear error messages listing blocking snapshots
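
The blocking check can be pictured with a small sketch like the one below. The function name `checkSnapshotsBeforeUnregister` is hypothetical, and the message format is modeled on the status output shown in the testing section rather than taken from the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// checkSnapshotsBeforeUnregister returns an error naming every snapshot
// that still depends on the volume, or nil when unregistration may proceed.
// The name and message format are assumptions made for illustration.
func checkSnapshotsBeforeUnregister(volumeID string, snapshots []string) error {
	if len(snapshots) == 0 {
		return nil
	}
	return fmt.Errorf(
		"volume %s cannot be unregistered because volume is in use by the following resources:\n Snapshots: %s",
		volumeID, strings.Join(snapshots, ", "))
}

func main() {
	// No snapshots in the cache: unregistration is allowed.
	fmt.Println(checkSnapshotsBeforeUnregister("vol-1", nil)) // <nil>

	// Snapshots present: unregistration is blocked with a descriptive error.
	err := checkSnapshotsBeforeUnregister("vol-1",
		[]string{"snapshot-multiple-1", "snapshot-multiple-2"})
	fmt.Println(err)
}
```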

Performance Benefits

  • Eliminates API Calls: Snapshot information retrieved from in-memory cache instead of querying API server
  • Fast Lookups: O(1) time complexity for PVC-to-snapshots mapping
  • Reduced API Server Load: Decreases load during unregistration operations
  • Real-time Synchronization: Informer automatically keeps cache synchronized with cluster state

Testing

Test Environment

  • Namespace: snapshots
  • Storage Class: wcpglobal-storage-profile
  • Snapshot Class: volumesnapshotclass-delete
  • Test PVCs: 4 PVCs with different snapshot scenarios

1. Snapshot Cache Operations

1.1 Cache Population on Snapshot Creation

Test: Created initial set of snapshots and verified cache population.

Step 2: Creating initial snapshots...
volumesnapshot.snapshot.storage.k8s.io/snapshot-single-1 created
volumesnapshot.snapshot.storage.k8s.io/snapshot-multiple-1 created
volumesnapshot.snapshot.storage.k8s.io/snapshot-multiple-2 created
volumesnapshot.snapshot.storage.k8s.io/snapshot-for-deletion-test created

Waiting for snapshots to be ready...
NAME                         READYTOUSE   SOURCEPVC                     RESTORESIZE   SNAPSHOTCLASS
snapshot-for-deletion-test   true         pvc-for-snapshot-deletion     1Gi           volumesnapshotclass-delete
snapshot-multiple-1          true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-multiple-2          true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-single-1            true         pvc-with-single-snapshot      1Gi           volumesnapshotclass-delete

Result: All snapshots successfully created and added to cache.

1.2 Cache Updates with Multiple Snapshots

Test: Created additional snapshots for the same PVC to verify cache handles multiple snapshots per PVC.

Step 4: Creating batch snapshots to test cache updates...
volumesnapshot.snapshot.storage.k8s.io/snapshot-multiple-3 created
volumesnapshot.snapshot.storage.k8s.io/snapshot-multiple-4 created
volumesnapshot.snapshot.storage.k8s.io/snapshot-multiple-5 created

Step 5: Verifying cache has all snapshots for pvc-with-multiple-snapshots...
Expected: 5 snapshots total
snapshot-multiple-1
snapshot-multiple-2
snapshot-multiple-3
snapshot-multiple-4
snapshot-multiple-5

Result: Cache correctly maintains all 5 snapshots for pvc-with-multiple-snapshots.

1.3 Cache Removal on Snapshot Deletion

Test: Deleted a snapshot and verified it was removed from cache.

Step 7: Testing snapshot deletion (cache removal)...
Deleting snapshot-for-deletion-test...
volumesnapshot.snapshot.storage.k8s.io "snapshot-for-deletion-test" deleted

Verifying snapshot is deleted...
Snapshot successfully deleted

Result: Snapshot successfully removed from both cluster and cache.


2. Volume Unregistration Workflow

2.1 Unregistration Blocked by Snapshots

Test: Attempted to unregister a PVC that has multiple snapshots.

Expected: Unregistration should fail with error listing all dependent snapshots.

Step 3: Testing unregistration with snapshots (should fail)...
cnsunregistervolume.cns.vmware.com/unregister-pvc-with-snapshots created

Checking CnsUnregisterVolume status: unregister-pvc-with-snapshots
status:
  error: |-
    volume 719d567c-3925-45ad-ba74-3b695c1e3a5c cannot be unregistered because volume is in use by the following resources:
     Snapshots: snapshot-multiple-3, snapshot-multiple-4, snapshot-multiple-5, snapshot-multiple-1, snapshot-multiple-2
  unregistered: false

Result: Unregistration correctly blocked. Error message lists all 5 snapshots retrieved from cache.

Key Observations:

  • GetSnapshotsForPVC successfully queried cache and returned all 5 snapshots
  • Cache accurately reflected the current cluster state
  • Clear error message helps users understand why unregistration failed

2.2 Unregistration Success Without Snapshots

Test: Attempted to unregister a PVC with no snapshots.

Expected: Unregistration should succeed as cache returns empty list.

Step 6: Testing unregistration without snapshots (should succeed)...
cnsunregistervolume.cns.vmware.com/unregister-pvc-no-snapshots created

Checking CnsUnregisterVolume status: unregister-pvc-no-snapshots
status:
  unregistered: true

Result: Unregistration succeeded. Cache correctly returned empty snapshot list.

2.3 Unregistration Success After Snapshot Deletion

Test: After deleting the blocking snapshot, attempted unregistration again.

Expected: Unregistration should succeed as snapshot was removed from cache.

Step 8: Testing unregistration after snapshot deletion (should succeed)...
cnsunregistervolume.cns.vmware.com/unregister-after-snapshot-delete created

Checking CnsUnregisterVolume status: unregister-after-snapshot-delete
status:
  unregistered: true

Result: Unregistration succeeded after snapshot deletion. Cache properly reflected the deletion.


Test Summary

Final Resource State:

PVCs:
NAME                          STATUS   VOLUME                                     CAPACITY   STORAGECLASS
pvc-with-multiple-snapshots   Bound    pvc-719d567c-3925-45ad-ba74-3b695c1e3a5c   2Gi        wcpglobal-storage-profile
pvc-with-single-snapshot      Bound    pvc-7f772de4-e038-44a4-b6bf-cfcbe0055da0   1Gi        wcpglobal-storage-profile

Snapshots:
NAME                  READYTOUSE   SOURCEPVC                     RESTORESIZE   SNAPSHOTCLASS
snapshot-multiple-1   true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-multiple-2   true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-multiple-3   true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-multiple-4   true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-multiple-5   true         pvc-with-multiple-snapshots   2Gi           volumesnapshotclass-delete
snapshot-single-1     true         pvc-with-single-snapshot      1Gi           volumesnapshotclass-delete

CnsUnregisterVolume Resources:
NAME                               UNREGISTERED   AGE
unregister-after-snapshot-delete   true           10s
unregister-pvc-no-snapshots        true           31s
unregister-pvc-with-snapshots      false          2m25s

Test Results Summary

Test Case                               Expected Behavior             Actual Result                        Status
Cache population on snapshot creation   Snapshots added to cache      4 snapshots added successfully       ✅ Pass
Multiple snapshots per PVC              Cache tracks all snapshots    All 5 snapshots tracked correctly    ✅ Pass
Cache removal on deletion               Snapshot removed from cache   Snapshot removed successfully        ✅ Pass
Unregistration with snapshots           Fails with snapshot list      Failed with all 5 snapshots listed   ✅ Pass
Unregistration without snapshots        Succeeds                      Unregistration successful            ✅ Pass
Unregistration after deletion           Succeeds                      Unregistration successful            ✅ Pass

Special notes for your reviewer

The existing InformerManager in the vSphere CSI Driver uses the standard client-go SharedInformerFactory, which only provides informers for built-in Kubernetes resources (PVCs, PVs, Pods, etc.). VolumeSnapshot, however, is a custom resource in the Kubernetes snapshot API group (snapshot.storage.k8s.io), so the built-in informer factory has no informer for it.
To support snapshot informers, we added externalversions.SharedInformerFactory from the github.com/kubernetes-csi/external-snapshotter/client/v6 package to the InformerManager.

Release note:

Implement Snapshot Informer Cache for Volume Unregistration Workflow

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Nov 21, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kolluria

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added labels on Nov 21, 2025: approved (indicates a PR has been approved by an approver from all required OWNERS files), size/L (denotes a PR that changes 100-499 lines, ignoring generated files), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
@kolluria changed the title from "Initial implementation of informer caches for snapshots" to "Enhancement - Use informer cache to detect snapshots during volume unregistration" on Nov 24, 2025
@kolluria changed the title from "Enhancement - Use informer cache to detect snapshots during volume unregistration" to "Enhancement - Implement Snapshot Informer Cache for Volume Unregistration Workflow" on Nov 25, 2025

func initPVCToSnapshotsMap(ctx context.Context, controllerClusterFlavor cnstypes.CnsClusterFlavor) error {
log := logger.GetLogger(ctx)
// TODO: check if we need to check the FSS as well

@kolluria marked this pull request as ready for review on November 25, 2025 09:08
@k8s-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Nov 25, 2025
@kolluria
Contributor Author

/assign @kolluria
/cc @divyenpatel

@deepakkinni
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #656

@deepakkinni
Collaborator

SUCCESS --- Jenkins Build #613

@kolluria
Contributor Author

/cc @xing-yang @akankshapanse

@deepakkinni
Collaborator

FAILED --- Jenkins Build #662

Simplify the getSnapshotsForPVC function by delegating snapshot retrieval
to the ContainerOrchestratorUtility interface instead of directly using
the snapshot client. This improves testability and maintains consistency
with the existing architecture.

Changes:
- Remove direct dependency on external-snapshotter client in util.go
- Update getSnapshotsForPVC to use ContainerOrchestratorUtility.GetSnapshotsForPVC
- Remove rest.Config parameter from getSnapshotsForPVC function signature
- Add comprehensive unit tests for getSnapshotsForPVC covering:
  * Uninitialized ContainerOrchestratorUtility scenario
  * PVC with no snapshots
  * PVC with multiple snapshots
- Implement GetSnapshotsForPVC in FakeK8SOrchestrator for testing
- Add TODO comment for future refactoring of pvcToSnapshotsMap key type

Benefits:
- Improved testability through dependency injection
- Reduced coupling to external snapshot client
- Consistent error handling
- Better alignment with existing orchestrator abstraction pattern

Extract the snapshot event handlers to improve testability
Add unit tests for K8s and informers
@k8s-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Nov 26, 2025
items: make(map[k8stypes.NamespacedName]map[string]struct{}),
}

err := k8sOrchestratorInstance.informerManager.AddSnapshotListener(ctx,
Contributor

This informer should be conditionally added. In Vanilla cluster, VolumeSnapshot CRD may not be installed.

Contributor Author

@xing-yang great catch! Let me fix that

