Skip to content

Conversation

skogta
Copy link
Contributor

@skogta skogta commented Aug 6, 2025

What this PR does / why we need it:
Due to the validation being done in webhook, the CPU performance has degraded quite a bit.

To improve this we can do these 2 things:

  1. Do CREATE validation only for user created CRs.
  2. In mutating webhook, add VM name and PVC in the label so that while listing all CRs in the namespace, only the ones with same VM and PVC can be listed.
  3. During CnsFileAccessConfig CR reconciliation, ensure that VM and PVC are not TKG if devops user label is present.

Testing done:

Created by PVCSI:

No lables got added.

Name:         test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm-a6382d7a-f6d0-40c7-94c0-0fe894387de8-10e6cccf-ae34-492e-8ff1-4f8c673130a3
Namespace:    test-gc-e2e-demo-ns
Labels:       <none>
Annotations:  <none>
API Version:  cns.vmware.com/v1alpha1
Kind:         CnsFileAccessConfig
Metadata:
  Creation Timestamp:  2025-08-19T06:16:30Z
  Finalizers:
    cns.vmware.com
  Generation:  2
  Owner References:
    API Version:           vmoperator.vmware.com/v1alpha4
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm
    UID:                   432ec2ff-fb2f-46e1-a44c-7adf001c88a8
  Resource Version:        593725
  UID:                     129d6e47-681b-4bf6-8263-b8011c92d7cc
Spec:
  Pvc Name:  a6382d7a-f6d0-40c7-94c0-0fe894387de8-10e6cccf-ae34-492e-8ff1-4f8c673130a3
  Vm Name:   test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm
Status:
  Access Points:
    NFSv3:    host13.cibgst.com:/52ff84e8-92d5-3b0d-757a-d316c4d11331
    NFSv4.1:  host10.cibgst.com:/vsanfs/52ff84e8-92d5-3b0d-757a-d316c4d11331
  Done:       true
Events:
  Type    Reason                        Age   From            Message
  ----    ------                        ----  ----            -------
  Normal  CnsFileAccessConfigSucceeded  12m   cns.vmware.com  Successfully configured access points of VM: "test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm" on the volume: "a6382d7a-f6d0-40c7-94c0-0fe894387de8-10e6cccf-ae34-492e-8ff1-4f8c673130a3"

Created by devops user - valid:

Labels got added.

Name:         rwm-vm-1
Namespace:    test
Labels:       cns.vmware.com/pvc-name=rwm-pvc
              cns.vmware.com/user-created=true
              cns.vmware.com/vm-name=vm-2
Annotations:  <none>
API Version:  cns.vmware.com/v1alpha1
Kind:         CnsFileAccessConfig
Metadata:
  Creation Timestamp:  2025-08-19T06:22:19Z
  Finalizers:
    cns.vmware.com
  Generation:  2
  Owner References:
    API Version:           vmoperator.vmware.com/v1alpha4
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  vm-2
    UID:                   03db7e56-22c7-4c39-9aee-494314aa35aa
  Resource Version:        597322
  UID:                     f9a8dc52-c427-4930-b8af-8679914b6ccb
Spec:
  Pvc Name:  rwm-pvc
  Vm Name:   vm-2
Status:
  Access Points:
    NFSv3:    host11.cibgst.com:/52b606f8-0b4d-0952-81ec-56c6233ddafa
    NFSv4.1:  host10.cibgst.com:/vsanfs/52b606f8-0b4d-0952-81ec-56c6233ddafa
  Done:       true
Events:
  Type    Reason                        Age   From            Message
  ----    ------                        ----  ----            -------
  Normal  CnsFileAccessConfigSucceeded  8m6s  cns.vmware.com  Successfully configured access points of VM: "vm-2" on the volume: "rwm-pvc"

Tried to created another CR with different name but same VM and PVC:

Error from server: error when creating "cfc.yaml": admission webhook "validation.csi.vsphere.vmware.com" denied the request: CnsFileAccessConfig rwm-vm-1 already exists for VM vm-2 and PVC rwm-pvc.
Deleted the CR successfully.

Created by devops user with TKG VM:

Labels got added but failure observed during ACL configuration.

Name:         tkg-rwm-vm-1
Namespace:    test-gc-e2e-demo-ns
Labels:       cns.vmware.com/pvc-name=test-example-vanilla-file-pvc
              cns.vmware.com/user-created=true
              cns.vmware.com/vm-name=test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm
Annotations:  <none>
API Version:  cns.vmware.com/v1alpha1
Kind:         CnsFileAccessConfig
Metadata:
  Creation Timestamp:  2025-08-19T06:29:12Z
  Generation:          2
  Resource Version:    601559
  UID:                 157755ea-252b-4797-b135-abcdb826ad30
Spec:
  Pvc Name:  test-example-vanilla-file-pvc
  Vm Name:   test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm
Status:
  Error:  CnsFileAccessConfig is created by devops user and has TKG VM test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm. Invalid combination.
Events:
  Type     Reason                     Age                From            Message
  ----     ------                     ----               ----            -------
  Warning  CnsFileAccessConfigFailed  15s (x5 over 29s)  cns.vmware.com  CnsFileAccessConfig is created by devops user and has TKG VM test-cluster-e2e-script-node-pool-1-b7x5r-vhclw-nf4tm. Invalid combination.

Successful VKS pipline: #3472 (comment)
WPC precheckin pipeline: #3472 (comment) failure is due to bug: 3560225 - unrelated to this change.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: skogta
Once this PR has been reviewed and has the lgtm label, please assign kolluria for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 6, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @skogta. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 6, 2025
@skogta skogta marked this pull request as draft August 6, 2025 13:46
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 6, 2025
@skogta skogta force-pushed the topic/skogta/rwxFilterCfcs branch from 196734c to 176995d Compare August 7, 2025 09:48
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 7, 2025
@skogta skogta marked this pull request as ready for review August 7, 2025 10:23
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 7, 2025
@skogta skogta force-pushed the topic/skogta/rwxFilterCfcs branch from 176995d to af8ab35 Compare August 7, 2025 10:23
@akankshapanse
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2025
@skogta
Copy link
Contributor Author

skogta commented Aug 7, 2025

ok-to-test-wcp

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #55

@skogta
Copy link
Contributor Author

skogta commented Aug 7, 2025

ok-to-test-wcp

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #56

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #57

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #57

@skogta
Copy link
Contributor Author

skogta commented Aug 8, 2025

ok-to-test-wcp

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #58

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #59

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #59

@divyenpatel
Copy link
Member

@skogta In this PR, I see you are adding labels for vmname and pvc name on CNSFileAccessConfig CR instanced if they are created by DevOps. and at the time of creating new CNSFileAccessConfig CR instance, we are doing look up using these labels to ensure duplicate CNSFileAccessConfig CR are not created.

but the goal here is to ensure no duplicate CNSFileAccessConfig CR should be created on the system. With this approach, I think DevOps will able to create duplicate CNSFileAccessConfig CR even if PVCSI has created CNSFileAccessConfig for particular node vm and PVC. Deleting such CR can end up removing ACL for guest node. Can we prevent this from happening?

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #64

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #66

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #69

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #69

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #109

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #109

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #114

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #117

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #117

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #127

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #133

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #133

@skogta skogta force-pushed the topic/skogta/rwxFilterCfcs branch from af8ab35 to 17ecba1 Compare August 14, 2025 08:16
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 14, 2025
@skogta skogta force-pushed the topic/skogta/rwxFilterCfcs branch 4 times, most recently from f20da69 to 48a2c70 Compare August 19, 2025 07:21
@skogta
Copy link
Contributor Author

skogta commented Aug 19, 2025

ok-to-test-wcp

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #145

@skogta
Copy link
Contributor Author

skogta commented Aug 19, 2025

ok-to-test-tkg

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #301

@skogta skogta force-pushed the topic/skogta/rwxFilterCfcs branch from 48a2c70 to 553ef68 Compare August 19, 2025 09:02
@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #146

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #146

@deepakkinni
Copy link
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #1

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants