fix uncertain cache #512
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: huww98. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @huww98. Thanks for your PR. I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
go controller.Run(1, t.Context())
/* Add initial objects to informer caches */
We don't need to add them manually, as the informer will read them from kubeClient. ctrl.init(ctx) will wait for cache sync; see the sketch below.
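For illustration, here is a minimal, self-contained sketch of that mechanism (not this repository's actual test code; the PVC name, namespace, and wiring are made up). Objects seeded into the fake clientset are served to the informers once they are started and synced, so nothing needs to be added to the informer stores by hand:

```go
package main

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// Seed the fake clientset with the initial object; no manual Add() to the
	// informer store is needed.
	pvc := &v1.PersistentVolumeClaim{ObjectMeta: metav1.ObjectMeta{Name: "pvc-1", Namespace: "default"}}
	kubeClient := fake.NewSimpleClientset(pvc)

	factory := informers.NewSharedInformerFactory(kubeClient, 0)
	// Requesting the informer here registers it with the factory before Start.
	pvcInformer := factory.Core().V1().PersistentVolumeClaims().Informer()
	pvcLister := factory.Core().V1().PersistentVolumeClaims().Lister()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Start the informers and wait for the initial list from kubeClient to be
	// reflected in the cache (this is the "wait for cache sync" step).
	factory.Start(ctx.Done())
	cache.WaitForCacheSync(ctx.Done(), pvcInformer.HasSynced)

	got, err := pvcLister.PersistentVolumeClaims("default").Get("pvc-1")
	fmt.Println(got.Name, err) // the PVC is already in the informer cache
}
```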
ctrl.removePVCFromModifyVolumeUncertainCache(pvcKey)
ctrl.markForSlowRetry(pvc, pvcKey)
Already done by caller, no need to do it again.
// TODO: This case is flaky because the fake client lacks resourceVersion support.
// {
//	name:      "success",
//	waitCount: 10,
// },
After a PVC is marked as completed, it is re-queued. In the next cycle, we may still read the PVC from the cache with InProgress status. We should then get a conflict when trying to mark it InProgress again. However, the fake client does not support resourceVersion, so when we pass an empty patch to it, it returns the latest PVC, with ModifyVolumeStatus == nil, which makes markControllerModifyVolumeCompleted panic.

Should we somehow make metadata.generation work for PVCs? Then we could cache the already-synced generation+VACName, know that the PVC in the cache is outdated, and just skip that cycle.
I am trying to understand how that would work. The metadata.generation field in a Kubernetes object is a sequence number that is only incremented when there is a change to the object's spec. But ModifyVolumeStatus lives in Status.ModifyVolumeStatus, so wouldn't caching the generation number be ineffective? It wouldn't tell you whether the status has been updated, so you would still be working with stale data.

Also, Hemant raised an issue, #514, that we will fix before GA.
My plan is: we maintain a global sync.Map that maps from object UID to the generation that has been fully reconciled, roughly as in the sketch below. We add an entry when ControllerModifyVolume finishes with OK and specVacName matches targetVACName, and we clear it when the object is deleted. When the next cycle starts, we skip it if the generation matches. This should ensure we never sync the same VAC at the same generation twice.
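A rough, hypothetical sketch of that skip logic; the names syncedGenerations, pvcView, and the helper functions are illustrative, not this repository's code:

```go
package main

import (
	"fmt"
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// syncedGenerations maps an object's UID to the last spec generation that was
// fully reconciled (ControllerModifyVolume returned OK and the spec VAC name
// matched the target VAC name). sync.Map is safe for concurrent workers.
var syncedGenerations sync.Map

type pvcView struct {
	UID        types.UID
	Generation int64
}

// alreadySynced reports whether this generation was already reconciled, so a
// cycle that read a stale copy of the PVC can simply be skipped.
func alreadySynced(p pvcView) bool {
	gen, ok := syncedGenerations.Load(p.UID)
	return ok && gen.(int64) >= p.Generation
}

// markSynced records the generation after a successful reconcile.
func markSynced(p pvcView) { syncedGenerations.Store(p.UID, p.Generation) }

// forget removes the entry when the object is deleted.
func forget(uid types.UID) { syncedGenerations.Delete(uid) }

func main() {
	markSynced(pvcView{UID: "uid-1", Generation: 3})
	fmt.Println(alreadySynced(pvcView{UID: "uid-1", Generation: 3})) // true: skip this cycle
	fmt.Println(alreadySynced(pvcView{UID: "uid-1", Generation: 4})) // false: new spec change
	forget("uid-1")
}
```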
Anyway, I think this is minor and should not block this PR.
> Also, Hemant raised an issue, #514, that we will fix before GA.
Yes, I think this PR should fix that one.
/ok-to-test
Fixed the test. The test was broken during rebase; it was originally developed based on #513.
> Fix the wrong error being used when checking the final error. A plain map is not safe for concurrent read/write from multiple workers; also fixed that. Added related tests.

For the uncertain cache fix and the change to use sync.Map: /lgtm
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fix the wrong error being used when checking the final error.
A plain map is not safe for concurrent read/write from multiple workers; also fixed that.
Added related tests.
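For context, a minimal sketch of what a worker-safe uncertain cache keyed by PVC key can look like with sync.Map; the type and method names here are illustrative assumptions, not necessarily what this PR implements:

```go
package main

import (
	"fmt"
	"sync"
)

// uncertainCache tracks PVC keys whose ControllerModifyVolume call ended in an
// uncertain (non-final) error. sync.Map is safe for concurrent use by multiple
// workers without extra locking, unlike a plain map.
type uncertainCache struct {
	m sync.Map // pvcKey (namespace/name) -> struct{}
}

func (c *uncertainCache) add(pvcKey string)    { c.m.Store(pvcKey, struct{}{}) }
func (c *uncertainCache) remove(pvcKey string) { c.m.Delete(pvcKey) }
func (c *uncertainCache) contains(pvcKey string) bool {
	_, ok := c.m.Load(pvcKey)
	return ok
}

func main() {
	cache := &uncertainCache{}
	var wg sync.WaitGroup
	// Multiple workers touching the cache concurrently; a plain map here would
	// trigger Go's concurrent map read/write fault.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("default/pvc-%d", i)
			cache.add(key)
			_ = cache.contains(key)
			cache.remove(key)
		}(i)
	}
	wg.Wait()
	fmt.Println("done")
}
```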
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: