Contributor

@jwtty jwtty commented Jan 17, 2025

Description of your changes

  1. Show index in PolicySnapshotIndexObserved in updateRun status
  2. Accept index in ResourceSnapshotIndex in updateRun spec
  3. Add ApprovalAccepted status in approvalRequests
  4. Fix the flaky waitTime unit test (check "no earlier than" instead of "after")
  5. Fix observedClusterCount difference for pickAll policy in validation.

Fixes #

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Special notes for your reviewer

@jwtty jwtty changed the title from "fix: use index instead of name in PolicySnapshotIndexUsed and resourceSnapshotIndex" to "fix: use index in PolicySnapshotIndexUsed and resourceSnapshotIndex" on Jan 17, 2025
@jwtty jwtty force-pushed the stagerun-index-fix branch 3 times, most recently from 2843cee to 79a44b9 Compare January 21, 2025 23:01
@jwtty jwtty changed the title from "fix: use index in PolicySnapshotIndexUsed and resourceSnapshotIndex" to "fix: use index number in clusterStagedUpdateRun and add ApprovalAccepted status to ApprovalRequests" on Jan 22, 2025
@jwtty jwtty force-pushed the stagerun-index-fix branch 6 times, most recently from dd0aad0 to fe9aefe Compare January 23, 2025 22:59
GenericFunc: func(ctx context.Context, e event.GenericEvent, q workqueue.RateLimitingInterface) {
klog.V(2).InfoS("Handling a clusterApprovalRequest generic event", "clusterApprovalRequest", klog.KObj(e.Object))
handleClusterApprovalRequest(e.Object, q)
handleClusterApprovalRequest(e.ObjectOld, e.ObjectNew, q)
Contributor

Just curious: why remove the genericFunc? I put it there just for safety, as I am not 100% sure what is considered "generic".

Contributor Author

genericFunc handles genericEvents, where the operation type is unknown, e.g. events triggered by a timer or originating outside of the cluster: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/event#GenericEvent. Since I only want to trigger a reconcile when the approval status changes, I removed the genericFunc.
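As a rough sketch of the "only reconcile on approval status change" idea, the snippet below compares the old and new objects from an update event and enqueues only when the approval state differs. The types `condition` and `approvalRequest` and the function `shouldEnqueue` are simplified stand-ins invented for this example; they are not the controller-runtime types or the PR's actual handler.

```go
package main

import "fmt"

// condition is a simplified stand-in for a Kubernetes status condition.
type condition struct {
	Type   string
	Status string
}

// approvalRequest is a hypothetical, trimmed-down object carrying only conditions.
type approvalRequest struct {
	Name       string
	Conditions []condition
}

// isApproved reports whether the "Approved" condition is present and "True".
func isApproved(r approvalRequest) bool {
	for _, c := range r.Conditions {
		if c.Type == "Approved" {
			return c.Status == "True"
		}
	}
	return false
}

// shouldEnqueue returns true only when the approval state differs between
// the old and new versions of the object, mimicking an update-event filter.
func shouldEnqueue(oldObj, newObj approvalRequest) bool {
	return isApproved(oldObj) != isApproved(newObj)
}

func main() {
	pending := approvalRequest{Name: "req-1"}
	approved := approvalRequest{
		Name:       "req-1",
		Conditions: []condition{{Type: "Approved", Status: "True"}},
	}

	fmt.Println(shouldEnqueue(pending, approved))  // true: approval state changed
	fmt.Println(shouldEnqueue(approved, approved)) // false: nothing to reconcile
}
```

A generic event carries a single object rather than an old/new pair, so this kind of comparison is not possible there.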

Contributor

I guess my question is "is it guaranteed that the status (and in other cases, the spec) of an object is not changed when the operation is unknown"? I haven't spent enough time in the code base to figure that out.

// Make sure the cluster count in the policy snapshot has not changed.
if updateRun.Status.PolicyObservedClusterCount != clusterCount {
// PickAll policy case will be verified in validateStagesStatus.
if clusterCount != -1 && updateRun.Status.PolicyObservedClusterCount != clusterCount {
Contributor

@ryanzhang-oss ryanzhang-oss Jan 25, 2025

Why add the extra check, since both will be -1 in the case of pickAll?

Contributor Author

This fixes a regression introduced in PR #1014. For a pickAll policy, the determinePolicySnapshot function returns -1 as clusterCount, and policyObservedClusterCount is then updated to the total number of scheduled bindings in collectScheduledBindings. So for the clusterCount validation here, I ignore the case where clusterCount == -1; the cluster count is validated later, in the stage validation. Without this change, updateRun validation would fail with a clusterCount mismatch error: "the cluster count initialized in the clusterStagedUpdateRun is outdated, latest: -1, recorded: 3 (total cluster count)"

Contributor

It's not ideal to assign values to PolicyObservedClusterCount in both the determinePolicySnapshot and collectScheduledClusters functions. It would be good to add some comments (and maybe a TODO to refactor).

@jwtty jwtty force-pushed the stagerun-index-fix branch from fe9aefe to a38fb75 Compare February 7, 2025 05:15
// - "True": The request is approved.
ApprovalRequestConditionApproved ApprovalRequestConditionType = "Approved"

// ApprovalRequestConditionApprovalAccepted indicates if the approved approval request was accepted.
Contributor

    - jsonPath: .status.conditions[?(@.type=="ApprovalAccepted")].status
      name: ApprovalAccepted
      type: string

@jwtty jwtty merged commit 8ede423 into Azure:main Feb 11, 2025
12 checks passed
@jwtty jwtty deleted the stagerun-index-fix branch February 11, 2025 03:08