Skip to content

chore: Eliminate unnecessary reconciliations after progressive rollout failure#976

Open
juliev0 wants to merge 5 commits intomainfrom
reduce-reconciliations
Open

chore: Eliminate unnecessary reconciliations after progressive rollout failure#976
juliev0 wants to merge 5 commits intomainfrom
reduce-reconciliations

Conversation

@juliev0
Copy link
Collaborator

@juliev0 juliev0 commented Mar 17, 2026

Fixes #773

Modifications

After we assess a progressive rollout failure, we don't need to be requeing every 20 seconds like we have been. This wastes cpu cycles.

Once there is an update to the rollout spec, we will reconcile again. There's nothing to do in the meantime.

Verification

I performed the following tests with both pipelinerollout, isbservicerollout, and monovertexrollout:

  • Update to a failed spec and observe the rollout fail
  • Observe that the numaplane log stops showing reconciliations
  • Observe that pipelines are correctly scaled during failure
  • Fix the rollout spec and observe that everything works as usual as far as progressive success case
  • Observe that the promoted child is now running at full scale

Backward (and forward) incompatibilities

N/A

juliev0 added 5 commits March 16, 2026 21:54
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Branch-Creation-Time: 2026-03-16T21:52:40+0000
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Signed-off-by: jvogelman <julie_vogelman@intuit.com>

// if we still have interstepbufferservices that need deleting, or if we're in the middle of an upgrade strategy, then requeue
if !allDeleted || inProgressStrategy != apiv1.UpgradeStrategyNoOp {
if !allDeleted || inProgressStrategy == apiv1.UpgradeStrategyPPND {
Copy link
Collaborator Author

@juliev0 juliev0 Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the case of Progressive, we have other logic which is causing us to requeue if we're in the middle of the assessment - we don't need this

requeueDelay = progressiveRequeueDelay
}
if assessmentComplete {
if !failed {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github is making it seem like the code in this block failed, but it hasn't. Only the if done got changed to:

if assessmentComplete {
			if !failed {

}

// requeue using the provided delay
return progressiveRequeueDelay, nil
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requeue if we're mid-assessment

requeueDelay = common.DefaultRequeueDelay

} else {
requeueDelay = progressiveRequeueDelay
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this requeueing is taken care of at the top where we now do this:

if progressiveRequeueDelay > 0 {
		requeueDelay = progressiveRequeueDelay
}

@juliev0 juliev0 marked this pull request as ready for review March 17, 2026 20:53
@juliev0 juliev0 requested a review from afugazzotto as a code owner March 17, 2026 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid requeueing if we fail progressive upgrade

1 participant