feat(controllers): add reconcile requeue result and speed up role delete finalization #279

Draft
Yuyz0112 wants to merge 1 commit into main from feat_role_delete_requeue

Conversation

Yuyz0112 (Contributor) commented Feb 16, 2026

Issues

  • In polling-only mode (no realtime watch), role deletion requires two full reconcile cycles even on the happy path:
    • cycle 1: set status to Deleted
    • cycle 2: final physical delete
  • This introduces avoidable latency for role deletion finalization.
  • The controller runtime also lacked a generic requeue contract for "finish soon" follow-up reconciliation.

Changes

  • Introduced controller-level reconcile result type:
    • controllers/reconcile/result.go
    • type Result { Requeue bool; RequeueAfter time.Duration }
  • Updated Reconciler interface to return (reconcile.Result, error).
  • Updated BaseController.processNextWorkItem to honor:
    • immediate requeue via Requeue
    • delayed requeue via RequeueAfter
  • Aligned all existing controllers with the new signature while preserving behavior (they return an empty result by default).
  • Enabled the optimization only for the role delete path:
    • RoleController returns RequeueAfter after the phase-1 delete status update, so the phase-2 final delete is reconciled quickly without waiting for the next full poll interval.
  • Regenerated controllers/mocks/mock_reconciler.go and adjusted tests for the new return signature.
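
A minimal sketch of how the requeue contract listed above could be honored, assuming the branch order shown in the review snippet further down (a positive RequeueAfter is checked first, then Requeue). The names `recordingQueue`, `applyResult`, and `exampleDelay` are hypothetical stand-ins for illustration and are not taken from the PR; the real controller uses its own work queue.

```go
package main

import (
	"fmt"
	"time"
)

// Result mirrors the shape described in the PR (controllers/reconcile/result.go).
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// exampleDelay is an illustrative delay for this sketch.
const exampleDelay = 100 * time.Millisecond

// recordingQueue is a hypothetical stand-in for the controller's work queue.
// It only records which method was called for which key.
type recordingQueue struct{ calls []string }

func (q *recordingQueue) AddAfter(key string, d time.Duration) {
	q.calls = append(q.calls, fmt.Sprintf("AddAfter(%s, %s)", key, d))
}

func (q *recordingQueue) AddRateLimited(key string) {
	q.calls = append(q.calls, fmt.Sprintf("AddRateLimited(%s)", key))
}

// applyResult is a hypothetical helper: a positive RequeueAfter schedules a
// delayed requeue, otherwise Requeue asks for a rate-limited requeue, and an
// empty Result leaves the key to the next regular poll.
func applyResult(q *recordingQueue, key string, res Result) {
	if res.RequeueAfter > 0 {
		q.AddAfter(key, res.RequeueAfter)
	} else if res.Requeue {
		q.AddRateLimited(key)
	}
}

func main() {
	q := &recordingQueue{}
	applyResult(q, "role-a", Result{RequeueAfter: exampleDelay})
	applyResult(q, "role-b", Result{Requeue: true})
	applyResult(q, "role-c", Result{}) // no requeue requested
	for _, c := range q.calls {
		fmt.Println(c)
	}
}
```

With this ordering, a Result that sets both fields is handled by the RequeueAfter branch only, so Requeue is effectively ignored in that case.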

Why only role uses requeue now

  • This PR targets the concrete latency issue reported on role deletion.
  • Applying requeue to all resources at once would expand behavioral surface and review risk.
  • Other controllers are interface-aligned in this PR, so follow-up per-resource enablement is straightforward and low-friction.

Risk assessment

  • Runtime behavior change is localized to role delete phase-1 -> phase-2 handoff.
  • Requeue semantics are explicit and bounded (RequeueAfter), avoiding global polling interval changes.
  • Potential risk is extra queue churn if misused; current usage is single targeted path.

Rollback plan

  • Fast rollback option: remove role-specific RequeueAfter return in RoleController while keeping interface scaffolding.
  • Full rollback option: revert this PR to restore previous Reconciler signature and processing flow.

Test

  • go test ./controllers/...

codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 70.83333% with 14 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
controllers/base_controller.go | 33.33% | 4 Missing and 4 partials ⚠️
controllers/user_profile_controller.go | 0.00% | 3 Missing ⚠️
controllers/role_controller.go | 66.66% | 1 Missing and 1 partial ⚠️
controllers/model_catalog_controller.go | 66.66% | 1 Missing ⚠️

Copilot AI left a comment


Pull request overview

This PR introduces a Kubernetes-style reconcile result pattern to enable fine-grained control over requeue behavior in controllers. The main motivation is to speed up role deletion by requeuing the role shortly (100ms) after the first phase of deletion (the status update), rather than waiting for the next full polling cycle.

Changes:

  • Added reconcile.Result type with Requeue and RequeueAfter fields to control requeue behavior
  • Updated the Reconciler interface to return (reconcile.Result, error) instead of just error
  • Enhanced RoleController to return RequeueAfter: 100ms during role deletion phase-1 to accelerate the physical delete

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.

Summary per file:
File | Description
controllers/reconcile/result.go | New Result struct defining requeue behavior (Requeue bool, RequeueAfter duration)
controllers/base_controller.go | Updated processNextWorkItem to handle Result-based requeueing; modified Reconciler interface signature
controllers/role_controller.go | Implements fast delete by returning RequeueAfter during deletion phase-1; added roleDeleteRequeueAfter constant (100ms)
controllers/role_controller_test.go | Added assertion to verify RequeueAfter is returned during deletion; updated to handle new Reconcile signature
controllers/workspace_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/workspace_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/user_profile_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/model_registry_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/model_registry_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/model_catalog_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/model_catalog_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/image_registry_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/image_registry_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/engine_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/engine_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/endpoint_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/endpoint_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/cluster_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/cluster_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/role_assignment_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/role_assignment_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/api_key_controller.go | Updated Reconcile to return reconcile.Result{} with existing sync logic unchanged
controllers/api_key_controller_test.go | Updated test to capture Result return value (ignored with _)
controllers/base_controller_test.go | Updated mock expectations to return reconcile.Result{} in test cases
controllers/mocks/mock_reconciler.go | Regenerated mock to match new Reconciler interface with (Result, error) return signature

Comment on lines +5 to +6
type Result struct {
Requeue bool

Copilot AI Feb 16, 2026

The reconcile.Result struct allows both Requeue and RequeueAfter to be set simultaneously, but the base controller implementation will only honor RequeueAfter in this case (lines 97-101, 105-109 in base_controller.go). Consider either: (1) documenting this precedence in the Result struct, or (2) adding validation in the Result type to prevent both fields from being set, or (3) returning an error if both are set.

Suggested change
type Result struct {
Requeue bool
// Result captures the outcome of a reconciliation.
//
// If both Requeue and RequeueAfter are set, the base controller implementation
// will honor RequeueAfter and ignore Requeue.
type Result struct {
// Requeue indicates whether to requeue immediately.
//
// Note: If both Requeue and RequeueAfter are set, RequeueAfter takes
// precedence and the controller will honor RequeueAfter.
Requeue bool
// RequeueAfter specifies how long to wait before requeuing.
//
// When non-zero and Requeue is also true, this value takes precedence.

type Result struct {
Requeue bool
RequeueAfter time.Duration

Copilot AI Feb 16, 2026

The Result struct doesn't validate that RequeueAfter is non-negative. Given the base controller's RequeueAfter > 0 check, a negative duration would be silently treated as if no delay had been requested, which can hide a caller bug. Consider adding validation to ensure RequeueAfter is either zero or positive, or document that negative values are not allowed.

Suggested change
RequeueAfter time.Duration
RequeueAfter time.Duration // must be zero or positive; negative values are not allowed
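
As an alternative to a comment-only fix, the validation option raised in this comment and in the precedence comment above could look roughly like the following. This is a sketch under assumptions: the `Validate` method and both error values are hypothetical and do not exist in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result mirrors the shape described in the PR (controllers/reconcile/result.go).
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// Hypothetical error values used only by this sketch.
var (
	errNegativeDelay = errors.New("reconcile: RequeueAfter must not be negative")
	errAmbiguous     = errors.New("reconcile: set either Requeue or RequeueAfter, not both")
)

// Validate is a hypothetical method that rejects a negative delay and the
// ambiguous case where both Requeue and a positive RequeueAfter are set.
func (r Result) Validate() error {
	if r.RequeueAfter < 0 {
		return errNegativeDelay
	}
	if r.Requeue && r.RequeueAfter > 0 {
		return errAmbiguous
	}
	return nil
}

func main() {
	results := []Result{
		{},
		{Requeue: true},
		{RequeueAfter: -time.Second},
		{Requeue: true, RequeueAfter: time.Second},
	}
	for _, r := range results {
		fmt.Printf("%+v -> %v\n", r, r.Validate())
	}
}
```

A base controller could call such a method before touching the queue and log or reject invalid results; whether to reject or merely document the both-set case is a design choice the review leaves open.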

if result.RequeueAfter > 0 {
bc.queue.AddAfter(key, result.RequeueAfter)
} else if result.Requeue {
bc.queue.AddRateLimited(key)

Copilot AI Feb 16, 2026

When reconciliation returns an error, the item is not automatically requeued unless a Result flag is set. This means errors without explicit requeue flags will cause the item to be dropped from the queue entirely. The standard pattern should be to always requeue on error (using rate limiting) regardless of the Result flags. Consider adding a fallback to ensure items are requeued with rate limiting when an error occurs and no requeue flags are set.

Suggested change
bc.queue.AddRateLimited(key)
bc.queue.AddRateLimited(key)
} else {
// Ensure items are always retried on error, even if Result does not request requeue.
bc.queue.AddRateLimited(key)
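
The fallback suggested here can be sketched in isolation as follows, assuming the quoted snippet sits in the error branch of processNextWorkItem. `fakeQueue`, `handleError`, and `errSync` are hypothetical names for illustration; only the branch structure comes from the suggestion.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result mirrors the shape described in the PR (controllers/reconcile/result.go).
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// errSync is an illustrative error used by this sketch.
var errSync = errors.New("sync failed")

// fakeQueue is a hypothetical stand-in for the work queue. It records the
// last action taken for each key.
type fakeQueue struct{ actions map[string]string }

func (q *fakeQueue) AddAfter(key string, d time.Duration) {
	q.actions[key] = "after " + d.String()
}

func (q *fakeQueue) AddRateLimited(key string) {
	q.actions[key] = "rate-limited"
}

// handleError sketches the error branch with the suggested fallback: a failed
// reconcile is always put back on the queue, even when Result is empty.
func handleError(q *fakeQueue, key string, res Result, err error) {
	if err == nil {
		return
	}
	if res.RequeueAfter > 0 {
		q.AddAfter(key, res.RequeueAfter)
	} else {
		// Covers an explicit Requeue and the fallback for an empty Result.
		q.AddRateLimited(key)
	}
}

func main() {
	q := &fakeQueue{actions: map[string]string{}}
	handleError(q, "role-a", Result{}, errSync)
	handleError(q, "role-b", Result{RequeueAfter: time.Second}, errSync)
	fmt.Println("role-a:", q.actions["role-a"])
	fmt.Println("role-b:", q.actions["role-b"])
}
```

Note that this sketch still honors RequeueAfter when an error is present, which is the behavior a later comment in this review questions.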

return c.syncHandler(role)
if role.Metadata != nil && role.Metadata.DeletionTimestamp != "" &&
(role.Status == nil || role.Status.Phase != v1.RolePhaseDELETED) {
return reconcile.Result{RequeueAfter: roleDeleteRequeueAfter}, c.syncHandler(role)

Copilot AI Feb 16, 2026

When the status update to the DELETED phase fails in syncHandler (lines 70-77), the Reconcile method will return both a RequeueAfter result and an error. With the current base_controller implementation, the RequeueAfter will be honored even though there's an error, which means the retry won't use rate limiting. This could lead to rapid retry loops if the error persists. Consider either: (1) only returning RequeueAfter when syncHandler succeeds, or (2) fixing the base controller to apply rate limiting when both error and RequeueAfter are present.

Suggested change
return reconcile.Result{RequeueAfter: roleDeleteRequeueAfter}, c.syncHandler(role)
err := c.syncHandler(role)
if err != nil {
// On error, do not set RequeueAfter so that the base controller's rate limiting applies.
return reconcile.Result{}, err
}
// On success, requeue after a short delay to continue deletion processing.
return reconcile.Result{RequeueAfter: roleDeleteRequeueAfter}, nil
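
To make option (1) concrete, here is a self-contained sketch that can be run outside the controller. The `role` struct, `phaseDeleted`, `pendingDeleteFinalization`, `reconcileRole`, and `errSync` are hypothetical, trimmed-down stand-ins for v1.Role, v1.RolePhaseDELETED, and RoleController.Reconcile; only the 100ms value and the success/error split come from the review.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result mirrors the shape described in the PR (controllers/reconcile/result.go).
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// roleDeleteRequeueAfter uses the 100ms value mentioned in the review summary.
const roleDeleteRequeueAfter = 100 * time.Millisecond

// phaseDeleted is a hypothetical placeholder for v1.RolePhaseDELETED.
const phaseDeleted = "Deleted"

// errSync is an illustrative error used by this sketch.
var errSync = errors.New("sync failed")

// role is a hypothetical, flattened stand-in for v1.Role.
type role struct {
	DeletionTimestamp string
	Phase             string
}

// pendingDeleteFinalization reports whether the role is marked for deletion
// but has not reached the deleted phase yet (delete phase-1).
func pendingDeleteFinalization(r role) bool {
	return r.DeletionTimestamp != "" && r.Phase != phaseDeleted
}

// reconcileRole returns RequeueAfter only when the phase-1 sync succeeds, so
// a failing sync falls back to whatever error handling the caller applies.
func reconcileRole(r role, sync func(role) error) (Result, error) {
	if !pendingDeleteFinalization(r) {
		return Result{}, sync(r)
	}
	if err := sync(r); err != nil {
		return Result{}, err
	}
	return Result{RequeueAfter: roleDeleteRequeueAfter}, nil
}

func main() {
	deleting := role{DeletionTimestamp: "2026-02-16T00:00:00Z"}

	res, err := reconcileRole(deleting, func(role) error { return nil })
	fmt.Println("success:", res.RequeueAfter, err)

	res, err = reconcileRole(deleting, func(role) error { return errSync })
	fmt.Println("failure:", res.RequeueAfter, err)
}
```

Pulling the deletion check into a small predicate also gives the tests a single place to exercise the phase-1 condition.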

if role, ok := tt.inputKey.(*v1.Role); ok && role.Metadata != nil &&
role.Metadata.DeletionTimestamp != "" &&
(role.Status == nil || role.Status.Phase != v1.RolePhaseDELETED) {
assert.True(t, result.RequeueAfter > 0)

Copilot AI Feb 16, 2026

The test assertion on lines 293-297 checks that RequeueAfter is set when a role is being deleted but not yet in DELETED phase. However, this assertion passes for any positive RequeueAfter, because it only checks that the value is greater than 0. Since roleDeleteRequeueAfter is 100ms, this works, but the assertion could be more precise by checking that the value equals roleDeleteRequeueAfter. Consider using assert.Equal(t, roleDeleteRequeueAfter, result.RequeueAfter) for more specific validation.

Suggested change
assert.True(t, result.RequeueAfter > 0)
assert.Equal(t, roleDeleteRequeueAfter, result.RequeueAfter)

Comment on lines +293 to +297
if role, ok := tt.inputKey.(*v1.Role); ok && role.Metadata != nil &&
role.Metadata.DeletionTimestamp != "" &&
(role.Status == nil || role.Status.Phase != v1.RolePhaseDELETED) {
assert.True(t, result.RequeueAfter > 0)
}

Copilot AI Feb 16, 2026

The assertion on lines 293-297 checks for RequeueAfter behavior when a role is being deleted, but none of the test cases in this test function actually use testRoleWithDeletionTimestamp. This means the new RequeueAfter logic is not being tested here. Consider adding at least one test case that uses testRoleWithDeletionTimestamp with a phase other than DELETED to verify that RequeueAfter is properly returned.

Suggested change
if role, ok := tt.inputKey.(*v1.Role); ok && role.Metadata != nil &&
role.Metadata.DeletionTimestamp != "" &&
(role.Status == nil || role.Status.Phase != v1.RolePhaseDELETED) {
assert.True(t, result.RequeueAfter > 0)
}

Comment on lines +5 to +8
type Result struct {
Requeue bool
RequeueAfter time.Duration
}

Copilot AI Feb 16, 2026

The Result struct lacks documentation explaining its purpose and how its fields should be used. Consider adding a comment describing when to use Requeue vs RequeueAfter, and clarifying the behavior when both an error and a Result are returned. For example: "Result controls whether and how a reconciled object should be requeued. When error is nil and Requeue is true, the item is requeued immediately. When error is nil and RequeueAfter is set, the item is requeued after the specified duration. When an error is returned, these fields may influence the requeue behavior depending on the controller implementation."
