Skip to content

Conversation

@zach593
Copy link
Contributor

@zach593 zach593 commented Nov 27, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

After #6919 is merged, this PR enables the priority-queue functionality for all controllers that use async workers.
Once all controllers fully adopt priority queues, the karmada-controller-manager can still respond promptly to user updates even when a restart causes a backlog in the workqueue.
In theory, this makes rolling updates of the karmada-controller-manager nearly transparent to users, with no perceived downtime.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

  • Enabling this feature disables the rate-limiting workqueue’s QPS limits.
  • controller-runtime has recently enabled the priority-queue capability by default and plans to release it in v0.23 (see Feature: Priority Queue kubernetes-sigs/controller-runtime#2374). By the time Karmada reaches v1.17, we can consider promoting this capability to a beta feature and enabling it by default.

Test report:

A cluster was created using hack/local-up-karmada.sh. In this cluster, 5,000 Deployments and their corresponding PropagationPolicies were added and propagated to two member clusters, completing full synchronization.

For the controller-manager, a time.Sleep(time.Second) delay was injected at the beginning of each controller’s reconcile loop on top of this PR. The controller-manager was then deployed with the following configuration:

image

After restarting the controller-manager, the following was observed:

img_v3_02se_34facaa4-346e-4727-b9da-e93b3defcd5g

As shown in the screenshots, even while the queue still had a backlog, modifying any Deployment resulted in it being immediately propagated to the member clusters.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: The `ControllerPriorityQueue` feature gate now applies to all controllers, including async workers. Enable it with `--feature-gates=ControllerPriorityQueue=true`.

Copilot AI review requested due to automatic review settings November 27, 2025 14:20
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 27, 2025
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 27, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @zach593, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to Karmada's controller-manager by enabling priority queue functionality for all its asynchronous workers. This strategic upgrade ensures that the system remains highly responsive to user-initiated changes, such as modifying deployments, even under heavy load or during controller restarts. By prioritizing critical events, the controller-manager can maintain perceived zero downtime during rolling updates, leading to a more robust and user-friendly experience. The feature is controlled by a dedicated feature gate, allowing administrators to enable it as needed.

Highlights

  • Universal Priority Queue Adoption: All asynchronous workers across various Karmada controllers (e.g., Cluster API detector, HPA scale target marker, Service Export, Endpoint Slice collection, Work Status, Dependencies Distributor, Resource Detector) have been upgraded to use AsyncPriorityWorker.
  • Feature Gate Enabled: The ControllerPriorityQueue feature gate now controls the activation of this priority queue functionality, allowing for controlled rollout and experimentation.
  • Enhanced Controller Responsiveness: The change significantly improves the controller-manager's ability to respond promptly to user-initiated updates, even when facing substantial queue backlogs or during restarts, by prioritizing critical events.
  • Refined Event Handling: Event processing logic has been updated to correctly assign priorities to items, particularly distinguishing between initial list items and subsequent events, ensuring efficient queue management.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copilot finished reviewing on behalf of zach593 November 27, 2025 14:23
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a great step towards improving the responsiveness of the controller-manager by enabling priority queues for all async workers. The changes are applied consistently across various controllers, which involves updating worker types, modifying event handlers to manage item priorities, and adjusting test mocks accordingly. The introduction of the util.ItemPriorityIfInInitialList helper function is a good move for code reuse and consistency. The refactoring in pkg/util/fedinformer/handlers.go to support isInInitialList is also well-executed. Overall, the implementation is clean, correct, and aligns well with the goal of making the system more responsive, especially during high-load scenarios.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables the priority queue feature for all async workers in the controller-manager by updating controllers to use AsyncPriorityWorker and assigning low priority to items from initial list synchronization. This ensures that user-initiated updates remain responsive even during queue backlogs after a restart.

  • Updated all relevant controllers to use AsyncPriorityWorker interface and enable the feature via the ControllerPriorityQueue feature gate
  • Modified event handler signatures to accept isInInitialList parameter and assign low priority to initial sync items
  • Removed deprecated NewHandlerOnAllEvents function in favor of NewHandlerOnEvents with updated signature

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/util/worker.go Added ItemPriorityIfInInitialList helper function and reordered switch case logic for clarity
pkg/util/fedinformer/handlers.go Updated NewHandlerOnEvents to use ResourceEventHandlerDetailedFuncs and removed deprecated NewHandlerOnAllEvents
pkg/util/fedinformer/handlers_test.go Updated test signatures to match new handler interface and removed obsolete test for deleted function
pkg/resourceinterpreter/customized/webhook/configmanager/manager.go Updated add handler signature to include isInInitialList parameter
pkg/resourceinterpreter/customized/declarative/configmanager/manager.go Updated add handler signature to include isInInitialList parameter
pkg/detector/detector.go Changed workers to AsyncPriorityWorker type and updated add handlers to use priority queue with feature gate
pkg/detector/detector_test.go Updated test mocks to implement new AddWithOpts and EnqueueWithOpts methods
pkg/detector/preemption_test.go Added mock implementations for new priority queue methods
pkg/dependenciesdistributor/dependencies_distributor.go Changed worker to AsyncPriorityWorker and updated event handlers to use priority queue
pkg/dependenciesdistributor/dependencies_distributor_test.go Added mock implementations for new priority queue methods
pkg/controllers/status/work_status_controller.go Changed worker to AsyncPriorityWorker, added individual event handlers with priority support
pkg/controllers/multiclusterservice/endpointslice_collect_controller.go Changed worker to AsyncPriorityWorker and updated add handler to use priority queue
pkg/controllers/multiclusterservice/endpointslice_collect_controller_test.go Updated test to match new handler type and added mock implementations
pkg/controllers/mcs/service_export_controller.go Changed worker to AsyncPriorityWorker and updated add handler to use priority queue
pkg/controllers/hpascaletargetmarker/hpa_scale_target_marker_controller.go Changed worker to AsyncPriorityWorker with feature gate support
pkg/controllers/hpascaletargetmarker/hpa_scale_target_marker_predicate.go Updated predicate to use priority queue for initial list items
pkg/clusterdiscovery/clusterapi/clusterapi.go Changed worker to AsyncPriorityWorker and updated event handlers with priority support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link

codecov-commenter commented Nov 27, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 48.68421% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.60%. Comparing base (70aa89b) to head (003554e).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
pkg/clusterdiscovery/clusterapi/clusterapi.go 0.00% 9 Missing ⚠️
pkg/detector/detector.go 20.00% 8 Missing ⚠️
pkg/controllers/mcs/service_export_controller.go 12.50% 6 Missing and 1 partial ⚠️
...clusterservice/endpointslice_collect_controller.go 50.00% 4 Missing ⚠️
pkg/util/worker.go 33.33% 4 Missing ⚠️
...targetmarker/hpa_scale_target_marker_controller.go 0.00% 3 Missing ⚠️
...etargetmarker/hpa_scale_target_marker_predicate.go 0.00% 2 Missing ⚠️
...ependenciesdistributor/dependencies_distributor.go 83.33% 1 Missing ⚠️
...preter/customized/webhook/configmanager/manager.go 0.00% 0 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6965      +/-   ##
==========================================
- Coverage   46.61%   46.60%   -0.02%     
==========================================
  Files         698      698              
  Lines       48003    48017      +14     
==========================================
- Hits        22378    22376       -2     
- Misses      23940    23957      +17     
+ Partials     1685     1684       -1     
Flag Coverage Δ
unittests 46.60% <48.68%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot~
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 28, 2025
@RainbowMango RainbowMango added this to the v1.16 milestone Nov 29, 2025
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2025
@karmada-bot karmada-bot merged commit 17c07aa into karmada-io:master Nov 29, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants