
Conversation

@zach593
Contributor

@zach593 zach593 commented Nov 9, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

In previous work, we added the priority queue feature gate for controllers driven by controller-runtime.
However, that capability could not take full effect, because some key controllers in karmada-controller-manager are driven by our self-developed asyncWorker.
Therefore, we need to implement the same mechanism in asyncWorker to achieve full coverage of the priority queue feature.

This PR adds priority queue support to asyncWorker and wires the capability into resourceDetector for processing resource templates (activated via the `ControllerPriorityQueue` feature gate).

This feature will be progressively rolled out to other controllers in upcoming updates.
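To illustrate the idea, here is a minimal, self-contained sketch (hypothetical names, not the actual asyncWorker API): a worker backed by `container/heap` that pops the highest-priority key first, so a user-triggered change can jump ahead of the initial-sync backlog:

```go
package main

import (
	"container/heap"
	"fmt"
)

// queueItem pairs a work key with a priority; higher priority pops first.
type queueItem struct {
	key      string
	priority int
	seq      int // FIFO tie-breaker among equal priorities
}

type priorityQueue []*queueItem

func (q priorityQueue) Len() int { return len(q) }
func (q priorityQueue) Less(i, j int) bool {
	if q[i].priority != q[j].priority {
		return q[i].priority > q[j].priority
	}
	return q[i].seq < q[j].seq
}
func (q priorityQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *priorityQueue) Push(x any)   { *q = append(*q, x.(*queueItem)) }
func (q *priorityQueue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// worker drains its queue in priority order.
type worker struct {
	q   priorityQueue
	seq int
}

func (w *worker) AddWithPriority(key string, priority int) {
	w.seq++
	heap.Push(&w.q, &queueItem{key: key, priority: priority, seq: w.seq})
}

func (w *worker) Drain(process func(key string)) {
	for w.q.Len() > 0 {
		process(heap.Pop(&w.q).(*queueItem).key)
	}
}

func main() {
	w := &worker{}
	// Items seen during the initial list sync get low priority...
	w.AddWithPriority("deployments/backlog-1", 0)
	w.AddWithPriority("deployments/backlog-2", 0)
	// ...while a user-triggered change gets high priority and jumps the queue.
	w.AddWithPriority("deployments/user-edit", 10)

	w.Drain(func(key string) { fmt.Println("process:", key) })
}
```

The real worker also has to handle delays (add-after) and rate limiting; this sketch only shows the priority ordering itself.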

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

test report:

I temporarily added time.Sleep(time.Second) in the detector and limited its concurrency to 1 to control the processing speed.
After adding 5,000 deployments and their corresponding propagation policies and completing synchronization, I restarted the controller.

[screenshots omitted: workqueue depth metrics]

It can be observed that even while the queue still had a backlog, the detector processed a manually modified object almost immediately.
However, the backlog in the priority-0 queue should not be that large; I suspect there is an issue here and will check it later.

Does this PR introduce a user-facing change?:

After enabling the `ControllerPriorityQueue` feature gate, asyncWorker uses a priority-queue-based implementation. This affects the processing order of items in the resource detector and splits the monitoring metric `workqueue_depth` into multiple series.

Copilot AI review requested due to automatic review settings November 9, 2025 13:39
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 9, 2025
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 9, 2025
@gemini-code-assist

Summary of Changes

Hello @zach593, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the asyncWorker component by integrating priority queue capabilities, which is particularly beneficial for controllers that do not utilize controller-runtime's native priority queue. The primary goal is to improve the responsiveness of the system by ensuring that critical, user-initiated changes are processed ahead of lower-priority tasks, especially during periods of high load or controller restarts. This is achieved by updating the ResourceDetector to use the new priority-aware worker, along with minor adjustments to rate limiter option handling and event processing, all conditional on the ControllerPriorityQueue feature gate.

Highlights

  • Priority Queue for AsyncWorker: Introduced priority queue support for the asyncWorker component, enabling prioritized processing of items based on their importance. This is activated via the ControllerPriorityQueue feature gate.
  • ResourceDetector Integration: The ResourceDetector now leverages the new asyncPriorityWorker for processing resourceTemplate items. This ensures that user-triggered changes are processed with higher priority, especially during initial list processing or policy changes.
  • Rate Limiter Options Refinement: A new SetDefaults method was added to the ratelimiterflag.Options struct, centralizing the logic for setting default values for rate limiter configurations.
  • Enhanced Event Handling: The NewFilteringHandlerOnAllEvents function in fedinformer now uses cache.ResourceEventHandlerDetailedFuncs and the AddFunc signature has been updated to include an isInitialList parameter, allowing for differentiated handling of initial object additions.
  • Comprehensive Testing: New test cases have been added for the asyncPriorityWorker to validate its enqueue, add-after, and run functionalities, including scenarios for different priorities and time delays.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces priority queue support for the self-developed asyncWorker and enables it in the resourceDetector to improve the responsiveness of controllers. The changes are well-structured, introducing a new AsyncPriorityWorker interface and conditionally using a priority queue based on a feature gate. My review focuses on improving test mocks for better verification, ensuring consistent rate-limiter behavior, and enhancing the robustness of the new worker logic.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces priority queue support for async workers in Karmada controllers, controlled by the ControllerPriorityQueue feature gate. The main purpose is to allow controllers to prioritize processing recent cluster changes during startup while deferring items without recent updates to the end of the queue.

Key changes:

  • Added priority queue support to the async worker implementation with new AsyncPriorityWorker interface and AddWithOpts/EnqueueWithOpts methods
  • Integrated priority queue into ResourceDetector to assign low priority to items from initial list sync and policy changes
  • Extracted rate limiter default setting logic into a SetDefaults() method for reusability

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

File Description
pkg/util/worker.go Added priority queue support with new interfaces, methods, and conditional queue initialization based on UsePriorityQueue option
pkg/util/worker_test.go Added comprehensive tests for priority queue functionality including new test cases for AddWithOpts and EnqueueWithOpts
pkg/detector/detector.go Integrated priority queue feature with low priority assignment for initial list items and policy-triggered changes
pkg/detector/detector_test.go Updated test mocks to implement new AsyncPriorityWorker interface methods
pkg/util/fedinformer/handlers.go Updated handler signature to support isInitialList parameter for detecting initial sync events
pkg/util/fedinformer/handlers_test.go Updated test to match new handler signature
pkg/sharedcli/ratelimiterflag/ratelimiterflag.go Extracted default setting logic into SetDefaults() method
pkg/sharedcli/ratelimiterflag/ratelimiterflag_test.go Added tests for the new SetDefaults() method
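client-go's `cache.ResourceEventHandlerDetailedFuncs` gives `AddFunc` an extra `isInInitialList` bool, which is what lets the detector down-prioritize objects from the initial sync. A self-contained sketch of that dispatch, with a local type standing in for the client-go one:

```go
package main

import "fmt"

// detailedFuncs mirrors the shape of cache.ResourceEventHandlerDetailedFuncs:
// AddFunc additionally reports whether the object came from the initial list.
type detailedFuncs struct {
	AddFunc func(obj string, isInInitialList bool)
}

func (d detailedFuncs) OnAdd(obj string, isInInitialList bool) {
	if d.AddFunc != nil {
		d.AddFunc(obj, isInInitialList)
	}
}

func main() {
	priorities := map[string]int{}
	h := detailedFuncs{
		AddFunc: func(obj string, isInInitialList bool) {
			p := 10 // user-triggered add: high priority
			if isInInitialList {
				p = 0 // backlog from the initial list sync: low priority
			}
			priorities[obj] = p
		},
	}
	h.OnAdd("deployments/from-initial-list", true)
	h.OnAdd("deployments/user-created", false)
	fmt.Println(priorities["deployments/from-initial-list"], priorities["deployments/user-created"])
}
```

The priority values are hypothetical; the point is only that the `isInInitialList` flag is the signal the detector can key off.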


@codecov-commenter

codecov-commenter commented Nov 9, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 77.61194% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.51%. Comparing base (463fcca) to head (890699c).
⚠️ Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
pkg/util/worker.go 73.33% 9 Missing and 3 partials ⚠️
pkg/sharedcli/ratelimiterflag/ratelimiterflag.go 83.33% 2 Missing ⚠️
pkg/detector/detector.go 88.88% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6919      +/-   ##
==========================================
+ Coverage   46.46%   46.51%   +0.05%     
==========================================
  Files         698      698              
  Lines       47824    47876      +52     
==========================================
+ Hits        22220    22269      +49     
  Misses      23933    23933              
- Partials     1671     1674       +3     
Flag Coverage Δ
unittests 46.51% <77.61%> (+0.05%) ⬆️


@zach593 zach593 force-pushed the asyncworker-pq branch 4 times, most recently from 1e32269 to ca97883 Compare November 13, 2025 01:25
@RainbowMango
Member

> As for the backlog of the priority: 0 queue, it shouldn’t be that large — I suspect there might be an issue here, and I’ll check it later.

Yes, that's suspicious. You just updated one resource template, right? Can we add a log before enqueuing the items? So that we can know what items are in the queue.

@zach593
Contributor Author

zach593 commented Nov 23, 2025

> It can be observed that while there was still a backlog in the queue, after manually modifying the corresponding object, the detector processed it very quickly.
> As for the backlog of the priority: 0 queue, it shouldn’t be that large — I suspect there might be an issue here, and I’ll check it later.

Reason:

After creating the cluster with hack/local-up-karmada.sh, karmada-apiserver contains some basic resources that support the cluster, such as CRDs and APIServices. These resources enter the resourceDetector queue, but during reconcile they have no matching PropagationPolicy (PP) or ClusterPropagationPolicy (CPP), so they fail and are re-enqueued into the priority queue with the rate-limit option, this time with priority 0.

```go
d.AddWaiting(objectKey)
return fmt.Errorf("no matched propagation policy")
}
```

The rate-limit option is implemented by calling the rate limiter's When and attaching a readyAt timestamp:

```go
after := o.After
if o.RateLimited {
	rlAfter := w.rateLimiter.When(key)
	if after == 0 || rlAfter < after {
		after = rlAfter
	}
}
var readyAt *time.Time
if after > 0 {
	readyAt = ptr.To(w.now().Add(after))
	w.metrics.retry()
}
```

As seen in the less implementation (before #6929), items with a readyAt set are placed at the back of the queue, even if they have higher priority:

```go
func less[T comparable](a, b *item[T]) bool {
	if a.ReadyAt == nil && b.ReadyAt != nil {
		return true
	}
	if b.ReadyAt == nil && a.ReadyAt != nil {
		return false
	}
	if a.ReadyAt != nil && b.ReadyAt != nil && !a.ReadyAt.Equal(*b.ReadyAt) {
		return a.ReadyAt.Before(*b.ReadyAt)
	}
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	return a.AddedCounter < b.AddedCounter
}
```

Therefore, although these items are placed at the end of the queue, their priority is 0. As a result, a manually modified object is processed by the detector immediately, while the queue metrics still show a backlog of priority-0 items.

#6929 has already fixed this by comparing priority first, as shown below. After rebasing, the problem has not occurred again.

```go
func less[T comparable](a, b *item[T]) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	if a.ReadyAt == nil && b.ReadyAt != nil {
		return true
	}
	if b.ReadyAt == nil && a.ReadyAt != nil {
		return false
	}
	if a.ReadyAt != nil && b.ReadyAt != nil && !a.ReadyAt.Equal(*b.ReadyAt) {
		return a.ReadyAt.Before(*b.ReadyAt)
	}
	return a.AddedCounter < b.AddedCounter
}
```

@zach593
Contributor Author

zach593 commented Nov 23, 2025

Updated Test Report:

karmada-controller-manager configuration:

[screenshot omitted: karmada-controller-manager configuration]

Add a time.Sleep(time.Second) before logging in the resourceDetector:

[screenshot omitted: time.Sleep added in resourceDetector]

After adding 5,000 deployments and their propagation policies and waiting for synchronization to complete, restart the controller.

And the result:

[screenshot omitted: test result]

@zach593
Contributor Author

zach593 commented Nov 23, 2025

/retest

@zach593
Contributor Author

zach593 commented Nov 24, 2025

/retest

@RainbowMango
Member

> After #6929, this has already been fixed in the code shown below. After rebasing, the problem has not occurred again.

Nice!

Member

@RainbowMango RainbowMango left a comment


/lgtm
/gemini review

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2025
@RainbowMango RainbowMango added this to the v1.16 milestone Nov 25, 2025
@RainbowMango RainbowMango added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 25, 2025
@RainbowMango
Member

```
>>> git ls-remote --get-url origin
Error: failed linting charts: failed processing charts

------------------------------------------------------------------------------------------------------------------------
 ✖︎ karmada => (version: "0.0.0", path: "charts/karmada") > failed validating maintainer "chaosi-zju": 503 Service Unavailable
------------------------------------------------------------------------------------------------------------------------
failed linting charts: failed processing charts
Error: Process completed with exit code 1.
```

/retest

Member

@RainbowMango RainbowMango left a comment


/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 25, 2025
@karmada-bot karmada-bot merged commit d7c6633 into karmada-io:master Nov 25, 2025
26 of 27 checks passed