
Conversation

@zach593
Contributor

@zach593 zach593 commented Nov 9, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

In previous work, we added the priority queue feature gate for controllers driven by controller-runtime.
However, that capability could not take full effect, because some key controllers in karmada-controller-manager are driven by our self-developed asyncWorker.
Therefore, we need to implement the same mechanism in asyncWorker to achieve full coverage of the priority queue feature.

This PR adds priority queue support to asyncWorker and wires the capability into resourceDetector for processing resource templates (activated via the `ControllerPriorityQueue` feature gate).

This feature will be progressively rolled out to other controllers in upcoming updates.
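To illustrate the idea, here is a minimal, self-contained sketch (hypothetical names, not the actual asyncWorker API): a worker backed by `container/heap` that pops the highest-priority key first, so a user-triggered change can jump ahead of the initial-sync backlog:

```go
package main

import (
	"container/heap"
	"fmt"
)

// queueItem pairs a work key with a priority; higher priority pops first.
type queueItem struct {
	key      string
	priority int
	seq      int // FIFO tie-breaker among equal priorities
}

type priorityQueue []*queueItem

func (q priorityQueue) Len() int { return len(q) }
func (q priorityQueue) Less(i, j int) bool {
	if q[i].priority != q[j].priority {
		return q[i].priority > q[j].priority
	}
	return q[i].seq < q[j].seq
}
func (q priorityQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *priorityQueue) Push(x any)   { *q = append(*q, x.(*queueItem)) }
func (q *priorityQueue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// worker drains its queue in priority order.
type worker struct {
	q   priorityQueue
	seq int
}

func (w *worker) AddWithPriority(key string, priority int) {
	w.seq++
	heap.Push(&w.q, &queueItem{key: key, priority: priority, seq: w.seq})
}

func (w *worker) Drain(process func(key string)) {
	for w.q.Len() > 0 {
		process(heap.Pop(&w.q).(*queueItem).key)
	}
}

func main() {
	w := &worker{}
	// Items seen during the initial list sync get low priority...
	w.AddWithPriority("deployments/backlog-1", 0)
	w.AddWithPriority("deployments/backlog-2", 0)
	// ...while a user-triggered change gets high priority and jumps the queue.
	w.AddWithPriority("deployments/user-edit", 10)

	w.Drain(func(key string) { fmt.Println("process:", key) })
}
```

The real worker also has to handle delays (add-after) and rate limiting; this sketch only shows the priority ordering itself.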

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

test report:

I temporarily added time.Sleep(time.Second) in the detector and limited its concurrency to 1 to control the processing speed.
After adding 5,000 deployments and their corresponding propagation policies and completing synchronization, I restarted the controller.

[screenshots omitted: workqueue depth metrics]

It can be observed that even while the queue still had a backlog, the detector processed a manually modified object almost immediately.
However, the backlog in the priority-0 queue should not be that large; I suspect there is an issue here and will check it later.

Does this PR introduce a user-facing change?:

After enabling the `ControllerPriorityQueue` feature gate, asyncWorker uses a priority-queue-based implementation. This affects the processing order of items in the resource detector and splits the monitoring metric `workqueue_depth` into multiple series.

Copilot AI review requested due to automatic review settings November 9, 2025 13:39
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 9, 2025
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 9, 2025
@gemini-code-assist

Summary of Changes

Hello @zach593, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the asyncWorker component by integrating priority queue capabilities, which is particularly beneficial for controllers that do not utilize controller-runtime's native priority queue. The primary goal is to improve the responsiveness of the system by ensuring that critical, user-initiated changes are processed ahead of lower-priority tasks, especially during periods of high load or controller restarts. This is achieved by updating the ResourceDetector to use the new priority-aware worker, along with minor adjustments to rate limiter option handling and event processing, all conditional on the ControllerPriorityQueue feature gate.

Highlights

  • Priority Queue for AsyncWorker: Introduced priority queue support for the asyncWorker component, enabling prioritized processing of items based on their importance. This is activated via the ControllerPriorityQueue feature gate.
  • ResourceDetector Integration: The ResourceDetector now leverages the new asyncPriorityWorker for processing resourceTemplate items. This ensures that user-triggered changes are processed with higher priority, especially during initial list processing or policy changes.
  • Rate Limiter Options Refinement: A new SetDefaults method was added to the ratelimiterflag.Options struct, centralizing the logic for setting default values for rate limiter configurations.
  • Enhanced Event Handling: The NewFilteringHandlerOnAllEvents function in fedinformer now uses cache.ResourceEventHandlerDetailedFuncs and the AddFunc signature has been updated to include an isInitialList parameter, allowing for differentiated handling of initial object additions.
  • Comprehensive Testing: New test cases have been added for the asyncPriorityWorker to validate its enqueue, add-after, and run functionalities, including scenarios for different priorities and time delays.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces priority queue support for the self-developed asyncWorker and enables it in the resourceDetector to improve the responsiveness of controllers. The changes are well-structured, introducing a new AsyncPriorityWorker interface and conditionally using a priority queue based on a feature gate. My review focuses on improving test mocks for better verification, ensuring consistent rate-limiter behavior, and enhancing the robustness of the new worker logic.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces priority queue support for async workers in Karmada controllers, controlled by the ControllerPriorityQueue feature gate. The main purpose is to allow controllers to prioritize processing recent cluster changes during startup while deferring items without recent updates to the end of the queue.

Key changes:

  • Added priority queue support to the async worker implementation with new AsyncPriorityWorker interface and AddWithOpts/EnqueueWithOpts methods
  • Integrated priority queue into ResourceDetector to assign low priority to items from initial list sync and policy changes
  • Extracted rate limiter default setting logic into a SetDefaults() method for reusability

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

File Description
pkg/util/worker.go Added priority queue support with new interfaces, methods, and conditional queue initialization based on UsePriorityQueue option
pkg/util/worker_test.go Added comprehensive tests for priority queue functionality including new test cases for AddWithOpts and EnqueueWithOpts
pkg/detector/detector.go Integrated priority queue feature with low priority assignment for initial list items and policy-triggered changes
pkg/detector/detector_test.go Updated test mocks to implement new AsyncPriorityWorker interface methods
pkg/util/fedinformer/handlers.go Updated handler signature to support isInitialList parameter for detecting initial sync events
pkg/util/fedinformer/handlers_test.go Updated test to match new handler signature
pkg/sharedcli/ratelimiterflag/ratelimiterflag.go Extracted default setting logic into SetDefaults() method
pkg/sharedcli/ratelimiterflag/ratelimiterflag_test.go Added tests for the new SetDefaults() method
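client-go's `cache.ResourceEventHandlerDetailedFuncs` gives `AddFunc` an extra `isInInitialList` bool, which is what lets the detector down-prioritize objects from the initial sync. A self-contained sketch of that dispatch, with a local type standing in for the client-go one:

```go
package main

import "fmt"

// detailedFuncs mirrors the shape of cache.ResourceEventHandlerDetailedFuncs:
// AddFunc additionally reports whether the object came from the initial list.
type detailedFuncs struct {
	AddFunc func(obj string, isInInitialList bool)
}

func (d detailedFuncs) OnAdd(obj string, isInInitialList bool) {
	if d.AddFunc != nil {
		d.AddFunc(obj, isInInitialList)
	}
}

func main() {
	priorities := map[string]int{}
	h := detailedFuncs{
		AddFunc: func(obj string, isInInitialList bool) {
			p := 10 // user-triggered add: high priority
			if isInInitialList {
				p = 0 // backlog from the initial list sync: low priority
			}
			priorities[obj] = p
		},
	}
	h.OnAdd("deployments/from-initial-list", true)
	h.OnAdd("deployments/user-created", false)
	fmt.Println(priorities["deployments/from-initial-list"], priorities["deployments/user-created"])
}
```

The priority values are hypothetical; the point is only that the `isInInitialList` flag is the signal the detector can key off.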


@codecov-commenter

codecov-commenter commented Nov 9, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 77.61194% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.51%. Comparing base (463fcca) to head (890699c).
⚠️ Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
pkg/util/worker.go 73.33% 9 Missing and 3 partials ⚠️
pkg/sharedcli/ratelimiterflag/ratelimiterflag.go 83.33% 2 Missing ⚠️
pkg/detector/detector.go 88.88% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6919      +/-   ##
==========================================
+ Coverage   46.46%   46.51%   +0.05%     
==========================================
  Files         698      698              
  Lines       47824    47876      +52     
==========================================
+ Hits        22220    22269      +49     
  Misses      23933    23933              
- Partials     1671     1674       +3     
Flag Coverage Δ
unittests 46.51% <77.61%> (+0.05%) ⬆️


@zach593 zach593 force-pushed the asyncworker-pq branch 4 times, most recently from 1e32269 to ca97883 Compare November 13, 2025 01:25
@RainbowMango
Member

> As for the backlog of the priority: 0 queue, it shouldn’t be that large — I suspect there might be an issue here, and I’ll check it later.

Yes, that's suspicious. You just updated one resource template, right? Can we add a log before enqueuing the items? So that we can know what items are in the queue.

@zach593
Contributor Author

zach593 commented Nov 23, 2025

> It can be observed that while there was still a backlog in the queue, after manually modifying the corresponding object, the detector processed it very quickly.
> As for the backlog of the priority: 0 queue, it shouldn’t be that large — I suspect there might be an issue here, and I’ll check it later.

Reason:

After creating the cluster with hack/local-up-karmada.sh, karmada-apiserver contains some basic resources that support the cluster, such as CRDs and APIServices. These resources enter the resourceDetector queue, but during reconcile they have no matching PropagationPolicy (PP) or ClusterPropagationPolicy (CPP), so they fail and are re-enqueued into the priority queue with the rate-limit option, this time with priority 0.

```go
d.AddWaiting(objectKey)
return fmt.Errorf("no matched propagation policy")
}
```

The rate-limit option is implemented by calling the rate limiter's When and attaching a readyAt timestamp:

```go
after := o.After
if o.RateLimited {
	rlAfter := w.rateLimiter.When(key)
	if after == 0 || rlAfter < after {
		after = rlAfter
	}
}
var readyAt *time.Time
if after > 0 {
	readyAt = ptr.To(w.now().Add(after))
	w.metrics.retry()
}
```

As seen in the less implementation (before #6929), items with a readyAt set are placed at the back of the queue, even if they have higher priority:

```go
func less[T comparable](a, b *item[T]) bool {
	if a.ReadyAt == nil && b.ReadyAt != nil {
		return true
	}
	if b.ReadyAt == nil && a.ReadyAt != nil {
		return false
	}
	if a.ReadyAt != nil && b.ReadyAt != nil && !a.ReadyAt.Equal(*b.ReadyAt) {
		return a.ReadyAt.Before(*b.ReadyAt)
	}
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	return a.AddedCounter < b.AddedCounter
}
```

Therefore, although these items are placed at the end of the queue, their priority is 0. As a result, a manually modified object is processed by the detector immediately, while the queue metrics still show a backlog of priority-0 items.

#6929 has already fixed this by comparing priority first, as shown below. After rebasing, the problem has not occurred again.

```go
func less[T comparable](a, b *item[T]) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	if a.ReadyAt == nil && b.ReadyAt != nil {
		return true
	}
	if b.ReadyAt == nil && a.ReadyAt != nil {
		return false
	}
	if a.ReadyAt != nil && b.ReadyAt != nil && !a.ReadyAt.Equal(*b.ReadyAt) {
		return a.ReadyAt.Before(*b.ReadyAt)
	}
	return a.AddedCounter < b.AddedCounter
}
```

@zach593
Contributor Author

zach593 commented Nov 23, 2025

Updated Test Report:

karmada-controller-manager configuration:

[screenshot omitted: karmada-controller-manager configuration]

Add a time.Sleep(time.Second) before logging in the resourceDetector:

[screenshot omitted: time.Sleep added in resourceDetector]

After adding 5,000 deployments and their propagation policies and waiting for synchronization to complete, restart the controller.

And the result:

[screenshot omitted: test result]

@zach593
Contributor Author

zach593 commented Nov 23, 2025

/retest

@zach593
Contributor Author

zach593 commented Nov 24, 2025

/retest

@RainbowMango
Member

> After #6929, this has already been fixed in the code shown below. After rebasing, the problem has not occurred again.

Nice!

Member

@RainbowMango RainbowMango left a comment


/lgtm
/gemini review

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 25, 2025
@RainbowMango RainbowMango added this to the v1.16 milestone Nov 25, 2025
@RainbowMango RainbowMango added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 25, 2025
@RainbowMango
Member

```
>>> git ls-remote --get-url origin
Error: failed linting charts: failed processing charts

------------------------------------------------------------------------------------------------------------------------
 ✖︎ karmada => (version: "0.0.0", path: "charts/karmada") > failed validating maintainer "chaosi-zju": 503 Service Unavailable
------------------------------------------------------------------------------------------------------------------------
failed linting charts: failed processing charts
Error: Process completed with exit code 1.
```

/retest

Member

@RainbowMango RainbowMango left a comment


/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 25, 2025
@karmada-bot karmada-bot merged commit d7c6633 into karmada-io:master Nov 25, 2025
26 of 27 checks passed