-
Notifications
You must be signed in to change notification settings - Fork 1.3k
🐛 fix priority queue ordering when item priority changes #3408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: zach593 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @zach593. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
| } | ||
| }) | ||
|
|
||
| It("follows FIFO order in the new priority queue when item priority changes", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You write items should be ordered by priority first, and by enqueue time (FIFO) within the same priority but you don't provide any justification for that. We generally take the "best" of all Adds we've over observed for a given key, where "best" means whichever gets the item to the front of the queue fastest: Highest priority, lowest after. Why should the queue behave differently when it comes to the addedCounter?
/hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. The most obvious benefit of the priority queue feature is to optimize controller startup by deferring low-priority events, thereby reducing externally perceived downtime. Based on this scenario, consider the following example.
Assume there is a single worker and 26 items, from A to Z. After the initial list completes, all items are enqueued into the priority queue in alphabetical order. While A is being processed, Z receives an update event, followed by Y, and then X. As a result, these items are moved into the high-priority queue, but the ordering becomes:
High priority: X, Y, Z
Low priority: B, C, D, …, W
This behavior is unintuitive. Although the items trigger update events in the order Z, then Y, then X, the actual processing order is still dominated by the original alphabetical enqueue order rather than the update time.
From an external observer’s perspective, items in the low-priority queue are often ones that can reasonably be ignored, while the items that truly deserve attention are those that have triggered update events. However, for these important items, the controller does not behave in a FIFO manner. This is particularly surprising given that many users in the Kubernetes ecosystem are aware that controllers are internally implemented as queues and therefore naturally expect FIFO semantics.
Additionally, once the queue (including the low-priority queue) has been completely drained, and we later observe the same update sequence again, the ordering in queue becomes Z, Y, X, which is actually the expected FIFO order. However, this makes the controller’s behavior appear inconsistent to external observers, because the ordering differs significantly from the earlier case where low-priority items were still present in the queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stefan and I had a longer discussion around this and arrived at the conclusion that the change makes sense, because it unbreaks the use-case of "While items from initial list are being processed, incoming update events are processed in fifo order" (which I think you tried to allude to, but I didn't understand initially). This is currently broken, since the items in the initial list are not ordered by anything useful.
For consistency, one could argue it would also make sense to do the same for requeueAfter, but that would break the current "Resync resets after" and any other code that wants to lower after but doesn't know the current priority so we leave it as-is for the moment but can revisit later if someone comes up with a good reason.
Could you please add a go doc about this change and the motivating use-case somewhere?
/hold cancel
Signed-off-by: zach593 <[email protected]>
This PR fixes an ordering issue in the priority queue: items should be ordered by priority first, and by enqueue time (FIFO) within the same priority. When an item’s priority changes, the queue previously didn’t consistently preserve this rule. The PR corrects the behavior and adds corresponding test case.
Note: @alvaroaleman may want to review the semantic change to
addedCounter.