Skip to content

refactor(flowcontrol): Adopt Composite FlowKey as Primary Identifier #1340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

LukeAVanDrie
Copy link
Contributor

This PR refactors the flow control system to establish the composite types.FlowKey (ID + Priority) as the canonical, immutable identifier for a flow instance. This is a direct result of critical feedback and discussions on the open registry PR #1319 , which revealed a fundamental misalignment between the implementation's data model and the API's intended contract.

The previous model treated a flow's priority as a mutable attribute, which required a complex state machine for handling "graceful updates" (managing active vs. draining states, generation counters, etc.). This PR replaces that model with a simpler, more robust architecture where a flow instance is uniquely and immutably defined by its FlowKey.

This tracks #674


What this PR Does

  • Introduces types.FlowKey: A new struct combining ID and Priority is now the primary key for all flow instances.
  • Refactors Core Contracts: The contracts.RegistryShard and framework interfaces have been updated to use FlowKey as the primary identifier, removing ambiguity and clarifying the API. ActiveManagedQueue has been removed, and methods now consistently use the FlowKey.
  • Removes the Hooks for the Update State Machine: The preliminary hooks for the complex machinery for managing priority updates, including "draining" states, generation counters, and queue reactivation logic in [WIP] feat(flowcontrol): Implement the FlowRegistry #1319, has been deleted from ManagedQueue.
  • Updates the ShardProcessor: The controller's worker implementation has been updated to align with the new, simpler contracts.

Key Benefits

  1. State Simplification: By removing the complex priority update logic, the FlowRegistry and its components are now significantly simpler, more robust, and easier to reason about.
  2. Unambiguous API Contracts: Interfaces throughout the system are now clearer. Methods receive a single, self-contained FlowKey, eliminating the need to pass flowID and priority as separate parameters.
  3. Architectural Alignment: The implementation now correctly and directly reflects the agreed-upon domain model where a flow instance is uniquely defined by its ID and its priority.

Architectural Consequence & Accepted Trade-off

This PR knowingly accepts a significant architectural trade-off: inter-flow fairness is now strictly siloed within each priority band.

Under this new model, the system will never be able to reason about holistic fairness for a single logical tenant across multiple priority levels (e.g., balancing TenantA-High against TenantA-Medium). Fairness policies operate on (flowID, priority) pairs as distinct, unrelated entities.

This trade-off is considered acceptable as the requirement for holistic, cross-priority fairness is currently undefined and out of scope for the system's immediate goals. This change provides a simpler, more robust foundation to build upon.

Thank you to @kfswain and @ahg-g for the insightful feedback that led to this design compromise.

This commit establishes the composite `types.FlowKey` (ID + Priority) as
the canonical identifier for a flow instance throughout the entire flow
control system. This architectural shift aligns the implementation with
the clarified domain model where a flow's priority is immutable.

This change is the result of a key architectural decision to simplify
the system's state management. The previous model, which treated
priority as a mutable attribute, required a complex state machine to
handle graceful updates. By making the `FlowKey` immutable, that
complexity is no longer necessary.

The key benefits of this new model are:

1.  **Radical State Simplification:** The entire state machine for
    handling priority updates has been removed. The concepts of "active"
    vs. "draining" queues, generation counters, and queue reactivation
    logic are now obsolete. This makes the `FlowRegistry` and
    `ManagedQueue` implementations significantly simpler and more
    robust.

2.  **Unambiguous API Contracts:** Interfaces are now clearer. Methods
    like `ManagedQueue(key types.FlowKey)` receive a single,
    self-contained identifier, eliminating the need to pass `flowID` and
    `priority` as separate parameters and removing any ambiguity about a
    flow's identity.

3.  **Architectural Alignment:** The implementation now correctly and
    directly reflects the API's expectation that a flow instance is
    uniquely defined by its ID *and* its priority.

**Architectural Consequence & Accepted Trade-off:**

This commit knowingly accepts a significant trade-off: inter-flow
fairness is now strictly siloed within each priority band.

Under this model, the system will never be able to reason about holistic
fairness for a single tenant across multiple priority levels (e.g.,
balancing `TenantA-High` against `TenantA-Medium`). Fairness policies
now operate on `(flowID, priority)` pairs as distinct, unrelated
entities.

This trade-off is considered acceptable because the requirement for
holistic, cross-priority fairness is currently undefined and out of
scope for the system's immediate goals.
Copy link

netlify bot commented Aug 9, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 3ade2eb
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/689698a2aac24e000836fbce
😎 Deploy Preview https://deploy-preview-1340--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 9, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LukeAVanDrie
Once this PR has been reviewed and has the lgtm label, please assign arangogutierrez for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants