feat: Add top-level Flow Controller #1525
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
Hi @LukeAVanDrie. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign @kfswain
I broke this into several commits for easier review. PTAL at the reviewer guide. I think this change is atomic, but I am also happy to split each of these out into separate PRs if you would prefer.
@LukeAVanDrie: GitHub didn't allow me to request PR reviews from the following users: rahulgurnani. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This is the final functional change for the Flow Control layer. Remaining work involves wiring it into the request lifecycle and benchmarking.
/ok-to-test
Got through the new files, left some comments. Will continue the PR in the AM
// processor. This buffer acts as a shock absorber, decoupling the high-frequency distributor from the processor's
// serial execution loop and allowing the system to handle short bursts of traffic without blocking.
// Optional: Defaults to `defaultEnqueueChannelBufferSize` (100).
EnqueueChannelBufferSize int
What happens if this buffer overflows?
If running with data parallelism (more than one shard), the distribution algorithm will select the best candidate and attempt a non-blocking send. If the candidate buffer is full, it immediately falls back to the next best candidate and so on. If every worker's buffer is full, then we change from a non-blocking to a blocking send on the best candidate.
As the type comment mentions, it is simply a shock absorber for transient request bursts. The goal is to get the request enqueued (in the actual queue structure) as quickly as possible. If the buffer is full, it's fine, we just try to find a less congested worker even if it is not the "best" candidate for our data parallelism strategy.
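To make the fallback concrete, here is a minimal Go sketch of the behavior described above; `trySubmit`, `submitOrBlock`, and the surrounding types are illustrative stand-ins, not the actual controller API:

```go
package sketch

import (
	"context"
	"errors"
)

// flowItem and worker are illustrative stand-ins for the controller's internal types; the
// real implementation differs. This only sketches the fallback behavior described above.
type flowItem struct{ byteSize uint64 }

type worker struct{ enqueue chan *flowItem }

// trySubmit performs a non-blocking send; it reports false if the worker's buffer is full.
func (w *worker) trySubmit(item *flowItem) bool {
	select {
	case w.enqueue <- item:
		return true
	default:
		return false
	}
}

// submitOrBlock blocks until the worker accepts the item or the context is cancelled.
func (w *worker) submitOrBlock(ctx context.Context, item *flowItem) error {
	select {
	case w.enqueue <- item:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// distribute tries each candidate (assumed pre-sorted by queued bytes, least-loaded first)
// with a non-blocking send; only if every buffer is full does it fall back to a blocking
// send on the best candidate.
func distribute(ctx context.Context, item *flowItem, candidates []*worker) error {
	if len(candidates) == 0 {
		return errors.New("no active workers")
	}
	for _, w := range candidates {
		if w.trySubmit(item) {
			return nil
		}
	}
	return candidates[0].submitOrBlock(ctx, item)
}
```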
@kfswain Hopefully this clears it up. If you don't find the config documentation sufficient here, I can update it to better incorporate these details. Else, if no change is needed, can you resolve this thread?
// newConfig performs validation and initialization, returning a guaranteed-valid `Config` object.
// This is the required constructor for creating a new configuration.
// It does not mutate the input `cfg`.
OOC: Why not? Can we just log what the value was before it was defaulted?
> // It does not mutate the input `cfg`.
We could do this. I prefer this being a pure function though. It is easier to test and less prone to unexpected side effects.
Also, I am changing this in a follow-up PR and removing the requirement for this constructor to be invoked (it becomes a test-only convenience utility). In registry/config.go and controller/config.go I will be exposing a `Config.ValidateAndApplyDefaults() *Config` method that the caller is expected to invoke before passing the config structs to the `FlowRegistry` and `FlowController` respectively.
Would you like me to absorb this refactoring (for at least the controller package into this PR) instead of a followup?
To keep this PR focused squarely on the controller implementation itself, I'd prefer to tackle the config changes in the immediate follow-up PR I have staged.
In that PR, I implement the plan I mentioned: removing the newConfig constructor in favor of exposing a public ValidateAndApplyDefaults() method on the Config struct itself. This will make the EPP runner responsible for calling it, which is a cleaner, more explicit approach.
Does that sound like a reasonable path forward? If so, I'll resolve this thread and proceed with the follow-up PR after this one merges.
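For context, a rough sketch of what that follow-up shape could look like; the field, constant, and signature details are placeholders (the error return in particular is an illustrative addition), not the final API:

```go
package sketch

import "fmt"

const defaultEnqueueChannelBufferSize = 100

// Config mirrors only the field discussed above; the real struct has more fields.
type Config struct {
	EnqueueChannelBufferSize int
}

// ValidateAndApplyDefaults returns a validated, defaulted copy without mutating the
// receiver, so the caller (e.g., the EPP runner) can still log the pre-default value.
func (c Config) ValidateAndApplyDefaults() (*Config, error) {
	if c.EnqueueChannelBufferSize < 0 {
		return nil, fmt.Errorf("EnqueueChannelBufferSize must be non-negative, got %d", c.EnqueueChannelBufferSize)
	}
	if c.EnqueueChannelBufferSize == 0 {
		c.EnqueueChannelBufferSize = defaultEnqueueChannelBufferSize
	}
	return &c, nil
}
```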
Mostly looks good, left some smaller comments. Would love to see diagrams for this and see it working in action.
// backpressure to the caller.
func (fc *FlowController) EnqueueAndWait(req types.FlowControlRequest) (types.QueueOutcome, error) {
	if req == nil {
		return types.QueueOutcomeRejectedOther, fmt.Errorf("%w: %w", types.ErrRejected, types.ErrNilRequest)
rejected doesn't look correct here, shouldn't this be something like an "internal error"?
Yes, internal is also appropriate. I use two high level sentinel errors to represent pre-enqueue errors (rejection) vs post-enqueue errors (evicted). In this case though, it is clearly bad user input. I can probably also remove the ErrNilRequest sentinel error as I cannot imagine a caller will ever switch on that.
The other sentinel errors are actually useful for mapping to the proper API error at the caller.
I updated the error handling for a nil request based on the feedback that treating it as a "rejection" was semantically confusing. A nil request is a programmatic/caller error, not an operational rejection by the queueing system.
The current implementation now returns a direct, non-sentinel error: `errors.New("request cannot be nil")`.

This leads to a point of design friction I'd like to discuss:

- Original Intent: The system was designed with two high-level sentinel errors to categorize failures: `ErrRejected` for pre-enqueue issues (like capacity) and `ErrEvicted` for post-enqueue issues (like TTL expiry). The `QueueOutcome` enum was meant to provide granular labels for metrics, with each `Rejected` or `Evicted` outcome paired with its corresponding high-level error.
- Current State: By removing the `ErrRejected` wrapper for the nil request case, we've made the error itself more semantically correct. However, we are still returning `QueueOutcomeRejectedOther`. This slightly breaks the clean pairing, as we now have a "Rejected" outcome that isn't associated with an `ErrRejected` error.

This is a reasonable compromise, but a bit confusing.
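To make the intended pairing concrete, here is a hedged sketch of how a caller might map the two sentinels to API errors; the sentinel variables and status codes are illustrative assumptions, not the actual `types` package:

```go
package sketch

import (
	"errors"
	"net/http"
)

// Stand-ins for the sentinels discussed above (the real ones live in the types package).
var (
	errRejected = errors.New("rejected")
	errEvicted  = errors.New("evicted")
)

// httpStatusFor sketches how a caller might map the flow-control sentinels to API errors;
// the specific status codes are assumptions for illustration only.
func httpStatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errRejected): // pre-enqueue failure, e.g., over capacity
		return http.StatusTooManyRequests
	case errors.Is(err, errEvicted): // post-enqueue failure, e.g., TTL expiry
		return http.StatusServiceUnavailable
	default: // caller/programmer errors such as a nil request
		return http.StatusInternalServerError
	}
}
```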
This commit refactors some of the core Flow Control contracts to improve clarity and better align with their intended roles. The goal is to create a more intuitive and robust interface for the upcoming top-level FlowController. Key changes include:
- The `FlowRegistryClient` interface is renamed to `FlowRegistryDataPlane` to more accurately reflect its role in the high-throughput request path.
- The `FlowRegistryAdmin` interface is renamed to `FlowRegistryObserver` to clarify its read-only, observational nature.
- The `ActiveFlowConnection.Shards()` method is renamed to `ActiveFlowConnection.ActiveShards()` to make it explicit that it returns only active, schedulable shards. This removes ambiguity for the distributor logic.
- `ShardStats` is enriched with `ID` and `IsActive` fields, providing consumers with more context about the shard's state at the time the snapshot was taken.
- The registry implementation has been updated to match these new contract definitions.
This commit refactors the `ShardProcessor` to function as a stateful worker managed by a higher-level supervisor. This is a preparatory step for the introduction of the new top-level `FlowController`. The public API of the processor is changed from a direct `Enqueue` method to a more sophisticated, channel-based submission model with `Submit` (non-blocking) and `SubmitOrBlock` (blocking). This decouples the producer from the processor's main loop, enabling better backpressure signals and higher throughput. Key changes include:
- Introduction of `Submit` and `SubmitOrBlock` for asynchronous request handoff.
- `FlowItem`'s finalization logic is improved to be more robust and channel-based (see the sketch after this list).
- Error handling within the dispatch cycle is refactored (no logic change) to be clearer about how it promotes work conservation by isolating failures to a single priority band.
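As referenced in the list above, here is a minimal sketch of the kind of channel-based, idempotent finalization the `FlowItem` change describes; the type and method names are illustrative stand-ins, not the actual implementation:

```go
package sketch

import "sync"

// flowItemSketch illustrates a channel-based, idempotent finalization pattern: the first
// finalize call wins, later calls are no-ops, and waiters observe completion via a closed
// channel.
type flowItemSketch struct {
	once    sync.Once
	done    chan struct{}
	outcome string
	err     error
}

func newFlowItemSketch() *flowItemSketch {
	return &flowItemSketch{done: make(chan struct{})}
}

// finalize records the outcome exactly once and unblocks all waiters.
func (i *flowItemSketch) finalize(outcome string, err error) {
	i.once.Do(func() {
		i.outcome, i.err = outcome, err
		close(i.done)
	})
}

// wait blocks until the item has been finalized (dispatched, rejected, or evicted).
func (i *flowItemSketch) wait() (string, error) {
	<-i.done
	return i.outcome, i.err
}
```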
This commit introduces the `FlowController`, a high-throughput, sharded supervisor that orchestrates a pool of stateful `ShardProcessor` workers. This new component is the central processing engine of the Flow Control system, implementing a "supervisor-worker" pattern. Key features of the `FlowController` include:
- Supervisor-Worker Architecture: Acts as a stateless supervisor, managing the lifecycle of stateful `ShardProcessor` workers. It includes a reconciliation loop to garbage-collect workers for stale shards.
- Flow-Aware Load Balancing: Implements a "Join-Shortest-Queue-by-Bytes" (JSQ-Bytes) algorithm to distribute incoming requests to the least-loaded worker, promoting emergent fairness.
- Synchronous API: Exposes a blocking `EnqueueAndWait` method, which simplifies client integration (e.g., with Envoy `ext_proc`) and provides direct backpressure (a caller-side sketch follows this list).
- Lazy Worker Initialization: Workers are created on-demand when a shard first becomes active to conserve resources and reduce contention on the hot path.
- Configuration: A new `Config` object allows for tuning parameters like TTLs, buffer sizes, and reconciliation intervals.
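As referenced in the list above, a small caller-side sketch of the synchronous `EnqueueAndWait` pattern; the interfaces here are simplified stand-ins for the real contracts:

```go
package sketch

import "fmt"

// Minimal stand-ins for the real contracts, just to show the calling pattern.
type queueOutcome int

type flowControlRequest interface{ FlowID() string }

type enqueuer interface {
	EnqueueAndWait(req flowControlRequest) (queueOutcome, error)
}

// handleRequest shows the synchronous integration style: the per-request goroutine (e.g.,
// an Envoy ext_proc stream handler) blocks until the controller admits or rejects the
// request, which is what propagates backpressure to the client.
func handleRequest(fc enqueuer, req flowControlRequest) error {
	outcome, err := fc.EnqueueAndWait(req)
	if err != nil {
		return fmt.Errorf("flow control did not admit request (outcome %v): %w", outcome, err)
	}
	// Admitted and dispatched: continue with the rest of the request lifecycle.
	return nil
}
```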
This commit updates documentation and code comments across various framework components to align with the concepts and architecture introduced by the `FlowController`. Key changes include:
- FCFS Policy: Clarified the distinction between "logical" and "physical" enqueue time and the behavioral trade-offs when pairing with different queue capabilities.
- ListQueue: Expanded the documentation to explain its role as a high-performance, approximate FCFS queue in the context of the `FlowController`'s retry mechanics.
- Request Types: Refined the comments for `QueueItemAccessor` to be more precise about the meaning of `EnqueueTime`.
Force-pushed from 7185020 to 5b2225a.
Force-pushed from 5b2225a to 57d248f.
This commit refactors the `FlowController` to simplify its startup and shutdown lifecycle, making it more robust and easier to reason about. It also incorporates several smaller improvements based on reviewer feedback. The primary change addresses a complex lifecycle implementation that used an `atomic.Bool` (`isRunning`) and a `ready` channel to manage state. Key changes:
- **Simplified Lifecycle:** The controller's lifecycle is now tied directly to a `context` passed into `NewFlowController`. The `Run` method has been unexported, and the main `run` loop is started as a goroutine from the constructor. This eliminates the `ready` channel and `isRunning` flag in addition to simplifying the interface for callers.
- **Robust Worker Creation:** The `getOrStartWorker` logic has been improved to ensure that in a race to create a worker, the "losing" goroutine correctly cleans up its resources and does not start a redundant processor. This fixes a bug where the losing worker would evict all items from its queues on shutdown, which were shared instances with the winning worker, resulting in premature request finalization. (A rough sketch of the race-safe pattern follows this list.)
- **Comment Reduction:** The extensive explanatory comments in `distributeRequest` have been condensed to be more concise while retaining the essential details of the algorithm.
- **Minor Cleanups:**
  - The initial, unnecessary call to `reconcileProcessors()` at startup has been removed.
  - Error messages have been clarified (e.g., "acquire lease" instead of "establish connection").
  - A typed error for nil requests was replaced with a standard `errors.New`.
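As referenced above, a rough sketch of the race-safe worker creation pattern, assuming a `sync.Map` keyed by shard ID; names and details are illustrative, not the actual `getOrStartWorker` implementation:

```go
package sketch

import (
	"context"
	"sync"
)

// processorSketch stands in for the real ShardProcessor worker.
type processorSketch struct{ cancel context.CancelFunc }

func (p *processorSketch) run(ctx context.Context) { <-ctx.Done() }
func (p *processorSketch) stop()                   { p.cancel() }

// getOrStartWorker sketches the race-safe creation described above: every goroutine builds
// a candidate, but only the one that wins LoadOrStore starts it; losers tear their
// never-started candidate down, so no queue items shared with the winner are finalized.
func getOrStartWorker(ctx context.Context, workers *sync.Map, shardID string) *processorSketch {
	workerCtx, cancel := context.WithCancel(ctx)
	candidate := &processorSketch{cancel: cancel}

	actual, loaded := workers.LoadOrStore(shardID, candidate)
	if loaded {
		// Lost the race: discard the candidate without ever starting its run loop.
		candidate.stop()
		return actual.(*processorSketch)
	}
	// Won the race: only now start the worker.
	go candidate.run(workerCtx)
	return candidate
}
```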
Force-pushed from 57d248f to 16bc129.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, LukeAVanDrie The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces the `FlowController`, a high-throughput, sharded supervisor that acts as the central processing engine for the Flow Control system. It completes the work for the Flow Control MVP (sans wiring into the request lifecycle) by adding the top-level request distributor.

The controller implements a supervisor-worker pattern, managing the lifecycle of stateful `ShardProcessor` workers. It distributes incoming requests using a flow-aware "Join-Shortest-Queue-by-Bytes" (JSQ-Bytes) algorithm to balance load across workers and promote emergent fairness.

Guided Review:
To make this large but atomic change easier to review, it has been organized into four distinct, logical commits. It is highly recommended to review this PR commit-by-commit.
- `eb049100`: feat(flowcontrol): Refactor FlowRegistry contracts. Renames the registry contracts (`FlowRegistryClient` -> `FlowRegistryDataPlane`) and makes the `ActiveFlowConnection` contract more explicit by renaming `Shards()` to `ActiveShards()`.
- `d4b18535`: refactor: Adapt ShardProcessor to a worker role. Refactors the `ShardProcessor` into a stateful worker. The public API is changed from a direct `Enqueue` method to a channel-based `Submit`/`SubmitOrBlock` model to decouple it from the producer and enable better backpressure signaling.
- `99dedfb1`: feat: Introduce the FlowController supervisor. Adds the `FlowController` itself. It includes the worker lifecycle management, a reconciliation loop to garbage-collect stale workers, and the JSQ-Bytes distribution logic.
- `59e04f75`: docs: Update comments to align with FlowController.

Which issue(s) this PR fixes:
Tracks #674
Not fixed until final wiring and benchmarking. Consequently, I will add the release note in the wiring PR.
Does this PR introduce a user-facing change?:
cc: @rahulgurnani