
[WIP] feat(flowcontrol): Implement the FlowRegistry #1319


Open · wants to merge 1 commit into base: main
28 changes: 28 additions & 0 deletions pkg/epp/flowcontrol/contracts/doc.go
@@ -0,0 +1,28 @@
/*
Copyright 2025 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package contracts defines the service interfaces that decouple the core `controller.FlowController` engine from its
// primary dependencies. In alignment with a "Ports and Adapters" (or "Hexagonal") architectural style, these
// interfaces represent the "ports" through which the engine communicates with external systems and pluggable logic.
//
// The two primary contracts defined here are:
//
// - `FlowRegistry`: The interface for the stateful control plane that manages the lifecycle of all flows, queues, and
// policies.
//
// - `SaturationDetector`: The interface for a component that provides real-time load signals, allowing the dispatch
// engine to react to backend saturation.
package contracts
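To make the "ports" relationship concrete, here is a minimal, self-contained sketch of how a dispatch engine might consume the two contracts. The interface and method names below (`IsSaturated`, `ActiveShardCount`) are illustrative stand-ins, not the actual signatures defined in this package.

```go
package main

import "fmt"

// Illustrative stand-ins for the two ports described above; the real
// interfaces in the contracts package are richer.
type saturationDetector interface {
	IsSaturated() bool
}

type flowRegistry interface {
	ActiveShardCount() int
}

// alwaysHealthy is a toy detector that never reports backend saturation.
type alwaysHealthy struct{}

func (alwaysHealthy) IsSaturated() bool { return false }

// staticRegistry reports a fixed number of active shards.
type staticRegistry struct{ shards int }

func (r staticRegistry) ActiveShardCount() int { return r.shards }

// dispatchAllowed shows the shape of the decision the engine makes:
// consult the load signal before draining work from the registry's shards.
func dispatchAllowed(sd saturationDetector, fr flowRegistry) bool {
	return !sd.IsSaturated() && fr.ActiveShardCount() > 0
}

func main() {
	fmt.Println(dispatchAllowed(alwaysHealthy{}, staticRegistry{shards: 4})) // true
}
```

Because both dependencies are interfaces, the engine can be unit-tested against fakes like these without a real registry or detector.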
10 changes: 7 additions & 3 deletions pkg/epp/flowcontrol/contracts/errors.go
@@ -20,16 +20,20 @@ import "errors"

// Registry Errors
var (
// ErrFlowInstanceNotFound indicates that a requested flow instance (a `ManagedQueue`) does not exist in the registry
// shard, either because the flow is not registered or the specific instance (e.g., a draining queue at a particular
// priority) is not present.
// ErrFlowInstanceNotFound indicates that a requested flow instance (a `ManagedQueue`) does not exist.
ErrFlowInstanceNotFound = errors.New("flow instance not found")

// ErrFlowIDEmpty indicates that a flow specification was provided with an empty flow ID.
ErrFlowIDEmpty = errors.New("flow ID cannot be empty")

// ErrPriorityBandNotFound indicates that a requested priority band does not exist in the registry because it was not
// part of the initial configuration.
ErrPriorityBandNotFound = errors.New("priority band not found")

// ErrPolicyQueueIncompatible indicates that a selected policy is not compatible with the capabilities of the queue it
// is intended to operate on. For example, a policy requiring priority-based peeking is used with a simple FIFO queue.
ErrPolicyQueueIncompatible = errors.New("policy is not compatible with queue capabilities")

// ErrInvalidShardCount indicates that an invalid shard count was provided (e.g., zero).
ErrInvalidShardCount = errors.New("invalid shard count")
)
170 changes: 156 additions & 14 deletions pkg/epp/flowcontrol/contracts/registry.go
@@ -14,18 +14,147 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

// Package contracts defines the service interfaces that decouple the core `controller.FlowController` engine from its
// primary dependencies. In alignment with a "Ports and Adapters" (or "Hexagonal") architectural style, these
// interfaces represent the "ports" through which the engine communicates.
//
// This package contains the primary service contracts for the Flow Registry, which acts as the control plane for all
// flow state and configuration.
package contracts

import (
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/flowcontrol/framework"
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/flowcontrol/types"
)

// FlowRegistry is the complete interface for the global control plane, composed of administrative functions and the
// ability to provide shard accessors. A concrete implementation of this interface is the single source of truth for all
// flow control state and configuration.
//
// # Conformance
//
// All methods defined in this interface (including those embedded) MUST be goroutine-safe.
// Implementations are expected to perform complex updates (e.g., `RegisterOrUpdateFlow`, `UpdateShardCount`) atomically
// to preserve system invariants.
//
// # Invariants
//
// Concrete implementations of FlowRegistry MUST uphold the following invariants across all operations:
// 1. Shard Consistency: All configured priority bands and logical flows must be represented on every Active internal
// shard. Plugin instance types (e.g., the specific `framework.SafeQueue` implementation or policy plugins) must be
// consistent for a given flow or band across all shards.
// 2. Flow Instance Uniqueness per Band: For any given logical flow, there can be a maximum of one `ManagedQueue`
// instance per priority band. An instance can be either Active or Draining.
// 3. Single Active Instance per Flow: For any given logical flow, there can be a maximum of one Active `ManagedQueue`
// instance across all priority bands. All other instances for that flow must be in a Draining state.
// 4. Capacity Partitioning Consistency: Global and per-band capacity limits are uniformly partitioned across all
// active shards. The sum of the capacity limits allocated to each shard must not exceed the globally configured
// limits.
//
// # Flow Lifecycle States
//
// - Registered: A logical flow is Registered when it is known to the `FlowRegistry`. It has exactly one Active
// instance across all priority bands and zero or more Draining instances.
// - Active: A specific instance of a flow within a priority band is Active if it is the designated target for all
> Review comment (Collaborator): how can only a single flow within a priority band accept new enqueues? This reads like if 2 'Critical' requests came in, from different 'flow' identities, then they would both end up in the same flow
// new enqueues for that logical flow.
// - Draining: A flow instance is Draining if it no longer accepts new enqueues but still contains items that are
// eligible for dispatch. This occurs after a priority change.
// - Garbage Collected (Unregistered): A logical flow is automatically unregistered and garbage collected by the
// system when it has been 'idle' for a configurable period. A flow is considered idle if its active queue instance
// has been empty on all active shards for the timeout duration. Once unregistered, it has no active instances,
// though draining instances from previous priority levels may still exist until their queues are also empty.
//
// # Shard Garbage Collection
//
// When a shard is decommissioned via `UpdateShardCount`, the `FlowRegistry` must ensure a graceful shutdown. It must
// mark the shard as inactive to prevent new enqueues, allow the `FlowController` to continue draining its queues, and
// only delete the shard's state after the associated worker has fully terminated and all queues are empty.
type FlowRegistry interface {
FlowRegistryAdmin
ShardProvider
}

// FlowRegistryAdmin defines the administrative interface for the global control plane. This interface is intended for
// external systems to configure flows, manage system parallelism, and query aggregated statistics for observability.
//
// # Design Rationale for Dynamic Update Strategies
//
// The `FlowRegistryAdmin` contract specifies precise behaviors for handling dynamic updates. These strategies were
// chosen to prioritize system stability, correctness, and minimal disruption:
//
// - Graceful Draining (for Priority/Shard Lifecycle Changes): For operations that change a flow's priority or
// decommission a shard, the affected queue instances are marked as inactive but are not immediately deleted. They
// enter a Draining state where they no longer accept new requests but are still processed for dispatch. This
// ensures that requests already accepted by the system are processed to completion. Crucially, requests in a
// draining queue continue to be dispatched according to the priority level and policies they were enqueued with,
// ensuring consistency.
//
// - Atomic Queue Migration (Future Design for Incompatible Intra-Flow Policy Changes): When an intra-flow policy is
// updated to one that is incompatible with the existing queue data structure, the designed future behavior is a
// full "drain and re-enqueue" migration. This more disruptive operation is necessary to guarantee correctness. A
// simpler "graceful drain"—by creating a second instance of the same flow in the same priority band—is not used
// because it would violate the system's "one flow instance per band" invariant. This invariant is critical because
// it ensures that inter-flow policies operate on a clean set of distinct flows, stateful intra-flow policies have a
// single authoritative view of their flow's state, and lookups are unambiguous. Note: This atomic migration is a
// future design consideration and is not implemented in the current version.
//
// - Self-Balancing on Shard Scale-Up: When new shards are added via `UpdateShardCount`, the framework relies on the
// `FlowController`'s request distribution logic (e.g., a "Join the Shortest Queue by Bytes (JSQ-Bytes)" strategy)
// to naturally funnel *new* requests to the less-loaded shards. This design choice strategically avoids the
// complexity of actively migrating or rebalancing existing items that are already queued on other shards, promoting
// system stability during scaling events.
type FlowRegistryAdmin interface {
// RegisterOrUpdateFlow handles the registration of a new flow or the update of an existing flow's specification.
// This method orchestrates complex state transitions atomically across all managed shards.
//
// # Dynamic Update Behaviors
//
// - Priority Changes: If a flow's priority level changes, its current active `ManagedQueue` instance is marked
// as inactive to drain existing requests. A new instance is activated at the new priority level. If a flow is
// updated to a priority level where an instance is already draining (e.g., during a rapid rollback), that
// draining instance is re-activated.
//
// # Returns
//
// - nil on success.
// - An error wrapping `ErrFlowIDEmpty` if `spec.ID` is empty.
// - An error wrapping `ErrPriorityBandNotFound` if `spec.Priority` refers to an unconfigured priority level.
// - Other errors if internal creation or activation of policy or queue instances fails.
RegisterOrUpdateFlow(spec types.FlowSpecification) error

// UpdateShardCount dynamically adjusts the number of internal state shards, triggering a state rebalance.
//
// # Dynamic Update Behaviors
//
// - On Increase: New, empty state shards are initialized with all registered flows. The
// `controller.FlowController`'s request distribution logic will naturally balance load to these new shards over
// time.
// - On Decrease: A specified number of existing shards are marked as inactive. They stop accepting new requests
// but continue to drain existing items. They are fully removed only after their queues are empty.
//
// The implementation MUST atomically re-partition capacity allocations across all active shards when the count
// changes.
UpdateShardCount(n uint) error

// Stats returns globally aggregated statistics for the entire `FlowRegistry`.
Stats() AggregateStats

// ShardStats returns a slice of statistics, one for each internal shard. This provides visibility for debugging and
// monitoring per-shard behavior (e.g., identifying hot or stuck shards).
ShardStats() []ShardStats
}

// ShardProvider defines a minimal interface for consumers that need to discover and iterate over available shards.
//
// A "shard" is an internal, parallel execution unit that allows the `FlowController`'s core dispatch logic to be
// parallelized. Consumers of this interface, such as a request distributor, MUST check `RegistryShard.IsActive()`
// before routing new work to a shard to ensure they do not send requests to a shard that is gracefully draining.
type ShardProvider interface {
// Shards returns a slice of accessors, one for each internal state shard.
//
// A "shard" is an internal, parallel execution unit that allows the `controller.FlowController`'s core dispatch logic
// to be parallelized, preventing a CPU bottleneck at high request rates. The `FlowRegistry`'s state is sharded to
// support this parallelism by reducing lock contention.
//
// The returned slice includes accessors for both active and draining shards. Consumers MUST use `IsActive()` to
// determine if new work should be routed to a shard. Callers should not modify the returned slice.
Shards() []RegistryShard
}
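A distributor built on this contract might implement the JSQ-Bytes strategy mentioned above roughly as follows. The `shard` struct and its fields are hypothetical stand-ins for `RegistryShard` accessors; only the `IsActive()` check is taken directly from the contract:

```go
package main

import "fmt"

// Minimal shard view for illustrating JSQ-Bytes; the real RegistryShard
// interface exposes richer accessors.
type shard struct {
	id       string
	active   bool
	byteSize uint64
}

func (s shard) IsActive() bool { return s.active }

// pickShardJSQBytes selects the active shard with the smallest queued
// byte size ("Join the Shortest Queue by Bytes"), skipping draining
// shards as the ShardProvider contract requires.
func pickShardJSQBytes(shards []shard) (shard, bool) {
	var best shard
	found := false
	for _, s := range shards {
		if !s.IsActive() {
			continue // never route new work to a draining shard
		}
		if !found || s.byteSize < best.byteSize {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	shards := []shard{
		{id: "shard-0", active: true, byteSize: 4096},
		{id: "shard-1", active: false, byteSize: 0}, // draining
		{id: "shard-2", active: true, byteSize: 1024},
	}
	if s, ok := pickShardJSQBytes(shards); ok {
		fmt.Println(s.id) // shard-2
	}
}
```

This is also why scale-up needs no active rebalancing: a freshly added, empty shard has the smallest byte size and naturally attracts new work.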

// RegistryShard defines the read-oriented interface that a `controller.FlowController` worker uses to access its
// specific slice (shard) of the `FlowRegistry`'s state. It provides the necessary methods for a worker to perform its
// dispatch operations by accessing queues and policies in a concurrent-safe manner.
@@ -80,14 +209,14 @@ type RegistryShard interface {
}

// ManagedQueue defines the interface for a flow's queue instance on a specific shard.
// It wraps an underlying `framework.SafeQueue`, augmenting it with lifecycle validation against the `FlowRegistry` and
// integrating atomic statistics updates.
// It acts as a stateful decorator around an underlying `framework.SafeQueue`, augmenting it with lifecycle validation
// against the `FlowRegistry` and integrating atomic statistics updates.
//
// # Conformance
//
// - All methods (including those embedded from `framework.SafeQueue`) MUST be goroutine-safe.
// - The `Add()` method MUST reject new items if the queue has been marked as "draining" by the `FlowRegistry`,
// ensuring that lifecycle changes are respected even by consumers holding a stale pointer to the queue.
// - All methods defined by this interface and the `framework.SafeQueue` it wraps MUST be goroutine-safe.
// - The `Add()` method MUST reject new items if the queue has been marked as Draining by the `FlowRegistry`, ensuring
// that lifecycle changes are respected even by consumers holding a stale pointer to the queue.
// - All mutating methods (`Add()`, `Remove()`, `Cleanup()`, `Drain()`) MUST atomically update relevant statistics
// (e.g., queue length, byte size).
type ManagedQueue interface {
@@ -100,6 +229,18 @@ type ManagedQueue interface {
FlowQueueAccessor() framework.FlowQueueAccessor
}

// AggregateStats holds globally aggregated statistics for the entire `FlowRegistry`.
type AggregateStats struct {
// TotalCapacityBytes is the globally configured maximum total byte size limit across all priority bands and shards.
TotalCapacityBytes uint64
// TotalByteSize is the total byte size of all items currently queued across the entire system.
TotalByteSize uint64
// TotalLen is the total number of items currently queued across the entire system.
TotalLen uint64
// PerPriorityBandStats maps each configured priority level to its globally aggregated statistics.
PerPriorityBandStats map[uint]PriorityBandStats
}
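The relationship between these global figures and the per-shard ones below is governed by the capacity-partitioning invariant: per-shard limits are a uniform partition of the global limit. A sketch of that arithmetic, with remainder handling so the partitions sum exactly to the global value (the function name and scheme are illustrative, not the registry's actual implementation):

```go
package main

import "fmt"

// partitionCapacity splits a global byte limit uniformly across n shards,
// spreading the remainder so the per-shard values sum exactly to the
// global limit, consistent with the capacity-partitioning invariant.
func partitionCapacity(globalBytes uint64, n uint) []uint64 {
	out := make([]uint64, n)
	if n == 0 {
		return out
	}
	base := globalBytes / uint64(n)
	rem := globalBytes % uint64(n)
	for i := range out {
		out[i] = base
		if uint64(i) < rem {
			out[i]++ // first `rem` shards absorb one extra byte each
		}
	}
	return out
}

func main() {
	parts := partitionCapacity(1000, 3)
	var sum uint64
	for _, p := range parts {
		sum += p
	}
	fmt.Println(parts, sum) // [334 333 333] 1000
}
```

On `UpdateShardCount`, re-running such a partition over the new active shard set is what keeps the sum of shard limits from exceeding the configured global limit.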

// ShardStats holds statistics for a single internal shard within the `FlowRegistry`.
type ShardStats struct {
// TotalCapacityBytes is the optional, maximum total byte size limit aggregated across all priority bands within this
@@ -112,6 +253,7 @@ type ShardStats struct {
// TotalLen is the total number of items currently queued across all priority bands within this shard.
TotalLen uint64
// PerPriorityBandStats maps each configured priority level to its statistics within this shard.
// The capacity values within represent this shard's partition of the global band capacity.
// The key is the numerical priority level.
// All configured priority levels are guaranteed to be represented.
PerPriorityBandStats map[uint]PriorityBandStats
@@ -138,9 +280,9 @@ type PriorityBandStats struct {
Priority uint
// PriorityName is an optional, human-readable name for the priority level (e.g., "Critical", "Sheddable").
PriorityName string
// CapacityBytes is the configured maximum total byte size for this priority band, aggregated across all items in
// all flow queues within this band. If scoped to a shard, its value represents the configured band limit for the
// `FlowRegistry` partitioned for this shard.
// CapacityBytes is the configured maximum total byte size for this priority band.
// When viewed via `AggregateStats`, this is the global limit. When viewed via `ShardStats`, this is the partitioned
// value for that specific shard.
// The `controller.FlowController` enforces this limit.
// A default non-zero value is guaranteed if not configured.
CapacityBytes uint64