v1.0 InferencePool API Review #1173
base: release-0.5
…-sigs#1160) Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.15.0 to 0.16.0. - [Commits](golang/sync@v0.15.0...v0.16.0) --- updated-dependencies: - dependency-name: golang.org/x/sync dependency-version: 0.16.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This commit introduces a new pluggable framework for request queuing within the EPP Flow Control layer. This change establishes the core interfaces and initial implementations needed for sophisticated request management, prioritization, and fairness.

The key components of this framework are:

- **`framework.SafeQueue` Interface**: A new contract for concurrent-safe queue implementations. It defines a standard set of behaviors for adding, removing, peeking, and managing items, ensuring that all queue plugins are interchangeable.
- **Queue Plugin Implementations**:
  - **`listqueue`**: A simple, efficient FIFO queue based on `container/list`. Ideal for basic, fair queuing workloads.
  - **`maxminheap`**: A priority queue based on a max-min heap, allowing for O(1) access to both the highest and lowest priority items. This is suitable for advanced policies that require configurable ordering.
- **Plugin Registration**: A factory pattern (`queue.MustRegisterQueue`) allows new queue implementations to be discovered and registered at runtime, making the system extensible.
- **Comprehensive Testing**:
  - A new conformance test suite (`TestQueueConformance`) ensures that all registered queue plugins strictly adhere to the `SafeQueue` contract, covering lifecycle, ordering, edge cases, and concurrency.
  - A centralized benchmark suite (`BenchmarkQueues`) provides a fair, apples-to-apples performance comparison of all queue implementations across various workload patterns.
- **Core Type Refinements**: The `types` package has been updated to support this new framework, including a refined `QueueItemAccessor` interface and a new `QueueItemHandle` for opaque, safe item manipulation.

This framework decouples the core flow control logic from the specific queuing disciplines, enabling future work on advanced dispatch and displacement policies.
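To make the queue contract and the registration pattern in this commit message concrete, here is a minimal sketch. It is not the code from the commit: the `SafeQueue` method set, the string item type, and the `registry` / `mustRegisterQueue` names are all assumptions chosen for illustration; only the ideas (a mutex-guarded FIFO on `container/list`, a name-to-factory registry that panics on duplicates) come from the description above.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// SafeQueue is a hypothetical, simplified stand-in for the contract
// described in the commit message. The real interface is not shown here.
type SafeQueue interface {
	Add(item string)
	PeekHead() (string, bool)
	Remove() (string, bool)
	Len() int
}

// listQueue is a FIFO queue backed by container/list and guarded by a mutex.
type listQueue struct {
	mu    sync.Mutex
	items *list.List
}

func newListQueue() *listQueue { return &listQueue{items: list.New()} }

func (q *listQueue) Add(item string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items.PushBack(item)
}

func (q *listQueue) PeekHead() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.items.Front()
	if front == nil {
		return "", false
	}
	return front.Value.(string), true
}

func (q *listQueue) Remove() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.items.Front()
	if front == nil {
		return "", false
	}
	q.items.Remove(front)
	return front.Value.(string), true
}

func (q *listQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.items.Len()
}

// registry maps a queue name to its factory (assumed shape).
var registry = map[string]func() SafeQueue{}

// mustRegisterQueue panics on a duplicate name, mirroring the usual
// "Must" convention; the real function's signature may differ.
func mustRegisterQueue(name string, factory func() SafeQueue) {
	if _, dup := registry[name]; dup {
		panic("queue already registered: " + name)
	}
	registry[name] = factory
}

func main() {
	mustRegisterQueue("listqueue", func() SafeQueue { return newListQueue() })
	q := registry["listqueue"]()
	q.Add("req-1")
	q.Add("req-2")
	head, _ := q.Remove()
	fmt.Println(head, q.Len()) // prints: req-1 1
}
```

Because callers only see the interface returned by the factory, a heap-based queue could be registered under another name without changing the calling code.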
…gs#1157) Signed-off-by: Nir Rozenbaum <[email protected]>
* Conformance: Fixes the EPP ConfigMap namespace

  Signed-off-by: Daneyon Hansen <[email protected]>

* Renames config file in rollout.md

  Signed-off-by: Daneyon Hansen <[email protected]>

---------

Signed-off-by: Daneyon Hansen <[email protected]>
This commit introduces the `IntraFlowDispatchPolicy` framework, the second major component of the new pluggable flow control system. This framework decouples the logic for selecting a request from within a single flow's queue (temporal scheduling) from the underlying queue data structure.

Key components include:

- `framework.IntraFlowDispatchPolicy`: The core interface that defines the contract for selecting an item from a flow's queue.
- `framework.FlowQueueAccessor`: A read-only interface that provides policies with safe access to queue state.
- `RequiredQueueCapabilities`: A mechanism for policies to declare their queue requirements (e.g., FIFO, priority-ordered), which are validated by the registry.
- A factory and registration system for discovering and instantiating policy plugins by name.
- A comprehensive conformance test suite to validate the contract for all policy plugins.
- A foundational `FCFS` (First-Come, First-Served) policy as the first reference implementation.

This work builds directly on the `SafeQueue` framework, enabling the development of sophisticated, policy-driven request prioritization and scheduling.
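The split between a read-only queue view, a policy, and a capability check can be sketched roughly as follows. The interface names are taken from the commit message, but every method signature, the `request` type, the string-valued capabilities, and the `validate` helper are assumptions for illustration rather than the actual framework API.

```go
package main

import (
	"errors"
	"fmt"
)

// request is a hypothetical queue item.
type request struct{ ID string }

// FlowQueueAccessor is an assumed read-only view of a flow's queue.
type FlowQueueAccessor interface {
	Len() int
	PeekHead() (request, bool)
	Capabilities() []string
}

// IntraFlowDispatchPolicy is an assumed shape for the policy contract.
type IntraFlowDispatchPolicy interface {
	Name() string
	RequiredQueueCapabilities() []string
	SelectItem(q FlowQueueAccessor) (request, bool)
}

// fifoQueue is a toy queue that advertises the "FIFO" capability.
type fifoQueue struct{ items []request }

func (q fifoQueue) Len() int { return len(q.items) }

func (q fifoQueue) PeekHead() (request, bool) {
	if len(q.items) == 0 {
		return request{}, false
	}
	return q.items[0], true
}

func (q fifoQueue) Capabilities() []string { return []string{"FIFO"} }

// fcfs selects the oldest item, which for a FIFO queue is simply the head.
type fcfs struct{}

func (fcfs) Name() string                        { return "FCFS" }
func (fcfs) RequiredQueueCapabilities() []string { return []string{"FIFO"} }
func (fcfs) SelectItem(q FlowQueueAccessor) (request, bool) {
	return q.PeekHead()
}

// validate checks that the queue offers every capability the policy needs.
func validate(p IntraFlowDispatchPolicy, q FlowQueueAccessor) error {
	have := map[string]bool{}
	for _, c := range q.Capabilities() {
		have[c] = true
	}
	for _, need := range p.RequiredQueueCapabilities() {
		if !have[need] {
			return errors.New("queue lacks capability " + need)
		}
	}
	return nil
}

func main() {
	q := fifoQueue{items: []request{{ID: "req-1"}, {ID: "req-2"}}}
	var p IntraFlowDispatchPolicy = fcfs{}
	if err := validate(p, q); err != nil {
		fmt.Println("incompatible:", err)
		return
	}
	next, _ := p.SelectItem(q)
	fmt.Println(p.Name(), "selects", next.ID) // prints: FCFS selects req-1
}
```

The policy never mutates the queue here; it only inspects it through the accessor, which is the decoupling the commit message describes.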
Hi @capri-xiyue. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/hold
✅ Deploy Preview for gateway-api-inference-extension ready!
5317094 to 4ffb5f6
/hold, this should not get merged
77371fe to 4ffb5f6
Why is it pushed to the release-0.5 branch?
This is not meant to be merged; it is just for API review. Since #1116 was merged, pointing this PR at main would show no difference.
// EndpointPickerConfig specifies the configuration needed by the proxy to discover and connect to the endpoint
// picker service that picks endpoints for the requests routed to this pool.
EndpointPickerConfig `json:",inline"`
As we discussed during today's community meeting, we need to decide on refining the EndpointPickerConfig type. We originally chose this API structure to support surfacing config for future extensions, and inlining to simplify the UI. From my understanding, we cannot change the inline after the API goes GA, so we should either simplify the EPP config API surface or remove inlining.
If we are removing this struct, we may want to rename extensionRef to make it explicit that this is an EPP extension. Also, do we want to make extensionRef a list (and add a type enum, with EPP as the only possible value for now) to allow potential expansion to other pool-attached extensions?
> we cannot change the inline after the API goes GA

As long as the user-facing API surface remains the same (the CRD in this case), we can make any changes we want to the underlying Go types. Although we generally try to minimize changes to underlying Go types, the only compatibility guarantees with k8s APIs are to the CRDs/API specs themselves.
I think the actual API surface here is ~fine as is, but I don't mind removing the extra struct layer as we can always add it back later if/when we need it.
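A small sketch of why dropping the extra struct layer need not change the serialized API: with Go's `encoding/json`, an embedded struct whose tag has an empty name (as in `json:",inline"`) has its fields promoted to the parent object, so an inlined struct and a flattened struct marshal to the same JSON. The field names and values below are illustrative assumptions, not the actual InferencePool fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EndpointPickerConfig stands in for the inlined struct under discussion.
// Its single field is illustrative only.
type EndpointPickerConfig struct {
	ExtensionRef string `json:"extensionRef"`
}

// specWithInline embeds the config struct, as in the reviewed snippet.
type specWithInline struct {
	TargetPortNumber     int `json:"targetPortNumber"`
	EndpointPickerConfig `json:",inline"`
}

// specFlat removes the struct layer and declares the field directly.
type specFlat struct {
	TargetPortNumber int    `json:"targetPortNumber"`
	ExtensionRef     string `json:"extensionRef"`
}

// encode marshals v to a JSON string, panicking on error for brevity.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	inlined := specWithInline{
		TargetPortNumber:     8000,
		EndpointPickerConfig: EndpointPickerConfig{ExtensionRef: "epp"},
	}
	flat := specFlat{TargetPortNumber: 8000, ExtensionRef: "epp"}

	fmt.Println(encode(inlined))
	fmt.Println(encode(flat))
	fmt.Println("same wire format:", encode(inlined) == encode(flat))
}
```

This only demonstrates the Go serialization side; whether a generated CRD schema stays identical would still need to be checked against the generator's output.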
Discussed in the community meeting. We will try to clean it up before the 1.0 release, but it's not a blocker. If we can't get it done before the 1.0 release, we need to create an issue for it and include it in a future milestone.
This is the PR for the cleanup: #1324
// Cross namespace selector is not supported.
//
// +kubebuilder:validation:Required
Selector map[LabelKey]LabelValue `json:"selector"`
Related issue, where there seem to be some use cases for the full selector feature because the map falls short: kubernetes/kubernetes#48528. Just for reference, since it is not going to be possible to evolve this later, but I imagine that you have already discussed this.
This is an intentional omission as many implementations are stuck creating a Service behind the scenes to get EndpointSlices, so we want the selector that's specified here to be able to map to a Service as long as that's a common implementation detail (I'd hope this would be temporary). It could be worth trying to find a way to structure this field in a way that allows for a more fully featured selector in the future though. I think that would mean copying the upstream `LabelSelector` type and temporarily omitting `matchExpressions`.
Curious what maintainers think here:
cc @danehans @nirrozenbaum @kfswain
I have a draft PR on this: #1330. In it I simply:

1. turn the `map[LabelKey]LabelValue` into a `LabelSelector`;
2. let the `LabelSelector` struct contain the `map[LabelKey]LabelValue`.

I'm assuming we only want to do it in v1 because this is also a breaking change.
xref #1267 on whether |
InferencePool |
Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages. The list of commits with invalid commit messages:
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: capri-xiyue The full list of commands accepted by this bot can be found here.
xref #735 (comment) regarding how to communicate EPP cert info to the Gateway implementation.
What type of PR is this?
/kind api-change
What this PR does / why we need it:
This PR is a diff of /apis from alpha (main branch) to v1.0 (release-1.0 branch). The InferencePool spec has no changes except the group change from `inference.networking.x-k8s.io` to `inference.networking.k8s.io`.

Note: This PR is purely to facilitate review; it is not intended to be merged.
To do the API review, please select the specific commit, as in the screenshot below, so that you review only the API-related change.

/assign @robscott