add WG AI Gateway #8521
base: master
Conversation
@shaneutt: GitHub didn't allow me to request PR reviews from the following users: david-martin, kflynn, rootfs, yuzisun. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: shaneutt. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from 37481e4 to 84d372b
/hold
## Deliverables

* A compendium of AI related networking definitions (e.g. "AI Gateway") and a
is this some sort of stored artifact?
Yes. Documentation somewhere with some definitions, where exactly TBD.
Force-pushed from 9c05bc4 to ea77fe1
@aojea a kind reminder...
The AI Gateway Working Group focuses on the intersection of AI and
networking, particularly in the context of extending load-balancer, gateway
and proxy technologies to manage and route traffic for AI Inference.
I’m wondering if this WG has some overlap with WG Serving.
community/wg-serving/charter.md, lines 37 to 38 in bfee8f7:

- Explore new projects that improve orchestration, scaling, and load balancing
  of inference workloads and compose well with other workloads on Kubernetes
Could you help clarify the scope differences?
In this case it's squarely focused on the networking aspects of inference, not the compute or lifecycle management aspects as in the case of wg-serving. The key point is this one:
> manage and route traffic for AI Inference.
Right, but why isn't the existing WG Serving the right place to cover those topics? The biggest issue, as I see it, is that you will affect WG Serving's work even if you focus only on the networking side of it. So either explicitly listing the reasons for that extraction, or explaining how you are going to collaborate with WG Serving in that scope, will be necessary to ensure the work doesn't diverge.
> why the existing WG Serving isn't the right place to cover those topics?
WG Serving's charter makes it clear that it is focused broadly on serving workloads as the primary objective, and its goals speak directly to that. Notably, the stated goals do not include any networking specific deliverables.
This WG is focused very tightly on traffic and API management, going as far as focusing on very specific individual features in that domain that we want to explore (see the document from the description).
While it may seem plausible for any working group to claim that it is the suitable forum for networking-related discussions, this perspective does not hold when we move beyond standard networking to address protocol- or domain-specific networking, as in use cases like this one. In these instances, it is essential to engage specialists from the community and provide room for dedicated focus. It is because of this technical specificity, the need to engage more people and grow our community, and the need for autonomy and focus that WG Serving is not the right place to cover these topics.
Additionally, one of our primary use cases (see the "Why?" section of our originating document) covers the situation where users want to perform inference from their Kubernetes applications, but they will be reaching outside the cluster to do it. This use case is effectively an egress use case, and is definitively networking, and is entirely out of scope for WG Serving.
> So either explicitly listing out the reasons for that extraction or how are you going to collaborate with WG Serving in that scope will be necessary to ensure the work doesn't diverge.
I've added WG Serving as an explicit collaborator which we will need to keep looped in for review on any of our proposals that deal explicitly with model serving backends.
> This use case is effectively an egress networking use case and is entirely out of scope for WG Serving.
+1 that. I think WG Serving will remain a collaborator for any work that touches model-serving backends, but the proposed effort focuses on egress networking for inference from Kubernetes apps, which is out of its scope. The proposal looks promising with SIG MultiCluster for cross-cluster traffic policy and failover, SIG Apps for app-facing APIs and workload integration, and SIG Auth for identity, authentication, and policy on egress.
+1 to the above: serving is just one piece of the AI networking story; those users have their own models and are doing local inference. IMO, far more users are going to want to consume remote models and apply policy on that consumption
Completely agree with the above comments, there are various topics in the scope of network traffic and API management that don't feel natural to WG serving and require a dedicated WG with focus on networking.
Obviously we do plan to collaborate closely with serving WG to make sure the scope of each WG is well defined and complementary to each other.
- github: xunzhuo
  name: Xunzhuo
  company: Tencent
  email: [email protected]
I've raised a similar concern when reviewing WG Node Lifecycle, are you sure you want to have that many leads? I know from my personal experience that having too many makes it sometimes challenging. Definitely not a blocker for WG creation, more like a suggestion 😉
Indeed, our list is big, but that is reflective of a large wave of interest. Each person on this list is representative of some technical aspect of the subject matter which makes them specialists/experts for the group. Each of them has spoken with me personally and I'm confident in their commitment to dedicate substantial time to the project, including attending and leading meetings, as well as actively driving and contributing to proposals.
## Exit Criteria

The WG is done when its deliverables are complete, according to the defined
Have you considered a SIG-Network sub-project?
Yes. In the originating document for this working group, we noted the potential for existing subprojects to house proposals generated by this group and suggested that we might even propose new subprojects. We feel there's discussion to be had and consensus to be built first.
Notably, as it pertains to this, we are trying to be very deliberate about an exit for this WG. We've seen long-running working groups, and we don't want that for ourselves. We endeavor to deliver on our goals and disband within the next year. We think that conclusion is likely to be a new subproject, but it may instead be multiple proposals to existing projects across multiple SIGs, which is why we feel a working group is appropriate.
Signed-off-by: Shane Utt <[email protected]>
Just wanted to pop in here for posterity that SIG-MC has been approached as a potential stakeholder SIG and is discussing what/who we can commit to and whether it makes sense within the scope of the current goals. In particular, we have been made aware of a potential use case for the WG related to egress to k8s or non-k8s inference endpoints. cc @skitt @JeremyOT
This PR requests the creation of the "AI Gateway Working Group" as discussed and defined throughout: