add WG AI Gateway #8521

Open
wants to merge 1 commit into
base: master

shaneutt
Member

@shaneutt shaneutt commented Jul 18, 2025

@k8s-ci-robot k8s-ci-robot added the "cncf-cla: yes" label Jul 18, 2025
@k8s-ci-robot k8s-ci-robot requested a review from Xunzhuo July 18, 2025 17:49
@k8s-ci-robot
Contributor

@shaneutt: GitHub didn't allow me to request PR reviews from the following users: david-martin, kflynn, rootfs, yuzisun.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

> This PR requests the creation of the "AI Gateway Working Group" as discussed and defined throughout:
>
> /cc @david-martin @keithmattix @kflynn @kfswain @nirrozenbaum @rootfs @Xunzhuo @yuzisun
> Thank you all for volunteering to help lead this group!

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaneutt
Once this PR has been reviewed and has the lgtm label, please assign kaslin for approval. For more information see the Code Review Process.

@k8s-ci-robot k8s-ci-robot added the "committee/steering", "size/L", "sig/network", and "do-not-merge/invalid-owners-file" labels Jul 18, 2025
@shaneutt shaneutt force-pushed the ai-gw-wg branch 2 times, most recently from 37481e4 to 84d372b Compare July 18, 2025 18:00
@k8s-ci-robot k8s-ci-robot removed the "do-not-merge/invalid-owners-file" label Jul 18, 2025
@cblecker
Member

/hold
for review

@k8s-ci-robot k8s-ci-robot added the "do-not-merge/hold" label Jul 18, 2025

## Deliverables

* A compendium of AI related networking definitions (e.g. "AI Gateway") and a
Member

is this some sort of stored artifact?

Member Author

@shaneutt shaneutt Jul 25, 2025

Yes. Documentation somewhere with some definitions, where exactly TBD.

@shaneutt shaneutt requested a review from aojea July 25, 2025 12:20
@shaneutt shaneutt force-pushed the ai-gw-wg branch 3 times, most recently from 9c05bc4 to ea77fe1 Compare July 29, 2025 19:56
@nirrozenbaum

@aojea a kind reminder...
are there any additional changes needed?

@k8s-ci-robot k8s-ci-robot added the "needs-rebase" label Aug 5, 2025
@k8s-ci-robot k8s-ci-robot removed the "needs-rebase" label Aug 5, 2025

The AI Gateway Working Group focuses on the intersection of AI and
networking, particularly in the context of extending load-balancer, gateway
and proxy technologies to manage and route traffic for AI Inference.
Member

I’m wondering if this WG has some overlap with WG Serving.

> - Explore new projects that improve orchestration, scaling, and load balancing
>   of inference workloads and compose well with other workloads on Kubernetes

Could you help clarify the scope differences?

Member Author

In this case it's squarely focused on the networking aspects of inference, not the compute or lifecycle management aspects as in the case of wg-serving. The key point is this one:

> manage and route traffic for AI Inference.

Contributor

Right, but why the existing WG Serving isn't the right place to cover those topics? The biggest issue, as I see it, is that you will affect WG Serving's work even if you focus only on the networking side of it. So either explicitly listing out the reasons for that extraction or how are you going to collaborate with WG Serving in that scope will be necessary to ensure the work doesn't diverge.

Member Author

@shaneutt shaneutt Aug 12, 2025

> why the existing WG Serving isn't the right place to cover those topics?

WG Serving's charter makes it clear that it is focused broadly on serving workloads as the primary objective, and its goals speak directly to that. Notably, the stated goals do not include any networking specific deliverables.

This WG is focused very tightly on traffic and API management, going as far as focusing on very specific individual features in that domain that we want to explore (see the document from the description).

While it may seem plausible for any working group to claim that it is the suitable forum for networking-related discussions, this perspective does not hold once we move beyond standard networking to protocol- or domain-specific networking, as in this use case. In these instances, it is essential to engage specialists from the community and provide room for dedicated focus. WG Serving is not the right place to cover these topics because of this technical specificity, the need to engage more people and grow our community, and the need for autonomy and focus.

Additionally, one of our primary use cases (see the "Why?" section of our originating document) covers users who want to perform inference from their Kubernetes applications but reach outside the cluster to do it. This is effectively an egress use case: it is definitively networking and entirely out of scope for WG Serving.

> So either explicitly listing out the reasons for that extraction or how are you going to collaborate with WG Serving in that scope will be necessary to ensure the work doesn't diverge.

I've added WG Serving as an explicit collaborator which we will need to keep looped in for review on any of our proposals that deal explicitly with model serving backends.

/cc @ArangoGutierrez @SergeyKanzhelev @terrytangyuan

Contributor

> This use case is effectively an egress networking use case and is entirely out of scope for WG Serving.

+1 to that. I think WG Serving will remain a collaborator for any work that touches model-serving backends, but the proposed effort focuses on egress networking for inference from Kubernetes apps, which is out of its scope. The proposal also looks promising for collaboration with SIG MultiCluster on cross-cluster traffic policy and failover, SIG Apps on app-facing APIs and workload integration, and SIG Auth on identity, authentication, and egress policy.

Member

+1 to the above. Serving is just one piece of the AI networking story: its users have their own models and are doing local inference. IMO, far more users are going to want to consume remote models and apply policy to that consumption.

Completely agree with the above comments. There are various topics within network traffic and API management that don't feel natural to WG Serving and require a dedicated WG focused on networking.
We do plan to collaborate closely with WG Serving to make sure the scopes of the two WGs are well defined and complementary.

- github: xunzhuo
  name: Xunzhuo
  company: Tencent
  email: [email protected]
Contributor

I raised a similar concern when reviewing WG Node Lifecycle: are you sure you want that many leads? I know from personal experience that having too many sometimes makes things challenging. Definitely not a blocker for WG creation, more of a suggestion 😉

Member Author

Indeed, our list is big, but that is reflective of a large wave of interest. Each person on this list is representative of some technical aspect of the subject matter which makes them specialists/experts for the group. Each of them has spoken with me personally and I'm confident in their commitment to dedicate substantial time to the project, including attending and leading meetings, as well as actively driving and contributing to proposals.



## Exit Criteria

The WG is done when its deliverables are complete, according to the defined
Contributor

Have you considered a SIG-Network sub-project?

Member Author

Yes. In the originating document for this working group, we noted the potential for existing subprojects to house proposals generated by this group and suggested that we might even propose new subprojects. We feel there's discussion to be had and consensus to be built first.

On that point, we are trying to be very deliberate about an exit for this WG. We've seen long-running working groups and we don't want that for ourselves. We aim to deliver on our goals and disband within the next year. We think the likely conclusion is a new subproject, but it may instead be multiple proposals to existing projects across multiple SIGs, which is why we feel a working group is appropriate.

Signed-off-by: Shane Utt <[email protected]>
@lauralorenz
Contributor

Just popping in to note for posterity that SIG-MC has been approached as a potential stakeholder SIG and is discussing what/who we can commit to and whether it makes sense within the scope of the current goals. In particular, we have been made aware of a potential use case for the WG related to egress to k8s or non-k8s inference endpoints. cc @skitt @JeremyOT

Labels
- cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
- committee/steering (Denotes an issue or PR intended to be handled by the steering committee.)
- do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
- sig/network (Categorizes an issue or PR as relevant to SIG Network.)
- size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)