
add WG AI Gateway #8521

Open · wants to merge 1 commit into base: master

7 changes: 7 additions & 0 deletions OWNERS_ALIASES
@@ -130,6 +130,13 @@ aliases:
- mfahlandt
- ritazh
- terrytangyuan
wg-ai-gateway-leads:
- keithmattix
- kflynn
- kfswain
- nirrozenbaum
- shaneutt
- xunzhuo
wg-ai-integration-leads:
- ardaguclu
- rushmash91
1 change: 1 addition & 0 deletions sig-list.md
@@ -62,6 +62,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
| Name | Label | Stakeholder SIGs |Organizers | Contact | Meetings |
|------|-------|------------------|-----------|---------|----------|
|[AI Conformance](wg-ai-conformance/README.md)|[ai-conformance](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-conformance)|* Architecture<br>* Testing<br>|* [Janet Kuo](https://github.com/janetkuo), Google<br>* [Mario Fahlandt](https://github.com/mfahlandt), Kubermatic GmbH<br>* [Rita Zhang](https://github.com/ritazh), Microsoft<br>* [Yuan Tang](https://github.com/terrytangyuan), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-conformance)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-conformance)|* Regular WG Meeting: [Thursdays at 10:00 PT (Pacific Time) (weekly)]()<br>
|[AI Gateway](wg-ai-gateway/README.md)|[ai-gateway](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-gateway)|* Network<br>|* [Keith Mattix](https://github.com/keithmattix), Microsoft<br>* [Flynn](https://github.com/kflynn), Buoyant<br>* [Kellen Swain](https://github.com/kfswain), Google<br>* [Nir Rozenbaum](https://github.com/nirrozenbaum), IBM<br>* [Shane Utt](https://github.com/shaneutt), Red Hat<br>* [Xunzhuo](https://github.com/xunzhuo), Tencent<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-gateway)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway)|* WG AI Gateway Bi-Weekly Meeting (Earlier Option): [Mondays at 12PM UTC (bi-weekly)]()<br>* WG AI Gateway Bi-Weekly Meeting (Later Option): [Thursdays at 6PM UTC (bi-weekly)]()<br>
|[AI Integration](wg-ai-integration/README.md)|[ai-integration](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-integration)|* API Machinery<br>* Apps<br>* Architecture<br>* Auth<br>* CLI<br>|* [Arda Guclu](https://github.com/ardaguclu), Red Hat<br>* [Arush Sharma](https://github.com/rushmash91), Amazon<br>* [Zvonko Kaiser](https://github.com/zvonkok), NVIDIA<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-integration)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-integration)|* WG AI Integration Weekly Meeting: [Wednesdays at 10:00 PT (Pacific Time) (weekly)]()<br>
|[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Kevin Hannon](https://github.com/kannon92), Red Hat<br>* [Marcin Wielgus](https://github.com/mwielgus), Google<br>* [Maciej Szulik](https://github.com/soltysh), Defense Unicorns<br>* [Swati Sehgal](https://github.com/swatisehgal), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)|* Regular Meeting ([calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 15th 2024)s at 3PM CET (Central European Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>
|[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
1 change: 1 addition & 0 deletions sig-network/README.md
@@ -73,6 +73,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-network:
* [WG AI Gateway](/wg-ai-gateway)
* [WG Device Management](/wg-device-management)
* [WG Node Lifecycle](/wg-node-lifecycle)
* [WG Serving](/wg-serving)
51 changes: 51 additions & 0 deletions sigs.yaml
@@ -3561,6 +3561,57 @@ workinggroups:
liaison:
github: pohly
name: Patrick Ohly
- dir: wg-ai-gateway
name: AI Gateway
mission_statement: >
The AI Gateway Working Group focuses on the intersection of AI and networking,
particularly in the context of extending load-balancer, gateway and proxy technologies
to manage and route traffic for AI Inference.

charter_link: charter.md
stakeholder_sigs:
- Network
label: ai-gateway
leadership:
chairs:
- github: keithmattix
name: Keith Mattix
company: Microsoft
email: [email protected]
- github: kflynn
name: Flynn
company: Buoyant
email: [email protected]
- github: kfswain
name: Kellen Swain
company: Google
email: [email protected]
- github: nirrozenbaum
name: Nir Rozenbaum
company: IBM
email: [email protected]
- github: shaneutt
name: Shane Utt
company: Red Hat
email: [email protected]
- github: xunzhuo
name: Xunzhuo
company: Tencent
email: [email protected]
Contributor commented:

I've raised a similar concern when reviewing WG Node Lifecycle: are you sure you want to have that many leads? I know from personal experience that having too many can sometimes make things challenging. Definitely not a blocker for WG creation, more of a suggestion 😉

Member (author) replied:

Indeed, our list is big, but that reflects a large wave of interest. Each person on this list represents a technical aspect of the subject matter, which makes them specialists/experts for the group. Each of them has spoken with me personally, and I'm confident in their commitment to dedicate substantial time to the project, including attending and leading meetings, as well as actively driving and contributing to proposals.

meetings:
- description: WG AI Gateway Bi-Weekly Meeting (Earlier Option)
day: Monday
time: 12PM
tz: UTC
frequency: bi-weekly
- description: WG AI Gateway Bi-Weekly Meeting (Later Option)
day: Thursday
time: 6PM
tz: UTC
frequency: bi-weekly
contact:
slack: wg-ai-gateway
mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway
- dir: wg-ai-integration
name: AI Integration
mission_statement: >
8 changes: 8 additions & 0 deletions wg-ai-gateway/OWNERS
@@ -0,0 +1,8 @@
# See the OWNERS docs at https://go.k8s.io/owners

reviewers:
- wg-ai-gateway-leads
approvers:
- wg-ai-gateway-leads
labels:
- wg/ai-gateway
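
For context, the `wg-ai-gateway-leads` alias added to `OWNERS_ALIASES` earlier in this PR is what the `reviewers` and `approvers` entries above resolve to. A minimal sketch of the effective result once Prow expands the alias (illustrative only; no expanded file is checked in):

```yaml
# Illustrative expansion of wg-ai-gateway/OWNERS (not an actual checked-in file).
# Prow resolves the wg-ai-gateway-leads alias against the root OWNERS_ALIASES,
# so the effective reviewers and approvers for this directory are the WG leads.
reviewers:
  - keithmattix
  - kflynn
  - kfswain
  - nirrozenbaum
  - shaneutt
  - xunzhuo
approvers:
  - keithmattix
  - kflynn
  - kfswain
  - nirrozenbaum
  - shaneutt
  - xunzhuo
labels:
  - wg/ai-gateway
```

Keeping the lead list behind a single alias means future leadership changes only need to touch `OWNERS_ALIASES`.
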
38 changes: 38 additions & 0 deletions wg-ai-gateway/README.md
@@ -0,0 +1,38 @@
<!---
This is an autogenerated file!
Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# AI Gateway Working Group

The AI Gateway Working Group focuses on the intersection of AI and networking, particularly in the context of extending load-balancer, gateway and proxy technologies to manage and route traffic for AI Inference.

The [charter](charter.md) defines the scope and governance of the AI Gateway Working Group.

## Stakeholder SIGs
* [SIG Network](/sig-network)

## Meetings
*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway) for the group will typically add invites for the following meetings to your calendar.*
* WG AI Gateway Bi-Weekly Meeting (Earlier Option): [Mondays at 12PM UTC]() (bi-weekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=12PM&tz=UTC).
* WG AI Gateway Bi-Weekly Meeting (Later Option): [Thursdays at 6PM UTC]() (bi-weekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=6PM&tz=UTC).

## Organizers

* Keith Mattix (**[@keithmattix](https://github.com/keithmattix)**), Microsoft
* Flynn (**[@kflynn](https://github.com/kflynn)**), Buoyant
* Kellen Swain (**[@kfswain](https://github.com/kfswain)**), Google
* Nir Rozenbaum (**[@nirrozenbaum](https://github.com/nirrozenbaum)**), IBM
* Shane Utt (**[@shaneutt](https://github.com/shaneutt)**), Red Hat
* Xunzhuo (**[@xunzhuo](https://github.com/xunzhuo)**), Tencent

## Contact
- Slack: [#wg-ai-gateway](https://kubernetes.slack.com/messages/wg-ai-gateway)
- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fai-gateway)
<!-- BEGIN CUSTOM CONTENT -->

<!-- END CUSTOM CONTENT -->
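
Since this README is autogenerated, its Meetings and Contact sections are rendered from the `sigs.yaml` entry added earlier in this PR rather than edited by hand. A minimal sketch of that mapping, using one meeting entry taken from the diff above (the rendered bullet in the comment is how it appears in this README):

```yaml
# One meetings entry from the sigs.yaml diff above ...
- description: WG AI Gateway Bi-Weekly Meeting (Earlier Option)
  day: Monday
  time: 12PM
  tz: UTC
  frequency: bi-weekly
# ... which the community generator renders into the README bullet:
#   * WG AI Gateway Bi-Weekly Meeting (Earlier Option): Mondays at 12PM UTC (bi-weekly).
```

Per the header comment, changes should be made in `sigs.yaml` and the README regenerated with the generator described at https://git.k8s.io/community/generator/README.md.
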
101 changes: 101 additions & 0 deletions wg-ai-gateway/charter.md
@@ -0,0 +1,101 @@
# WG AI Gateway Charter

This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md

## Scope

The AI Gateway Working Group focuses on the intersection of AI and
networking, particularly in the context of extending load-balancer, gateway
and proxy technologies to manage and route traffic for AI Inference.
Member commented:

I'm wondering if this WG has some overlap with WG Serving. For example, WG Serving's charter includes:

> Explore new projects that improve orchestration, scaling, and load balancing
> of inference workloads and compose well with other workloads on Kubernetes

Could you help clarify the scope differences?

Member (author) replied:

In this case the WG is squarely focused on the networking aspects of inference, not the compute or lifecycle-management aspects covered by WG Serving. The key point is this one:

> manage and route traffic for AI Inference.

Contributor replied:

Right, but why isn't the existing WG Serving the right place to cover those topics? The biggest issue as I see it is that you will affect WG Serving's work even if you focus only on the networking side of it. So either explicitly listing the reasons for that extraction, or explaining how you are going to collaborate with WG Serving in that scope, will be necessary to ensure the work doesn't diverge.

@shaneutt (Member, author) replied on Aug 12, 2025:

> why isn't the existing WG Serving the right place to cover those topics?

WG Serving's charter makes it clear that it is focused broadly on serving workloads as the primary objective, and its goals speak directly to that. Notably, the stated goals do not include any networking-specific deliverables.

This WG is focused very tightly on traffic and API management, going as far as focusing on very specific individual features in that domain that we want to explore (see the document from the description).

While it may seem plausible for any working group to claim that it is the suitable forum for networking-related discussions, this perspective does not hold when we move beyond standard networking to address protocol- or domain-specific networking, as in use cases like this one. In these instances, it is essential to engage specialists from the community and provide room for dedicated focus. It is because of this technical specificity, the need to engage more people and grow our community, and the need for autonomy and focus that WG Serving is not the right place to cover these topics.

Additionally, one of our primary use cases (see the "Why?" section of our originating document) covers the situation where users want to perform inference from their Kubernetes applications but will be reaching outside the cluster to do it. This use case is effectively an egress use case, is definitively networking, and is entirely out of scope for WG Serving.

> So either explicitly listing the reasons for that extraction, or explaining how you are going to collaborate with WG Serving in that scope, will be necessary to ensure the work doesn't diverge.

I've added WG Serving as an explicit collaborator, which we will need to keep looped in for review on any of our proposals that deal explicitly with model-serving backends.

/cc @ArangoGutierrez @SergeyKanzhelev @terrytangyuan

Contributor replied:

> This use case is effectively an egress networking use case and is entirely out of scope for WG Serving.

+1 to that. I think WG Serving will remain a collaborator for any work that touches model-serving backends, but the proposed effort focuses on egress networking for inference from Kubernetes apps, which is out of its scope. The proposal also looks promising for collaboration with SIG MultiCluster on cross-cluster traffic policy and failover, SIG Apps on app-facing APIs and workload integration, and SIG Auth on identity, authentication, and policy for egress.

Member replied:

+1 to the above: serving is just one piece of the AI networking story, and those users have their own models and are doing local inference. IMO, far more users are going to want to consume remote models and apply policy on that consumption.

Another commenter added:

Completely agree with the above comments: there are various topics in the scope of network traffic and API management that don't feel natural in WG Serving and require a dedicated WG focused on networking. Obviously we do plan to collaborate closely with WG Serving to make sure the scope of each WG is well defined and complementary to the other.


This working group will define terms like "AI Gateway" within the context of
Kubernetes and key use cases for users and implementations. It will propose
deliverables that need to be adopted in order to serve AI Inference on
Kubernetes.

This comes at a time when there is a proliferation of "AI Gateways" being used
for AI Inference, and a strong need for focus and collaboration to establish
standards in this space so that Kubernetes users get the features they need in
a consistent way on the platform.

### In Scope

Overall guidance for the WG is to control scope as much as is feasible. The WG
should avoid AI-specific functionality where it can, instead favoring the
addition of provisions that help with AI use-cases, but are otherwise normal
networking facilities. Under that guidance, the following is in-scope:

* Providing definitions for networking related AI terms in a Kubernetes
context.

* Defining important AI networking use-cases for Kubernetes users.

* Determining which common features and capabilities in the "AI Gateway" space
need to be covered by Kubernetes standards and APIs according to user and
implementation needs.

* Creating proposals for "AI Gateway" features and capabilities to the
appropriate sub-projects.

* Proposing new sub-projects if existing sub-projects are not sufficient.

### Out of Scope

* Developing whole "AI Gateway" solutions. This group will focus on
enabling existing and new solutions to be more easily deployed and managed on
Kubernetes, not adding any new production solutions maintained thereafter by
upstream Kubernetes.

* Any specific kind of hardware support is generally out of scope.

* This group will not cover the entire spectrum of networking for AI. For
instance: RDMA networks are generally out of scope.

## Deliverables

* A compendium of AI related networking definitions (e.g. "AI Gateway") and a
  list of key use-cases for Kubernetes users.
Member commented:

Is this some sort of stored artifact?

@shaneutt (Member, author) replied on Jul 25, 2025:

Yes. Documentation somewhere with some definitions, where exactly TBD.


* Provide a space for collaboration and experimentation to determine the most
viable features and capabilities that Kubernetes should support. If there is
strong consensus on any particular ideas, the WG will facilitate and
coordinate the delivery of proposals in the appropriate areas.

## Stakeholders

* SIG Network

### Related WGs

* WG Serving - The domain of WG Serving is AI Workloads, which can be served by
some of the networking support we want to add. When we have proposals that
are strongly relevant to serving, we will loop them in so they can provide
feedback.

## Roles and Organization Management

This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts in to updates and modifications to [wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md

## Exit Criteria

The WG is done when its deliverables are complete, according to the defined
scope and a list of key use cases and features agreed upon by the group.
Contributor commented:

Have you considered a SIG-Network sub-project?

Member (author) replied:

Yes. In the originating document for this working group, we noted the potential for existing subprojects to house proposals generated by this group and suggested that we might even propose new subprojects. We feel there's discussion to be had and consensus to be built first.

Notably, as it pertains to this, we are trying to be very deliberate about an exit for this WG. We've seen long-running working groups and we don't want that for ourselves. We endeavor to deliver on our goals and disband within the next year. We think it's likely that conclusion could be a new subproject, but it may instead be multiple proposals to existing projects across multiple SIGs, which is why we feel a working group is appropriate.


Ideally we want the lifecycle of the WG to go something like this:

1. Determine definitions and key use cases for Kubernetes users and
implementations, and document those.
2. Determine a list of key features that Kubernetes needs to best support the
defined use cases.
3. For each feature in that list, make proposals which support them to the
appropriate sub-projects OR propose new sub-projects if deemed necessary.
4. Once the feature list is complete, leave behind some guidance and best
practices for future implementations and then exit.