# WG AI Gateway Charter

This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md
| 9 | + |
## Background

We’ve seen large growth in the number of “AI Gateways” launched in the last
couple of years that deploy and operate on Kubernetes, often utilizing Gateway
API. This WG aims to determine whether the relevant features have staying
power and will be commonly useful to users for years to come, and whether the
Kubernetes standards around this should expand.
| 17 | + |
In SIG Network we have the Gateway API Inference Extension (GIE) project. The
GIE is currently paired with a Gateway and “schedules” routes according to
capabilities and metrics advertised by model serving platforms. For the
purposes of this document we’ll call this the “model serving use case”, as it
currently mainly covers models hosted on Kubernetes. There are also deployment
situations where users won’t host models but still use a Gateway to control
access to third-party services (e.g. Gemini, OpenAI, Mistral, Claude); we’ll
call this the “egress use case”. In both the model serving and egress use
cases, we find that users want to add more advanced filters, policies, and
other plugins that control or modify inference requests.
| 28 | + |
However, there are many features we haven’t fully explored yet that seem
cleanly addable at the HTTPRoute level via filters or policies; some may even
be applicable at the Gateway level. For example, it is conceivable you might
add “semantic routing” as a filter at the HTTPRoute level to determine which
model to route to before the “routing/scheduling” layer. Or perhaps you need a
policy to rate-limit token usage for requests (maybe this could even apply at
the Gateway level). For the purposes of this charter, we’ll refer to features
at this level as “AI Gateway” features.
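As a sketch of how such a filter could attach at the HTTPRoute level, the
example below uses Gateway API’s existing `ExtensionRef` filter mechanism. The
`SemanticRouter` kind, its group, and all names here are purely hypothetical
illustrations, not an existing or proposed API:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: chat-completions
spec:
  parentRefs:
    - name: ai-gateway        # hypothetical Gateway name
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat/completions
      filters:
        # Hypothetical extension filter: inspects the request body and
        # picks a target model before the routing/scheduling layer runs.
        - type: ExtensionRef
          extensionRef:
            group: example.ai.networking.x-k8s.io  # hypothetical group
            kind: SemanticRouter                   # hypothetical kind
            name: model-selector
      backendRefs:
        - name: model-backend
          port: 8080
```

The point of the sketch is only that `ExtensionRef` already gives routes a
standard attachment point; whether a standardized filter of this shape is
warranted is exactly the kind of question this WG would explore.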
| 37 | + |
## Scope

The scope of this WG is to define terms like "AI Gateway" in the context of
Kubernetes and propose deliverables that need to be adopted in order to **manage
AI traffic** on Kubernetes, such as:
| 43 | + |
* **Prompt Guards** - Define and enforce content safety rules for inference
  content to detect and block sensitive or malicious prompts.
* **Token Rate Limiting** - Enforce rate-limiting rules based on token usage
  to control usage and cost.
* **Semantic Routing** - Make routing decisions for inference requests based
  on the semantic similarity of the request body.
* **Semantic Caching** - Cache inference responses based on the semantic
  similarity of prompts.
* **Response Risk** - Define and enforce content safety rules for inference
  response content to detect and block sensitive responses from generative AI
  models.
* **Failure Modes** - Define how inference routing failures should be handled
  and which failure modes are important to cover; for instance, fallback and
  retry policies.
* **Observability** - Determine which metrics and tracing standards for “AI
  Gateway” features should be standardized, and how.
| 60 | + |
> **Note**: The above list of features is illustrative and non-exhaustive. We
> may not act on all of these; the purpose is to show the kinds of features we
> will be exploring.
### In Scope

Overall guidance for the WG is to control scope as much as is feasible. The WG
should avoid AI-specific functionality where it can, instead favoring the
addition of provisions that help with AI networking and traffic management. In
particular, the following is in scope:
| 71 | + |
* Providing definitions for networking-related AI terms in a Kubernetes
  context, such as "AI Gateway".

* Defining important use cases for Kubernetes users.

* Determining which common features and capabilities in the "AI Gateway" space
  need to be covered by Kubernetes standards and APIs according to user and
  implementation needs.

* Creating proposals for "AI Gateway" features and capabilities to the
  appropriate sub-projects.

* Proposing new sub-projects if existing sub-projects are not sufficient.
| 85 | + |
### Out of Scope

* Developing whole "AI Gateway" solutions. This group will focus on enabling
  existing and new solutions to be more easily deployed and managed on
  Kubernetes, not creating any new Gateways.

* Support for any specific kind of hardware is generally out of scope.

* This group will not cover the entire spectrum of networking for AI. For
  instance, RDMA networks are generally out of scope.

* Model serving and AI workloads are out of scope (see below for a caveat
  about this).
| 99 | + |
### Additional Scope Distinctions

There is a subtle distinction to be made when it comes to the scope of this WG
for load-balancing and routing inference, particularly when dealing with
inference _workloads_: when the use case includes local model serving on the
cluster, and routing and load-balancing features _rely on information from the
inference workloads_, that kind of routing falls under the scope of WG Serving.
| 107 | + |
A good example of this is the [Gateway API Inference Extension (GIE)][gie].
This project came from WG Serving and specifically handles advanced routing and
load-balancing for inference, informed by metrics and capabilities advertised
by the model serving platform (e.g. vLLM). In this vein, the GIE is
effectively an alternative to the Kubernetes `Service` API, whereas this WG
means to operate more at the `Gateway` and `HTTPRoute` level.
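To make the `Service`-alternative framing concrete, the sketch below shows the
pattern the GIE project has documented: an `HTTPRoute` whose backend is an
`InferencePool` rather than a `Service`. The resource names are placeholders,
and the API group shown is the experimental one the project has used; check
the GIE documentation for the current group and version:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway   # placeholder Gateway name
  rules:
    - backendRefs:
        # Instead of a Service, the backend is an InferencePool. Its
        # endpoint-picker extension chooses a model server replica based
        # on metrics the serving platform advertises (e.g. queue depth).
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: vllm-llama
```

This is the layer this WG intends to stay above: the route and gateway
configuration, not the workload-informed endpoint selection inside the pool.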
| 114 | + |
Use cases which have to interact with the model serving layer for networking
(as described above) are generally out of scope for this WG. If some feature
the WG is working on absolutely must cross this line, the effort MUST be brought
to WG Serving and worked on as a joint effort with them.

[gie]:https://github.com/kubernetes-sigs/gateway-api-inference-extension
| 121 | + |
## Deliverables

* A compendium of AI-related networking definitions (e.g. "AI Gateway") and
  key use cases for Kubernetes users.

* A space for collaboration and experimentation to determine the most viable
  features and capabilities that Kubernetes should support. If there is
  strong consensus on any particular ideas, the WG will facilitate and
  coordinate the delivery of proposals in the appropriate areas.
| 131 | + |
## Stakeholders

* SIG Network

### Related WGs

* WG Serving - The domain of WG Serving is AI workloads, which can benefit
  from some of the networking support we want to add. When we have proposals
  that are strongly relevant to serving, we will loop WG Serving in so they
  can provide feedback.
| 142 | + |
## Roles and Organization Management

This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts-in to updates and modifications to [wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
| 149 | + |
## Exit Criteria

The WG is done when its deliverables are complete, according to the defined
scope and a list of key use cases and features agreed upon by the group.

Ideally we want the lifecycle of the WG to go something like this:

1. Determine definitions and key use cases for Kubernetes users and
   implementations, and document those.
2. Determine a list of key features that Kubernetes needs to best support the
   defined use cases.
3. For each feature in that list, make proposals which support them to the
   appropriate sub-projects OR propose new sub-projects if deemed necessary.
4. Once the feature list is complete, leave behind some guidance and best
   practices for future implementations, and then exit.