
Commit 8e0ca49

add WG AI Gateway
Signed-off-by: Shane Utt <[email protected]>
1 parent 8f1f9c8 commit 8e0ca49

8 files changed: +275 -0 lines changed

OWNERS_ALIASES

Lines changed: 7 additions & 0 deletions
@@ -130,6 +130,13 @@ aliases:
 - mfahlandt
 - ritazh
 - terrytangyuan
+wg-ai-gateway-leads:
+- keithmattix
+- kflynn
+- kfswain
+- nirrozenbaum
+- shaneutt
+- xunzhuo
 wg-ai-integration-leads:
 - ardaguclu
 - rushmash91

liaisons.md

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ members will assume one of the departing members groups.
 | [SIG UI](sig-ui/README.md) | Maciej Szulik (**[@soltysh](https://github.com/soltysh)**) |
 | [SIG Windows](sig-windows/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
 | [WG AI Conformance](wg-ai-conformance/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
+| [WG AI Gateway](wg-ai-gateway/README.md) | Stephen Augustus (**[@justaugustus](https://github.com/justaugustus)**) |
 | [WG AI Integration](wg-ai-integration/README.md) | Paco Xu 徐俊杰 (**[@pacoxu](https://github.com/pacoxu)**) |
 | [WG Batch](wg-batch/README.md) | Antonio Ojea (**[@aojea](https://github.com/aojea)**) |
 | [WG Data Protection](wg-data-protection/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |

sig-list.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
 | Name | Label | Stakeholder SIGs |Organizers | Contact | Meetings |
 |------|-------|------------------|-----------|---------|----------|
 |[AI Conformance](wg-ai-conformance/README.md)|[ai-conformance](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-conformance)|* Architecture<br>* Testing<br>|* [Janet Kuo](https://github.com/janetkuo), Google<br>* [Mario Fahlandt](https://github.com/mfahlandt), Kubermatic GmbH<br>* [Rita Zhang](https://github.com/ritazh), Microsoft<br>* [Yuan Tang](https://github.com/terrytangyuan), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-conformance)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-conformance)|* Regular WG Meeting: [Thursdays at 10:00 PT (Pacific Time) (weekly)]()<br>
+|[AI Gateway](wg-ai-gateway/README.md)|[ai-gateway](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-gateway)|* Network<br>|* [Keith Mattix](https://github.com/keithmattix), Microsoft<br>* [Flynn](https://github.com/kflynn), Buoyant<br>* [Kellen Swain](https://github.com/kfswain), Google<br>* [Nir Rozenbaum](https://github.com/nirrozenbaum), IBM<br>* [Shane Utt](https://github.com/shaneutt), Red Hat<br>* [Xunzhuo](https://github.com/xunzhuo), Tencent<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-gateway)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway)|* WG AI Gateway Bi-Weekly Meeting (Earlier Option): [Mondays at 12PM UTC (bi-weekly)]()<br>* WG AI Gateway Bi-Weekly Meeting (Later Option): [Thursdays at 6PM UTC (bi-weekly)]()<br>
 |[AI Integration](wg-ai-integration/README.md)|[ai-integration](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-integration)|* API Machinery<br>* Apps<br>* Architecture<br>* Auth<br>* CLI<br>|* [Arda Guclu](https://github.com/ardaguclu), Red Hat<br>* [Arush Sharma](https://github.com/rushmash91), Amazon<br>* [Zvonko Kaiser](https://github.com/zvonkok), NVIDIA<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-integration)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-integration)|* WG AI Integration Weekly Meeting: [Wednesdays at 10:00 PT (Pacific Time) (weekly)]()<br>
 |[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Kevin Hannon](https://github.com/kannon92), Red Hat<br>* [Marcin Wielgus](https://github.com/mwielgus), Google<br>* [Maciej Szulik](https://github.com/soltysh), Defense Unicorns<br>* [Swati Sehgal](https://github.com/swatisehgal), Red Hat<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)|* Regular Meeting ([calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 15th 2024)s at 3PM CET (Central European Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>
 |[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>

sig-network/README.md

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
 ## Working Groups
 
 The following [working groups][working-group-definition] are sponsored by sig-network:
+* [WG AI Gateway](/wg-ai-gateway)
 * [WG Device Management](/wg-device-management)
 * [WG Node Lifecycle](/wg-node-lifecycle)
 * [WG Serving](/wg-serving)

sigs.yaml

Lines changed: 54 additions & 0 deletions
@@ -3564,6 +3564,60 @@ workinggroups:
 liaison:
 github: pohly
 name: Patrick Ohly
+- dir: wg-ai-gateway
+name: AI Gateway
+mission_statement: >
+The AI Gateway Working Group focuses on the intersection of AI and networking,
+particularly in the context of extending load-balancer, gateway and proxy technologies
+to manage and route traffic for AI Inference.
+
+charter_link: charter.md
+stakeholder_sigs:
+- Network
+label: ai-gateway
+leadership:
+chairs:
+- github: keithmattix
+name: Keith Mattix
+company: Microsoft
+
+- github: kflynn
+name: Flynn
+company: Buoyant
+
+- github: kfswain
+name: Kellen Swain
+company: Google
+
+- github: nirrozenbaum
+name: Nir Rozenbaum
+company: IBM
+
+- github: shaneutt
+name: Shane Utt
+company: Red Hat
+
+- github: xunzhuo
+name: Xunzhuo
+company: Tencent
+
+meetings:
+- description: WG AI Gateway Bi-Weekly Meeting (Earlier Option)
+day: Monday
+time: 12PM
+tz: UTC
+frequency: bi-weekly
+- description: WG AI Gateway Bi-Weekly Meeting (Later Option)
+day: Thursday
+time: 6PM
+tz: UTC
+frequency: bi-weekly
+contact:
+slack: wg-ai-gateway
+mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway
+liaison:
+github: justaugustus
+name: Stephen Augustus
 - dir: wg-ai-integration
 name: AI Integration
 mission_statement: >

wg-ai-gateway/OWNERS

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
# See the OWNERS docs at https://go.k8s.io/owners

reviewers:
- wg-ai-gateway-leads
approvers:
- wg-ai-gateway-leads
labels:
- wg/ai-gateway
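
The OWNERS file above lists an alias rather than individual GitHub handles; the alias is expanded using the `wg-ai-gateway-leads` entry added to OWNERS_ALIASES in this commit. A minimal illustrative sketch of that expansion (not part of this commit), assuming both files have already been parsed into plain Python dicts; `resolve_owners` is a hypothetical helper, not the actual Prow implementation:

```python
from typing import Dict, List

# Parsed OWNERS_ALIASES (only the entry added in this commit).
OWNERS_ALIASES: Dict[str, List[str]] = {
    "wg-ai-gateway-leads": [
        "keithmattix", "kflynn", "kfswain",
        "nirrozenbaum", "shaneutt", "xunzhuo",
    ],
}

# Parsed wg-ai-gateway/OWNERS.
OWNERS: Dict[str, List[str]] = {
    "reviewers": ["wg-ai-gateway-leads"],
    "approvers": ["wg-ai-gateway-leads"],
    "labels": ["wg/ai-gateway"],
}


def resolve_owners(entries: List[str], aliases: Dict[str, List[str]]) -> List[str]:
    """Expand alias names into GitHub handles; plain handles pass through unchanged."""
    resolved: List[str] = []
    for entry in entries:
        for handle in aliases.get(entry, [entry]):
            if handle not in resolved:  # keep first-seen order, drop duplicates
                resolved.append(handle)
    return resolved


approvers = resolve_owners(OWNERS["approvers"], OWNERS_ALIASES)
print(approvers)
# ['keithmattix', 'kflynn', 'kfswain', 'nirrozenbaum', 'shaneutt', 'xunzhuo']
```

Keeping the handles in one alias means a change of leads only touches OWNERS_ALIASES, not every OWNERS file that refers to the group.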

wg-ai-gateway/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
<!---
This is an autogenerated file!

Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.

To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# AI Gateway Working Group

The AI Gateway Working Group focuses on the intersection of AI and networking, particularly in the context of extending load-balancer, gateway and proxy technologies to manage and route traffic for AI Inference.

The [charter](charter.md) defines the scope and governance of the AI Gateway Working Group.

## Stakeholder SIGs
* [SIG Network](/sig-network)

## Meetings
*Joining the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway) for the group will typically add invites for the following meetings to your calendar.*
* WG AI Gateway Bi-Weekly Meeting (Earlier Option): [Mondays at 12PM UTC]() (bi-weekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=12PM&tz=UTC).
* WG AI Gateway Bi-Weekly Meeting (Later Option): [Thursdays at 6PM UTC]() (bi-weekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=6PM&tz=UTC).

## Organizers

* Keith Mattix (**[@keithmattix](https://github.com/keithmattix)**), Microsoft
* Flynn (**[@kflynn](https://github.com/kflynn)**), Buoyant
* Kellen Swain (**[@kfswain](https://github.com/kfswain)**), Google
* Nir Rozenbaum (**[@nirrozenbaum](https://github.com/nirrozenbaum)**), IBM
* Shane Utt (**[@shaneutt](https://github.com/shaneutt)**), Red Hat
* Xunzhuo (**[@xunzhuo](https://github.com/xunzhuo)**), Tencent

## Contact
- Slack: [#wg-ai-gateway](https://kubernetes.slack.com/messages/wg-ai-gateway)
- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fai-gateway)
- Steering Committee Liaison: Stephen Augustus (**[@justaugustus](https://github.com/justaugustus)**)
<!-- BEGIN CUSTOM CONTENT -->

<!-- END CUSTOM CONTENT -->
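
The README above is generated from the `meetings` entries added to `sigs.yaml` in this commit. As a rough illustration of how one entry maps onto a README meeting bullet, here is a hypothetical Python approximation (not part of this commit). `render_meeting` is not the real generator, which lives under the `generator/` directory linked in the README header; the empty `()` link target is reproduced here on the assumption that it reflects the entry having no meeting URL.

```python
from typing import Dict


def render_meeting(meeting: Dict[str, str], url: str = "") -> str:
    """Approximate the README bullet for one sigs.yaml meeting entry (hypothetical)."""
    day, at, tz, freq = meeting["day"], meeting["time"], meeting["tz"], meeting["frequency"]
    return (
        f"* {meeting['description']}: [{day}s at {at} {tz}]({url}) ({freq}). "
        f"[Convert to your timezone](http://www.thetimezoneconverter.com/?t={at}&tz={tz})."
    )


earlier = {
    "description": "WG AI Gateway Bi-Weekly Meeting (Earlier Option)",
    "day": "Monday", "time": "12PM", "tz": "UTC", "frequency": "bi-weekly",
}
print(render_meeting(earlier))
```

For the "Earlier Option" entry this reproduces the first meeting bullet in the README above character for character.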

wg-ai-gateway/charter.md

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
# WG AI Gateway Charter

This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md

## Background

We’ve seen large growth in the number of “AI Gateways” launched in the last
couple of years that deploy and operate on Kubernetes, often utilizing Gateway
API. This WG aims to determine whether the relevant features have staying power
and will be commonly useful to users for years to come, and whether we should
expand the Kubernetes standards to cover them.

In SIG Network we have the Gateway API Inference Extension (GIE) project. The
GIE is currently paired with a Gateway and “schedules” routes according to
capabilities and metrics advertised by model serving platforms. For the purposes
of this document we’ll call this the “model serving use case”, as it currently
mainly covers the case where models are hosted on Kubernetes. There are also
deployment situations where users don’t host models but still use a Gateway to
control access to third-party services (e.g. Gemini, OpenAI, Mistral, Claude,
etc.); we’ll call this the “egress use case”. In both the model serving and
egress use cases, we find that users want to add more advanced filters,
policies, and other plugins that control or modify inference requests.

However, there are many features we haven’t fully explored yet that seem like
they could be added cleanly at the HTTPRoute level via filters or policies, and
some might even apply at the Gateway level. For example, it is conceivable that
“semantic routing” could be added at the HTTPRoute level as a filter that
determines which model to route to before the “routing/scheduling” layer. Or
perhaps you need a policy to rate-limit token usage for requests (which could
possibly apply at the Gateway level as well). For the purposes of this charter,
we’ll refer to features at this level as “AI Gateway” features.
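
To make the token rate-limiting idea above concrete, here is a minimal illustrative token-bucket sketch (not part of the charter) in which each request is charged by the number of LLM tokens it consumes rather than one unit per request. The `TokenRateLimiter` class, per-client keying, and capacity/refill values are hypothetical, not a proposed API; the clock is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenRateLimiter:
    """Token bucket where each request is charged by its LLM token usage (hypothetical)."""
    capacity: float        # maximum LLM tokens a client may burst
    refill_per_sec: float  # LLM tokens restored to each client's bucket per second
    _level: Dict[str, float] = field(default_factory=dict)
    _last: Dict[str, float] = field(default_factory=dict)

    def allow(self, client: str, tokens: int, now: float) -> bool:
        # Refill the client's bucket for the time elapsed since its last request.
        level = self._level.get(client, self.capacity)
        elapsed = now - self._last.get(client, now)
        level = min(self.capacity, level + elapsed * self.refill_per_sec)
        self._last[client] = now
        if tokens <= level:
            self._level[client] = level - tokens  # charge the request's token count
            return True
        self._level[client] = level  # reject without charging
        return False


limiter = TokenRateLimiter(capacity=1000, refill_per_sec=10)
print(limiter.allow("team-a", 800, now=0.0))   # True: 200 tokens left
print(limiter.allow("team-a", 300, now=5.0))   # False: only 250 available
print(limiter.allow("team-a", 300, now=10.0))  # True: 300 available after refill
```

In a real gateway the output-token count is usually known only once the response has been generated, so an implementation might charge after the fact; this sketch simply charges up front.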

## Scope

The scope of this WG is to define terms like "AI Gateway" in the context of
Kubernetes and to propose deliverables that need to be adopted in order to
**manage AI traffic** on Kubernetes, such as:

* **Prompt Guards** - Define and enforce content safety rules for inference
content to detect and block sensitive or malicious prompts.
* **Token Rate Limiting** - Enforce rate limiting rules based on token usage to
control usage and cost.
* **Semantic Routing** - Make a routing decision for an inference request based
on the semantic similarity of the request body.
* **Semantic Caching** - Provide caching for inference responses based on the
semantic similarity of prompts.
* **Response Risk** - Define and enforce content safety rules for inference
response content to detect and block sensitive responses from generative AI
models.
* **Failure Modes** - Define how inference routing failures should be handled
and which failure modes are important to cover. For instance, this may
encompass fallback and retry policies.
* **Observability** - Determine which metrics and tracing standards should be
defined for “AI Gateway” features, and how.

> **Note**: The above list of features is illustrative and non-exhaustive. We
> may not act on all of these; the purpose is to illustrate the kinds of
> features we will be exploring.
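
The **Semantic Caching** item in the list above can be sketched as a nearest-neighbour lookup over prompt embeddings: a cached response is reused when a new prompt's embedding is close enough, by cosine similarity, to a previously seen one. In this illustrative sketch (not part of the charter) the embeddings are supplied by the caller as plain vectors, since producing them would require an embedding model; the `SemanticCache` name and the 0.9 threshold are hypothetical choices.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse a cached response when a prompt embedding is similar enough (hypothetical)."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def put(self, embedding: Vector, response: str) -> None:
        self.entries.append((embedding, response))

    def get(self, embedding: Vector) -> Optional[str]:
        # Linear scan for the most similar cached prompt at or above the threshold;
        # a production system would use a vector index instead.
        best, best_score = None, self.threshold
        for cached, response in self.entries:
            score = cosine_similarity(embedding, cached)
            if score >= best_score:
                best, best_score = response, score
        return best


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.2], "Paris is the capital of France.")
print(cache.get([0.9, 0.1, 0.2]))  # similar prompt: cache hit
print(cache.get([0.0, 1.0, 0.0]))  # unrelated prompt: None
```

The threshold is the main tuning knob: set too low, unrelated prompts receive stale answers; set too high, the cache rarely hits.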

### In Scope

Overall guidance for the WG is to control scope as much as is feasible. The WG
should avoid AI-specific functionality where it can, instead favoring the
addition of provisions that help with AI networking and traffic management. In
particular, the following is in scope:

* Providing definitions for networking-related AI terms in a Kubernetes
context, such as "AI Gateway".

* Defining important use cases for Kubernetes users.

* Determining which common features and capabilities in the "AI Gateway" space
need to be covered by Kubernetes standards and APIs according to user and
implementation needs.

* Submitting proposals for "AI Gateway" features and capabilities to the
appropriate sub-projects.

* Proposing new sub-projects if existing sub-projects are not sufficient.
85+
86+
### Out of Scope
87+
88+
* Developing whole "AI Gateway" solutions. This group will focus on enabling
89+
existing and new solutions to be more easily deployed and managed on
90+
Kubernetes, not creating any new Gateways.
91+
92+
* Any specific kind of hardware support is generally out of scope.
93+
94+
* This group will not cover the entire spectrum of networking for AI. For
95+
instance: RDMA networks are generally out of scope.
96+
97+
* Model serving, and AI workloads are out of scope (see below for a caveat about
98+
this).

### Additional Scope Distinctions

There is a subtle distinction to be made when it comes to the scope of this WG
for load-balancing and routing inference, particularly when dealing with
inference _workloads_: when the use case includes local model serving on the
cluster, and routing and load-balancing features _rely on information from the
inference workloads_, this kind of routing falls under the scope of WG Serving.

A good example of this is the [Gateway API Inference Extension (GIE)][gie].
This project came from WG Serving and specifically handles advanced routing and
load-balancing for inference that is informed by metrics and capabilities
advertised by the model serving platform (e.g. vLLM). In this vein, the GIE is
effectively an alternative to the Kubernetes `Service` API, whereas this WG
intends to operate more at the `Gateway` and `HTTPRoute` level.
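
For contrast, the workload-informed routing that this section assigns to WG Serving can be illustrated with a toy endpoint picker that chooses a model server replica from metrics each replica advertises. The metric names (`queue_depth`, `kv_cache_utilization`), the 0.9 cutoff, and `pick_endpoint` are hypothetical simplifications for illustration only; they are not the GIE's actual scheduling algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    address: str
    queue_depth: int             # requests waiting on this model server (assumed metric)
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0-1.0 (assumed metric)


def pick_endpoint(endpoints: List[Endpoint], max_kv: float = 0.9) -> Endpoint:
    """Prefer replicas with KV cache headroom, then the shortest queue (toy heuristic)."""
    # Fall back to all replicas if every one is above the KV cache cutoff.
    candidates = [e for e in endpoints if e.kv_cache_utilization < max_kv] or endpoints
    return min(candidates, key=lambda e: (e.queue_depth, e.kv_cache_utilization))


pods = [
    Endpoint("10.0.0.1:8000", queue_depth=0, kv_cache_utilization=0.95),
    Endpoint("10.0.0.2:8000", queue_depth=3, kv_cache_utilization=0.40),
    Endpoint("10.0.0.3:8000", queue_depth=1, kv_cache_utilization=0.70),
]
print(pick_endpoint(pods).address)  # 10.0.0.3:8000
```

Because the choice depends on data reported by the inference workloads themselves, this kind of logic sits at the `Service` replacement layer described above rather than at the `Gateway`/`HTTPRoute` level this WG targets.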

Use cases that have to interact with the model serving layer for networking
(as described above) are generally out of scope for this WG. If a feature the
WG is working on absolutely must cross this line, the effort MUST be brought to
WG Serving and worked on jointly with them.

[gie]:https://github.com/kubernetes-sigs/gateway-api-inference-extension

## Deliverables

* A compendium of AI-related networking definitions (e.g. "AI Gateway") and
key use cases for Kubernetes users.

* A space for collaboration and experimentation to determine the most viable
features and capabilities that Kubernetes should support. If there is strong
consensus on any particular ideas, the WG will facilitate and coordinate the
delivery of proposals in the appropriate areas.

## Stakeholders

* SIG Network

### Related WGs

* WG Serving - The domain of WG Serving is AI workloads, which can be served by
some of the networking support we want to add. When we have proposals that
are strongly relevant to serving, we will loop WG Serving in so its members can
provide feedback.

## Roles and Organization Management

This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts in to updates and modifications to [wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md

## Exit Criteria

The WG is done when its deliverables are complete, according to the defined
scope and a list of key use cases and features agreed upon by the group.

Ideally, the lifecycle of the WG will go something like this:

1. Determine definitions and key use cases for Kubernetes users and
implementations, and document those.
2. Determine a list of key features that Kubernetes needs to best support the
defined use cases.
3. For each feature in that list, submit proposals supporting it to the
appropriate sub-projects, OR propose new sub-projects if deemed necessary.
4. Once the feature list is complete, leave behind guidance and best practices
for future implementations, and then exit.
