# WG AI Gateway Charter

This charter adheres to the conventions described in the [Kubernetes Charter
README] and uses the Roles and Organization Management outlined in
[wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
[Kubernetes Charter README]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md
| 9 | + |
## Background

We’ve seen large growth in the number of “AI Gateways” launched in the last
couple of years that deploy and operate on Kubernetes, often utilizing Gateway
API. This WG aims to determine whether the relevant features have staying
power and will be commonly useful to users for years to come, and whether the
Kubernetes standards around this should expand.
| 17 | + |
In SIG Network we have the Gateway API Inference Extension (GIE) project. The
GIE is currently paired with a Gateway and “schedules” routes according to
capabilities and metrics advertised by model serving platforms. For the
purposes of this document we’ll call this the “model serving use case”, as it
currently mainly covers models hosted on Kubernetes. There are also deployment
situations where users won’t host models but still use a Gateway to control
access to third-party services (e.g. Gemini, OpenAI, Mistral, Claude); we’ll
call this the “egress use case”. In both the model serving and egress use
cases, we find that users want to add more advanced filters, policies, and
other plugins that control or modify inference requests.
| 28 | + |
However, there are many features we haven’t fully explored yet that seem
cleanly addable at the HTTPRoute level via filters or policies; some may even
be applicable at the Gateway level. For example, it is conceivable you might
add “semantic routing” as a filter at the HTTPRoute level to determine which
model to route to before the “routing/scheduling” layer. Or perhaps you need a
policy to rate-limit token usage for requests (maybe this could even apply at
the Gateway level). For the purposes of this charter, we’ll refer to features
at this level as “AI Gateway” features.
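As a sketch of how such a filter could attach at the HTTPRoute level, the
example below uses Gateway API’s existing `ExtensionRef` filter mechanism. The
`SemanticRouter` kind, its group, and all names here are purely hypothetical
illustrations, not an existing or proposed API:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: chat-completions
spec:
  parentRefs:
    - name: ai-gateway        # hypothetical Gateway name
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat/completions
      filters:
        # Hypothetical extension filter: inspects the request body and
        # picks a target model before the routing/scheduling layer runs.
        - type: ExtensionRef
          extensionRef:
            group: example.ai.networking.x-k8s.io  # hypothetical group
            kind: SemanticRouter                   # hypothetical kind
            name: model-selector
      backendRefs:
        - name: model-backend
          port: 8080
```

The point of the sketch is only that `ExtensionRef` already gives routes a
standard attachment point; whether a standardized filter of this shape is
warranted is exactly the kind of question this WG would explore.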
| 37 | + |
## Scope

The scope of this WG is to define terms like "AI Gateway" in the context of
Kubernetes and propose deliverables that need to be adopted in order to **manage
AI traffic** on Kubernetes, such as:
| 43 | + |
* **Prompt Guards** - Define and enforce content safety rules for inference
  content to detect and block sensitive or malicious prompts.
* **Token Rate Limiting** - Enforce rate-limiting rules based on token usage
  to control usage and cost.
* **Semantic Routing** - Make routing decisions for inference requests based
  on the semantic similarity of the request body.
* **Semantic Caching** - Cache inference responses based on the semantic
  similarity of prompts.
* **Response Risk** - Define and enforce content safety rules for inference
  response content to detect and block sensitive responses from generative AI
  models.
* **Failure Modes** - Define how inference routing failures should be handled
  and which failure modes are important to cover; for instance, fallback and
  retry policies.
* **Observability** - Determine which metrics and tracing standards for “AI
  Gateway” features should be standardized, and how.
| 60 | + |
> **Note**: The above list of features is illustrative and non-exhaustive. We
> may not act on all of these; the purpose is to show the kinds of features we
> will be exploring.
### In Scope

Overall guidance for the WG is to control scope as much as is feasible. The WG
should avoid AI-specific functionality where it can, instead favoring the
addition of provisions that help with AI networking and traffic management. In
particular, the following is in scope:
| 71 | + |
* Providing definitions for networking-related AI terms in a Kubernetes
  context, such as "AI Gateway".

* Defining important use cases for Kubernetes users.

* Determining which common features and capabilities in the "AI Gateway" space
  need to be covered by Kubernetes standards and APIs according to user and
  implementation needs.

* Creating proposals for "AI Gateway" features and capabilities to the
  appropriate sub-projects.

* Proposing new sub-projects if existing sub-projects are not sufficient.
| 85 | + |
### Out of Scope

* Developing whole "AI Gateway" solutions. This group will focus on enabling
  existing and new solutions to be more easily deployed and managed on
  Kubernetes, not creating any new Gateways.

* Support for any specific kind of hardware is generally out of scope.

* This group will not cover the entire spectrum of networking for AI. For
  instance, RDMA networks are generally out of scope.

* Model serving and AI workloads are out of scope (see below for a caveat
  about this).
| 99 | + |
### Additional Scope Distinctions

There is a subtle distinction to be made when it comes to the scope of this WG
for load-balancing and routing inference, particularly when dealing with
inference _workloads_: when the use case includes local model serving on the
cluster, and routing and load-balancing features _rely on information from the
inference workloads_, that kind of routing falls under the scope of WG Serving.
| 107 | + |
A good example of this is the [Gateway API Inference Extension (GIE)][gie].
This project came from WG Serving and specifically handles advanced routing and
load-balancing for inference, informed by metrics and capabilities advertised
by the model serving platform (e.g. vLLM). In this vein, the GIE is
effectively an alternative to the Kubernetes `Service` API, whereas this WG
means to operate more at the `Gateway` and `HTTPRoute` level.
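To make the `Service`-alternative framing concrete, the sketch below shows the
pattern the GIE project has documented: an `HTTPRoute` whose backend is an
`InferencePool` rather than a `Service`. The resource names are placeholders,
and the API group shown is the experimental one the project has used; check
the GIE documentation for the current group and version:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway   # placeholder Gateway name
  rules:
    - backendRefs:
        # Instead of a Service, the backend is an InferencePool. Its
        # endpoint-picker extension chooses a model server replica based
        # on metrics the serving platform advertises (e.g. queue depth).
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: vllm-llama
```

This is the layer this WG intends to stay above: the route and gateway
configuration, not the workload-informed endpoint selection inside the pool.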
| 114 | + |
Use cases which have to interact with the model serving layer for networking
(as described above) are generally out of scope for this WG. If some feature
the WG is working on absolutely must cross this line, the effort MUST be brought
to WG Serving and worked on as a joint effort with them.

[gie]:https://github.com/kubernetes-sigs/gateway-api-inference-extension
| 121 | + |
## Deliverables

* A compendium of AI-related networking definitions (e.g. "AI Gateway") and
  key use cases for Kubernetes users.

* A space for collaboration and experimentation to determine the most viable
  features and capabilities that Kubernetes should support. If there is
  strong consensus on any particular ideas, the WG will facilitate and
  coordinate the delivery of proposals in the appropriate areas.
| 131 | + |
## Stakeholders

* SIG Network

### Related WGs

* WG Serving - The domain of WG Serving is AI workloads, which can benefit
  from some of the networking support we want to add. When we have proposals
  that are strongly relevant to serving, we will loop WG Serving in so they
  can provide feedback.
| 142 | + |
## Roles and Organization Management

This working group adheres to the Roles and Organization Management outlined in
[wg-governance] and opts-in to updates and modifications to [wg-governance].

[wg-governance]:https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md
| 149 | + |
## Exit Criteria

The WG is done when its deliverables are complete, according to the defined
scope and a list of key use cases and features agreed upon by the group.

Ideally we want the lifecycle of the WG to go something like this:

1. Determine definitions and key use cases for Kubernetes users and
   implementations, and document those.
2. Determine a list of key features that Kubernetes needs to best support the
   defined use cases.
3. For each feature in that list, make proposals which support them to the
   appropriate sub-projects OR propose new sub-projects if deemed necessary.
4. Once the feature list is complete, leave behind some guidance and best
   practices for future implementations, and then exit.