Commit cc9d771

Add agentgateway as implementation (#1321)

* Add agentgateway as implementation
* Add conformance report passing all tests with v0.5.1
* Add to implementation list. In order to avoid picking where in the list to add it, I sorted the list alphabetically, mirroring GW API.
* Update a reference to our ext_proc server implementation. I don't care to update this for every release or anything, but the old link was to a very rough implementation that has since had many fixes.
* Address comments
1 parent 4fa525d commit cc9d771

File tree

8 files changed: +180 −38 lines


README.md

Lines changed: 1 addition & 1 deletion

@@ -60,7 +60,7 @@ For deeper insights and more advanced concepts, refer to our [proposals](/docs/p
 
 ## Technical Overview
 
-This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
+This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kgateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
 
 The Inference Gateway:
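The overview above mentions exposing self-hosted models through OpenAI-compatible chat completion endpoints. As a minimal sketch of what a client sends through such a gateway, the following builds a chat completion request body; the gateway address is a placeholder and the model name mirrors the `InferencePool` used in this commit's manifests, not a value the gateway itself mandates.

```python
import json

# Hypothetical gateway address -- substitute the address of your own Gateway.
GATEWAY_URL = "http://<GATEWAY_IP>/v1/chat/completions"

def chat_completion_request(model: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(body)

# Model name taken from the vllm-llama3-8b-instruct InferencePool in this commit.
payload = chat_completion_request("vllm-llama3-8b-instruct", "Say hello")
print(payload)
```

This body can then be POSTed to the gateway with any HTTP client (e.g. `curl -d "$payload" $GATEWAY_URL`).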

config/manifests/gateway/agentgateway/gateway.yaml

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: inference-gateway
+spec:
+  gatewayClassName: agentgateway
+  listeners:
+  - name: http
+    port: 80
+    protocol: HTTP
config/manifests/gateway/agentgateway/httproute.yaml

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: llm-route
+spec:
+  parentRefs:
+  - group: gateway.networking.k8s.io
+    kind: Gateway
+    name: inference-gateway
+  rules:
+  - backendRefs:
+    - group: inference.networking.x-k8s.io
+      kind: InferencePool
+      name: vllm-llama3-8b-instruct
+    matches:
+    - path:
+        type: PathPrefix
+        value: /
+    timeouts:
+      request: 300s
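The route above matches every request via a `PathPrefix` of `/`. Gateway API defines `PathPrefix` as an element-wise prefix match, not a plain string prefix: `/abc` matches `/abc` and `/abc/def` but not `/abcd`. A small sketch of that rule (an illustration of the spec's semantics, not code from this project):

```python
def path_prefix_matches(prefix: str, path: str) -> bool:
    """Element-wise prefix match as Gateway API defines PathPrefix:
    the prefix must end on a path-segment boundary of the request path."""
    if prefix == "/":
        return True  # the root prefix matches every path
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")

# With value "/" (as in the HTTPRoute above), everything matches:
print(path_prefix_matches("/", "/v1/chat/completions"))  # True
```

The 300s request timeout is deliberately generous: LLM completions can take minutes, far beyond typical HTTP defaults.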
conformance/reports/v0.5.1/gateway/agentgateway/README.md

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+# Agentgateway (with kgateway)
+
+## Table of Contents
+
+| Extension Version Tested | Profile Tested | Implementation Version | Mode    | Report                                          |
+|--------------------------|----------------|------------------------|---------|-------------------------------------------------|
+| v0.5.1                   | Gateway        | v0.7.2                 | default | [v0.7.2 report](./inference-v0.7.2-report.yaml) |
+
+## Reproduce
+
+From the [kgateway repository](https://github.com/kgateway-dev/kgateway/): `CONFORMANCE_GATEWAY_CLASS=agentgateway make gie-conformance`.
conformance/reports/v0.5.1/gateway/agentgateway/inference-v0.7.2-report.yaml

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+GatewayAPIInferenceExtensionVersion: v0.5.1
+apiVersion: gateway.networking.k8s.io/v1
+date: "2025-08-06T17:50:20-07:00"
+gatewayAPIChannel: experimental
+gatewayAPIVersion: v1.3.0
+implementation:
+  contact:
+  - github.com/agentgateway/agentgateway/issues/new/choose
+  organization: agentgateway
+  project: agentgateway
+  url: http://agentgateway.dev/
+  version: v0.7.2
+kind: ConformanceReport
+mode: default
+profiles:
+- core:
+    result: success
+    statistics:
+      Failed: 0
+      Passed: 9
+      Skipped: 0
+  name: Gateway
+summary: Core tests succeeded.
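A consumer of such a report typically checks that every profile's core suite succeeded with zero failures. A minimal sketch, with the report above reduced to a Python dict (an illustration, not an official tool from the project):

```python
# The agentgateway conformance report above, reduced to the fields checked here.
report = {
    "kind": "ConformanceReport",
    "mode": "default",
    "profiles": [
        {
            "name": "Gateway",
            "core": {
                "result": "success",
                "statistics": {"Failed": 0, "Passed": 9, "Skipped": 0},
            },
        }
    ],
}

def all_core_profiles_pass(report: dict) -> bool:
    """True when every profile's core suite succeeded with zero failures."""
    return all(
        p["core"]["result"] == "success" and p["core"]["statistics"]["Failed"] == 0
        for p in report["profiles"]
    )

print(all_core_profiles_pass(report))  # True for the report above
```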

site-src/guides/implementers.md

Lines changed: 1 addition & 1 deletion

@@ -141,7 +141,7 @@ Supporting this broad range of extension capabilities (including for inference,
 
 Several implementations can be used as references:
 
 - A fully featured [reference implementation](https://github.com/envoyproxy/envoy/tree/main/source/extensions/filters/http/ext_proc) (C++) can be found in the Envoy GitHub repository.
-- A second implementation (Rust, non-Envoy) is available in [Agent Gateway](https://github.com/agentgateway/agentgateway/blob/v0.5.2/crates/proxy/src/ext_proc.rs).
+- A second implementation (Rust, non-Envoy) is available in [agentgateway](https://github.com/agentgateway/agentgateway/blob/v0.7.2/crates/agentgateway/src/http/ext_proc.rs).
 
 #### Portable Implementation

site-src/guides/index.md

Lines changed: 69 additions & 0 deletions

@@ -244,6 +244,53 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
       kubectl get httproute llm-route -o yaml
       ```
 
+=== "Agentgateway"
+
+    [Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads, and comes with native support for inference routing. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane.
+
+    1. Requirements
+
+        - [Helm](https://helm.sh/docs/intro/install/) installed.
+        - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.
+
+    2. Set the Kgateway version and install the Kgateway CRDs.
+
+        ```bash
+        KGTW_VERSION=v2.0.4
+        helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
+        ```
+
+    3. Install Kgateway.
+
+        ```bash
+        helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true --set agentGateway.enabled=true
+        ```
+
+    4. Deploy the Gateway.
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/gateway.yaml
+        ```
+
+        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+
+        ```bash
+        $ kubectl get gateway inference-gateway
+        NAME                CLASS          ADDRESS        PROGRAMMED   AGE
+        inference-gateway   agentgateway   <MY_ADDRESS>   True         22s
+        ```
+
+    5. Deploy the HTTPRoute.
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/httproute.yaml
+        ```
+
+    6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
+
+        ```bash
+        kubectl get httproute llm-route -o yaml
+        ```
+
 ### Try it out
 
 Wait until the gateway is ready.

@@ -339,3 +386,25 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
       ```bash
       kubectl delete ns kgateway-system
       ```
+
+=== "Agentgateway"
+
+    The following instructions assume you would like to clean up ALL Kgateway resources that were created in this quickstart guide.
+
+    1. Uninstall Kgateway.
+
+        ```bash
+        helm uninstall kgateway -n kgateway-system
+        ```
+
+    1. Uninstall the Kgateway CRDs.
+
+        ```bash
+        helm uninstall kgateway-crds -n kgateway-system
+        ```
+
+    1. Remove the Kgateway namespace.
+
+        ```bash
+        kubectl delete ns kgateway-system
+        ```

site-src/implementations/gateways.md

Lines changed: 45 additions & 36 deletions

@@ -2,17 +2,45 @@
 
 This project has several implementations that are planned or in progress:
 
-* [Envoy AI Gateway][1]
-* [Kgateway][2]
-* [Google Kubernetes Engine][3]
-* [Istio][4]
-* [Alibaba Cloud Container Service for Kubernetes][5]
-
-[1]:#envoy-gateway
-[2]:#kgateway
-[3]:#google-kubernetes-engine
-[4]:#istio
-[5]:#alibaba-cloud-container-service-for-kubernetes
+* [Agentgateway][1]
+* [Alibaba Cloud Container Service for Kubernetes][2]
+* [Envoy AI Gateway][3]
+* [Google Kubernetes Engine][4]
+* [Istio][5]
+* [Kgateway][6]
+
+[1]:#agentgateway
+[2]:#alibaba-cloud-container-service-for-kubernetes
+[3]:#envoy-ai-gateway
+[4]:#google-kubernetes-engine
+[5]:#istio
+[6]:#kgateway
+
+## Agentgateway
+
+[Agentgateway](https://agentgateway.dev/) is an open source Gateway API implementation focusing on AI use cases, including LLM consumption, LLM serving, agent-to-agent ([A2A](https://a2aproject.github.io/A2A/latest/)), and agent-to-tool ([MCP](https://modelcontextprotocol.io/introduction)). It is the first and only proxy designed specifically for the Kubernetes Gateway API, powered by a high performance and scalable Rust dataplane implementation.
+
+Agentgateway comes with native support for Gateway API Inference Extension, powered by the [Kgateway](https://kgateway.dev/) control plane.
+
+## Alibaba Cloud Container Service for Kubernetes
+
+[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
+offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
+[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
+GPU-efficient load balancing for AI workloads beyond basic HTTP routing.
+
+The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
+and provides optimized routing for serving generative AI workloads,
+including weighted traffic splitting, mirroring, advanced routing, etc.
+See the docs for the [usage][ack-gie-usage].
+
+Progress towards supporting Gateway API Inference Extension is being tracked
+by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).
+
+[ack]:https://www.alibabacloud.com/help/en/ack
+[ack-gie]:https://www.alibabacloud.com/help/en/ack/product-overview/ack-gateway-with-inference-extension
+[ack-gie-usage]:https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/intelligent-routing-and-traffic-management-with-ack-gateway-inference-extension
 
 ## Envoy AI Gateway
 
@@ -29,14 +57,6 @@ Issue](https://github.com/envoyproxy/ai-gateway/issues/423).
 [aigw-capabilities]:https://aigateway.envoyproxy.io/docs/capabilities/
 [aigw-quickstart]:https://aigateway.envoyproxy.io/docs/capabilities/gateway-api-inference-extension
 
-## Kgateway
-
-[Kgateway](https://kgateway.dev/) is a Gateway API Inference Extension
-[conformant](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/conformance/reports/v0.5.1/gateway/kgateway)
-gateway that can run [independently](https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_3_3), as an [Istio waypoint](https://kgateway.dev/blog/extend-istio-ambient-kgateway-waypoint/),
-or within your [llm-d infrastructure](https://github.com/llm-d-incubation/llm-d-infra) to improve accelerator (GPU)
-utilization for AI inference workloads.
-
 ## Google Kubernetes Engine

@@ -66,21 +86,10 @@ For service mesh users, Istio also fully supports east-west (including [GAMMA](h
 Gateway API Inference Extension support is being tracked by this [GitHub
 Issue](https://github.com/istio/istio/issues/55768).
 
-## Alibaba Cloud Container Service for Kubernetes
-
-[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
-offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
-[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
-GPU-efficient load balancing for AI workloads beyond basic HTTP routing.
-
-The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
-and provides optimized routing for serving generative AI workloads,
-including weighted traffic splitting, mirroring, advanced routing, etc.
-See the docs for the [usage][ack-gie-usage].
-
-Progress towards supporting Gateway API Inference Extension is being tracked
-by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).
+## Kgateway
 
-[ack]:https://www.alibabacloud.com/help/en/ack
-[ack-gie]:https://www.alibabacloud.com/help/en/ack/product-overview/ack-gateway-with-inference-extension
-[ack-gie-usage]:https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/intelligent-routing-and-traffic-management-with-ack-gateway-inference-extension
+[Kgateway](https://kgateway.dev/) is a Gateway API Inference Extension
+[conformant](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/conformance/reports/v0.5.1/gateway/kgateway)
+gateway that can run [independently](https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_3_3), as an [Istio waypoint](https://kgateway.dev/blog/extend-istio-ambient-kgateway-waypoint/),
+or within your [llm-d infrastructure](https://github.com/llm-d-incubation/llm-d-infra) to improve accelerator (GPU)
+utilization for AI inference workloads.