Add agentgateway as implementation #1321

Open
wants to merge 1 commit into
base: main
2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ For deeper insights and more advanced concepts, refer to our [proposals](/docs/p

## Technical Overview

This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kgateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.

The Inference Gateway:

10 changes: 10 additions & 0 deletions config/manifests/gateway/agentgateway/gateway.yaml
@@ -0,0 +1,10 @@
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: agentgateway
  listeners:
  - name: http
    port: 80
    protocol: HTTP
20 changes: 20 additions & 0 deletions config/manifests/gateway/agentgateway/httproute.yaml
@@ -0,0 +1,20 @@
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: 300s
11 changes: 11 additions & 0 deletions conformance/reports/v0.5.1/gateway/agentgateway/README.md
@@ -0,0 +1,11 @@
# Agent Gateway (with kgateway)

## Table of Contents

| Extension Version Tested | Profile Tested | Implementation Version | Mode | Report |
|--------------------------|----------------|------------------------|---------|----------------------------------------------------------------------------|
| v0.5.1 | Gateway | v0.7.2 | default | [v0.7.2 report](./inference-v0.7.2-report.yaml) |

## Reproduce

From the [kgateway repository](https://github.com/kgateway-dev/kgateway/): `CONFORMANCE_GATEWAY_CLASS=agentgateway make gie-conformance`.
@@ -0,0 +1,23 @@
GatewayAPIInferenceExtensionVersion: v0.5.1
apiVersion: gateway.networking.k8s.io/v1
date: "2025-08-06T17:50:20-07:00"
gatewayAPIChannel: experimental
gatewayAPIVersion: v1.3.0
implementation:
  contact:
  - github.com/agentgateway/agentgateway/issues/new/choose
  organization: agentgateway
  project: agentgateway
  url: http://agentgateway.dev/
  version: v0.7.2
kind: ConformanceReport
mode: default
profiles:
- core:
    result: success
    statistics:
      Failed: 0
      Passed: 9
      Skipped: 0
  name: Gateway
  summary: Core tests succeeded.
2 changes: 1 addition & 1 deletion site-src/guides/implementers.md
@@ -141,7 +141,7 @@ Supporting this broad range of extension capabilities (including for inference,
Several implementations can be used as references:

- A fully featured [reference implementation](https://github.com/envoyproxy/envoy/tree/main/source/extensions/filters/http/ext_proc) (C++) can be found in the Envoy GitHub repository.
- A second implementation (Rust, non-Envoy) is available in [Agent Gateway](https://github.com/agentgateway/agentgateway/blob/v0.5.2/crates/proxy/src/ext_proc.rs).
- A second implementation (Rust, non-Envoy) is available in [agentgateway](https://github.com/agentgateway/agentgateway/blob/v0.7.2/crates/agentgateway/src/http/ext_proc.rs).

#### Portable Implementation

46 changes: 46 additions & 0 deletions site-src/guides/index.md
@@ -243,6 +243,52 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
```bash
kubectl get httproute llm-route -o yaml
```
=== "Agentgateway"

[Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads, and comes with native support for inference routing. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane.

1. Requirements

- [Helm](https://helm.sh/docs/intro/install/) installed.
- Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

2. Set the Kgateway version and install the Kgateway CRDs.

```bash
KGTW_VERSION=v2.0.4
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
```

3. Install Kgateway

```bash
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true --set agentGateway.enabled=true
```

4. Deploy the Gateway

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/gateway.yaml
```

Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
```bash
$ kubectl get gateway inference-gateway
NAME                CLASS          ADDRESS        PROGRAMMED   AGE
inference-gateway   agentgateway   <MY_ADDRESS>   True         22s
```

5. Deploy the HTTPRoute

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/httproute.yaml
```

6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

```bash
kubectl get httproute llm-route -o yaml
```
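
The status checks in steps 4 and 6 could also be scripted rather than read by eye. The following is a minimal sketch and not part of this change: the function names `all_true`, `route_condition`, and `verify_install` are hypothetical, the 120s timeout is an arbitrary choice, and the resource names are taken from the manifests above. It assumes `kubectl` points at the cluster and namespace where the Gateway and HTTPRoute were applied.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; resource names come from the manifests above.

# Returns 0 only when the argument is non-empty and every
# whitespace-separated value in it is exactly "True".
all_true() {
  [ -n "$1" ] || return 1
  for s in $1; do
    [ "$s" = "True" ] || return 1
  done
}

# Prints the status of one condition type (e.g. Accepted) for every
# parent listed in the HTTPRoute status.
route_condition() {
  kubectl get httproute llm-route \
    -o jsonpath="{.status.parents[*].conditions[?(@.type==\"$1\")].status}"
}

verify_install() {
  # Block until the Gateway reports Programmed=True, or give up after the timeout.
  kubectl wait --for=condition=Programmed gateway/inference-gateway --timeout=120s || return 1

  # HTTPRoute conditions are nested under status.parents, so they are read with jsonpath.
  for cond in Accepted ResolvedRefs; do
    if ! all_true "$(route_condition "$cond")"; then
      echo "HTTPRoute condition $cond is not True" >&2
      return 1
    fi
  done
  echo "Gateway and HTTPRoute are ready"
}
```

After sourcing the file, running `verify_install` returns a non-zero exit code if the Gateway is not programmed in time or if either HTTPRoute condition is missing or not `True`, which makes it usable as a gate before sending test traffic.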

### Try it out

83 changes: 46 additions & 37 deletions site-src/implementations/gateways.md
@@ -2,17 +2,45 @@

This project has several implementations that are planned or in progress:

* [Envoy AI Gateway][1]
* [Kgateway][2]
* [Google Kubernetes Engine][3]
* [Istio][4]
* [Alibaba Cloud Container Service for Kubernetes][5]

[1]:#envoy-gateway
[2]:#kgateway
[3]:#google-kubernetes-engine
[4]:#istio
[5]:#alibaba-cloud-container-service-for-kubernetes
* [Agentgateway][1]
* [Alibaba Cloud Container Service for Kubernetes][2]
* [Envoy AI Gateway][3]
* [Google Kubernetes Engine][4]
* [Istio][5]
* [Kgateway][6]

[1]:#agentgateway
[2]:#alibaba-cloud-container-service-for-kubernetes
[3]:#envoy-gateway
[4]:#google-kubernetes-engine
[5]:#istio
[6]:#kgateway

## Agentgateway

[Agentgateway](https://agentgateway.dev/) is an open source Gateway API implementation focusing on AI use cases, including LLM consumption, LLM serving, agent-to-agent ([A2A](https://a2aproject.github.io/A2A/latest/)), and agent-to-tool ([MCP](https://modelcontextprotocol.io/introduction)). It is the first and only proxy designed specifically for the Kubernetes Gateway API, powered by a high-performance, scalable Rust dataplane implementation.

Agentgateway comes with native support for Gateway API Inference Extension, powered by the [Kgateway](https://kgateway.dev/) control plane.

## Alibaba Cloud Container Service for Kubernetes

[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
GPU-efficient load balancing for AI workloads beyond basic HTTP routing.

The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
and provides optimized routing for serving generative AI workloads,
including weighted traffic splitting, mirroring, advanced routing, etc.
See the docs for the [usage][ack-gie-usage].

Progress towards supporting Gateway API Inference Extension is being tracked
by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).

[ack]:https://www.alibabacloud.com/help/en/ack
[ack-gie]:https://www.alibabacloud.com/help/en/ack/product-overview/ack-gateway-with-inference-extension
[ack-gie-usage]:https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/intelligent-routing-and-traffic-management-with-ack-gateway-inference-extension


## Envoy AI Gateway

@@ -29,16 +57,6 @@ Issue](https://github.com/envoyproxy/ai-gateway/issues/423).
[aigw-capabilities]:https://aigateway.envoyproxy.io/docs/capabilities/
[aigw-quickstart]:https://aigateway.envoyproxy.io/docs/capabilities/gateway-api-inference-extension

## Kgateway

[Kgateway](https://kgateway.dev/) is a feature-rich, Kubernetes-native
ingress controller and next-generation API gateway. Kgateway brings the
full power and community support of Gateway API to its existing control-plane
implementation.

Progress towards supporting this project is tracked with a [GitHub
Issue](https://github.com/kgateway-dev/kgateway/issues/10411).

## Google Kubernetes Engine

[Google Kubernetes Engine (GKE)][gke] is a managed Kubernetes platform offered
@@ -68,21 +86,12 @@ For service mesh users, Istio also fully supports east-west (including [GAMMA](h
Gateway API Inference Extension support is being tracked by this [GitHub
Issue](https://github.com/istio/istio/issues/55768).

## Alibaba Cloud Container Service for Kubernetes

[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
GPU-efficient load balancing for AI workloads beyond basic HTTP routing.

The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
and provides optimized routing for serving generative AI workloads,
including weighted traffic splitting, mirroring, advanced routing, etc.
See the docs for the [usage][ack-gie-usage].
## Kgateway

Progress towards supporting Gateway API Inference Extension is being tracked
by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).
[Kgateway](https://kgateway.dev/) is a feature-rich, Kubernetes-native
ingress controller and next-generation API gateway. Kgateway brings the
full power and community support of Gateway API to its existing control-plane
implementation.

[ack]:https://www.alibabacloud.com/help/en/ack
[ack-gie]:https://www.alibabacloud.com/help/en/ack/product-overview/ack-gateway-with-inference-extension
[ack-gie-usage]:https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/intelligent-routing-and-traffic-management-with-ack-gateway-inference-extension
Progress towards supporting this project is tracked with a [GitHub
Issue](https://github.com/kgateway-dev/kgateway/issues/10411).