* Add agentgateway as an implementation
* Add conformance report passing all tests with v0.5.1
* Add to the implementation list. To avoid picking where in the list to
  add it, I sorted the list alphabetically, mirroring GW API
* Update a reference to our ext_proc server implementation. This doesn't
  need updating for every release, but the old link pointed to a very
  rough implementation that has since had many fixes.
* Address review comments
README.md (1 addition & 1 deletion)
@@ -60,7 +60,7 @@ For deeper insights and more advanced concepts, refer to our [proposals](/docs/p
 ## Technical Overview

-This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
+This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kgateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
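The OpenAI-compatible endpoint described above can be exercised directly once a gateway fronts the model server. A minimal sketch, assuming a hypothetical gateway address and model name (substitute values from your own deployment):

```shell
# Hypothetical gateway address and model name; substitute your own.
GATEWAY_HOST=${GATEWAY_HOST:-localhost:8081}
MODEL=${MODEL:-my-model}

# Standard OpenAI-compatible chat completion request body.
BODY=$(printf '{"model": "%s", "messages": [{"role": "user", "content": "Hello"}]}' "$MODEL")

# Send it through the gateway (requires a reachable gateway, so the
# request itself is left commented out here):
# curl -s "http://${GATEWAY_HOST}/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
echo "$BODY"
```

Any ext-proc capable data plane in front of this endpoint can then inspect and steer these requests per model.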
site-src/guides/implementers.md (1 addition & 1 deletion)
@@ -141,7 +141,7 @@ Supporting this broad range of extension capabilities (including for inference,
 Several implementations can be used as references:

 - A fully featured [reference implementation](https://github.com/envoyproxy/envoy/tree/main/source/extensions/filters/http/ext_proc) (C++) can be found in the Envoy GitHub repository.
-- A second implementation (Rust, non-Envoy) is available in [Agent Gateway](https://github.com/agentgateway/agentgateway/blob/v0.5.2/crates/proxy/src/ext_proc.rs).
+- A second implementation (Rust, non-Envoy) is available in [agentgateway](https://github.com/agentgateway/agentgateway/blob/v0.7.2/crates/agentgateway/src/http/ext_proc.rs).
site-src/guides/index.md (69 additions & 0 deletions)
@@ -244,6 +244,53 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
     kubectl get httproute llm-route -o yaml
     ```
+=== "Agentgateway"
+
+    [Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads, and comes with native support for inference routing. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane.
+[Agentgateway](https://agentgateway.dev/) is an open source Gateway API implementation focusing on AI use cases, including LLM consumption, LLM serving, agent-to-agent ([A2A](https://a2aproject.github.io/A2A/latest/)), and agent-to-tool ([MCP](https://modelcontextprotocol.io/introduction)). It is the first and only proxy designed specifically for the Kubernetes Gateway API, powered by a high performance and scalable Rust dataplane implementation.
+
+Agentgateway comes with native support for Gateway API Inference Extension, powered by the [Kgateway](https://kgateway.dev/) control plane.
+
+## Alibaba Cloud Container Service for Kubernetes
+
+[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
+offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
+[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
+GPU-efficient load balancing for AI workloads beyond basic HTTP routing.
+
+The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
+and provides optimized routing for serving generative AI workloads,
+including weighted traffic splitting, mirroring, advanced routing, etc.
+See the docs for the [usage][ack-gie-usage].
+
+Progress towards supporting Gateway API Inference Extension is being tracked
+by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).
-gateway that can run [independently](https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_3_3), as an [Istio waypoint](https://kgateway.dev/blog/extend-istio-ambient-kgateway-waypoint/),
-or within your [llm-d infrastructure](https://github.com/llm-d-incubation/llm-d-infra) to improve accelerator (GPU)
-utilization for AI inference workloads.
 ## Google Kubernetes Engine

 [Google Kubernetes Engine (GKE)][gke] is a managed Kubernetes platform offered
@@ -66,21 +86,10 @@ For service mesh users, Istio also fully supports east-west (including [GAMMA](h
 Gateway API Inference Extension support is being tracked by this [GitHub
+gateway that can run [independently](https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_3_3), as an [Istio waypoint](https://kgateway.dev/blog/extend-istio-ambient-kgateway-waypoint/),
+or within your [llm-d infrastructure](https://github.com/llm-d-incubation/llm-d-infra) to improve accelerator (GPU)
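The quickstart diff above verifies the route with `kubectl get httproute llm-route -o yaml`; route acceptance can also be checked more directly with a jsonpath query. A minimal sketch, assuming the quickstart's `llm-route` name and a cluster reachable via kubectl:

```shell
# Route name from the quickstart; adjust for your environment.
ROUTE=llm-route
# jsonpath query for the Accepted condition reported by the gateway.
QUERY='{.status.parents[0].conditions[?(@.type=="Accepted")].status}'

if command -v kubectl >/dev/null 2>&1; then
  # Prints "True" when the gateway has accepted the route.
  kubectl get httproute "$ROUTE" -o jsonpath="$QUERY"
else
  echo "kubectl not found; skipping check"
fi
```

The same pattern works for any HTTPRoute; only the route name changes.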