diff --git a/README.md b/README.md
index 8212c7301..51aaf2829 100644
--- a/README.md
+++ b/README.md
@@ -60,7 +60,7 @@ For deeper insights and more advanced concepts, refer to our [proposals](/docs/p
 ## Technical Overview
-This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kGateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
+This extension upgrades an [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) capable proxy or gateway - such as Envoy Gateway, kgateway, or the GKE Gateway - to become an **[inference gateway]** - supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes. This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat) to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers in a higher level **AI Gateway** like LiteLLM, Solo AI Gateway, or Apigee.
 The Inference Gateway:
diff --git a/config/manifests/gateway/agentgateway/gateway.yaml b/config/manifests/gateway/agentgateway/gateway.yaml
new file mode 100644
index 000000000..9407cc733
--- /dev/null
+++ b/config/manifests/gateway/agentgateway/gateway.yaml
@@ -0,0 +1,10 @@
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: inference-gateway
+spec:
+  gatewayClassName: agentgateway
+  listeners:
+  - name: http
+    port: 80
+    protocol: HTTP
diff --git a/config/manifests/gateway/agentgateway/httproute.yaml b/config/manifests/gateway/agentgateway/httproute.yaml
new file mode 100644
index 000000000..18e90ced6
--- /dev/null
+++ b/config/manifests/gateway/agentgateway/httproute.yaml
@@ -0,0 +1,20 @@
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: llm-route
+spec:
+  parentRefs:
+  - group: gateway.networking.k8s.io
+    kind: Gateway
+    name: inference-gateway
+  rules:
+  - backendRefs:
+    - group: inference.networking.x-k8s.io
+      kind: InferencePool
+      name: vllm-llama3-8b-instruct
+    matches:
+    - path:
+        type: PathPrefix
+        value: /
+    timeouts:
+      request: 300s
diff --git a/conformance/reports/v0.5.1/gateway/agentgateway/README.md b/conformance/reports/v0.5.1/gateway/agentgateway/README.md
new file mode 100644
index 000000000..6b6aa4b31
--- /dev/null
+++ b/conformance/reports/v0.5.1/gateway/agentgateway/README.md
@@ -0,0 +1,11 @@
+# Agent Gateway (with kgateway)
+
+## Table of Contents
+
+| Extension Version Tested | Profile Tested | Implementation Version | Mode    | Report                                                                     |
+|--------------------------|----------------|------------------------|---------|----------------------------------------------------------------------------|
+| v0.5.1                   | Gateway        | v0.7.2                 | default | [v0.7.2 report](./inference-v0.7.2-report.yaml)                            |
+
+## Reproduce
+
+From the [kgateway repository](https://github.com/kgateway-dev/kgateway/): `CONFORMANCE_GATEWAY_CLASS=agentgateway make gie-conformance`.
diff --git a/conformance/reports/v0.5.1/gateway/agentgateway/inference-v0.7.2-report.yaml b/conformance/reports/v0.5.1/gateway/agentgateway/inference-v0.7.2-report.yaml
new file mode 100644
index 000000000..3e76f4311
--- /dev/null
+++ b/conformance/reports/v0.5.1/gateway/agentgateway/inference-v0.7.2-report.yaml
@@ -0,0 +1,23 @@
+GatewayAPIInferenceExtensionVersion: v0.5.1
+apiVersion: gateway.networking.k8s.io/v1
+date: "2025-08-06T17:50:20-07:00"
+gatewayAPIChannel: experimental
+gatewayAPIVersion: v1.3.0
+implementation:
+  contact:
+  - github.com/agentgateway/agentgateway/issues/new/choose
+  organization: agentgateway
+  project: agentgateway
+  url: http://agentgateway.dev/
+  version: v0.7.2
+kind: ConformanceReport
+mode: default
+profiles:
+- core:
+    result: success
+    statistics:
+      Failed: 0
+      Passed: 9
+      Skipped: 0
+  name: Gateway
+  summary: Core tests succeeded.
\ No newline at end of file
diff --git a/site-src/guides/implementers.md b/site-src/guides/implementers.md
index 4fb6ee7e4..747e934a2 100644
--- a/site-src/guides/implementers.md
+++ b/site-src/guides/implementers.md
@@ -141,7 +141,7 @@ Supporting this broad range of extension capabilities (including for inference,
 Several implementations can be used as references:
 - A fully featured [reference implementation](https://github.com/envoyproxy/envoy/tree/main/source/extensions/filters/http/ext_proc) (C++) can be found in the Envoy GitHub repository.
-- A second implementation (Rust, non-Envoy) is available in [Agent Gateway](https://github.com/agentgateway/agentgateway/blob/v0.5.2/crates/proxy/src/ext_proc.rs).
+- A second implementation (Rust, non-Envoy) is available in [agentgateway](https://github.com/agentgateway/agentgateway/blob/v0.7.2/crates/agentgateway/src/http/ext_proc.rs).
 #### Portable Implementation
diff --git a/site-src/guides/index.md b/site-src/guides/index.md
index 0cf64377a..57554944a 100644
--- a/site-src/guides/index.md
+++ b/site-src/guides/index.md
@@ -243,6 +243,52 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
         ```bash
         kubectl get httproute llm-route -o yaml
         ```
+=== "Agentgateway"
+
+    [Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads, and comes with native support for inference routing. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane.
+
+    1. Requirements
+
+        - [Helm](https://helm.sh/docs/intro/install/) installed.
+        - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.
+
+    2. Set the Kgateway version and install the Kgateway CRDs.
+
+        ```bash
+        KGTW_VERSION=v2.0.4
+        helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
+        ```
+
+    3. Install Kgateway
+
+        ```bash
+        helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true --set agentGateway.enabled=true
+        ```
+
+    4. Deploy the Gateway
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/gateway.yaml
+        ```
+
+        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+        ```bash
+        $ kubectl get gateway inference-gateway
+        NAME                CLASS          ADDRESS   PROGRAMMED   AGE
+        inference-gateway   agentgateway             True         22s
+        ```
+
+    5. Deploy the HTTPRoute
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/agentgateway/httproute.yaml
+        ```
+
+    6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
+
+        ```bash
+        kubectl get httproute llm-route -o yaml
+        ```
 ### Try it out
diff --git a/site-src/implementations/gateways.md b/site-src/implementations/gateways.md
index 950c0833e..d5d322d20 100644
--- a/site-src/implementations/gateways.md
+++ b/site-src/implementations/gateways.md
@@ -2,17 +2,45 @@
 This project has several implementations that are planned or in progress:
-* [Envoy AI Gateway][1]
-* [Kgateway][2]
-* [Google Kubernetes Engine][3]
-* [Istio][4]
-* [Alibaba Cloud Container Service for Kubernetes][5]
-
-[1]:#envoy-gateway
-[2]:#kgateway
-[3]:#google-kubernetes-engine
-[4]:#istio
-[5]:#alibaba-cloud-container-service-for-kubernetes
+* [Agentgateway][1]
+* [Alibaba Cloud Container Service for Kubernetes][2]
+* [Envoy AI Gateway][3]
+* [Google Kubernetes Engine][4]
+* [Istio][5]
+* [Kgateway][6]
+
+[1]:#agentgateway
+[2]:#alibaba-cloud-container-service-for-kubernetes
+[3]:#envoy-gateway
+[4]:#google-kubernetes-engine
+[5]:#istio
+[6]:#kgateway
+
+## Agentgateway
+
+[Agentgateway](https://agentgateway.dev/) is an open source Gateway API implementation focusing on AI use cases, including LLM consumption, LLM serving, agent-to-agent ([A2A](https://a2aproject.github.io/A2A/latest/)), and agent-to-tool ([MCP](https://modelcontextprotocol.io/introduction)). It is the first and only proxy designed specifically for the Kubernetes Gateway API, powered by a high performance and scalable Rust dataplane implementation.
+
+Agentgateway comes with native support for Gateway API Inference Extension, powered by the [Kgateway](https://kgateway.dev/) control plane.
+
+## Alibaba Cloud Container Service for Kubernetes
+
+[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
+offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
+[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
+GPU-efficient load balancing for AI workloads beyond basic HTTP routing.
+
+The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
+and provides optimized routing for serving generative AI workloads,
+including weighted traffic splitting, mirroring, advanced routing, etc.
+See the docs for the [usage][ack-gie-usage].
+
+Progress towards supporting Gateway API Inference Extension is being tracked
+by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).
+
+[ack]:https://www.alibabacloud.com/help/en/ack
+[ack-gie]:https://www.alibabacloud.com/help/en/ack/product-overview/ack-gateway-with-inference-extension
+[ack-gie-usage]:https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/intelligent-routing-and-traffic-management-with-ack-gateway-inference-extension
+
 ## Envoy AI Gateway
@@ -29,16 +57,6 @@ Issue](https://github.com/envoyproxy/ai-gateway/issues/423).
 [aigw-capabilities]:https://aigateway.envoyproxy.io/docs/capabilities/
 [aigw-quickstart]:https://aigateway.envoyproxy.io/docs/capabilities/gateway-api-inference-extension
-## Kgateway
-
-[Kgateway](https://kgateway.dev/) is a feature-rich, Kubernetes-native
-ingress controller and next-generation API gateway. Kgateway brings the
-full power and community support of Gateway API to its existing control-plane
-implementation.
-
-Progress towards supporting this project is tracked with a [GitHub
-Issue](https://github.com/kgateway-dev/kgateway/issues/10411).
-
 ## Google Kubernetes Engine
 [Google Kubernetes Engine (GKE)][gke] is a managed Kubernetes platform offered
@@ -68,21 +86,12 @@ For service mesh users, Istio also fully supports east-west (including [GAMMA](h
 Gateway API Inference Extension support is being tracked by this [GitHub
 Issue](https://github.com/istio/istio/issues/55768).
-## Alibaba Cloud Container Service for Kubernetes
-
-[Alibaba Cloud Container Service for Kubernetes (ACK)][ack] is a managed Kubernetes platform
-offered by Alibaba Cloud. The implementation of the Gateway API in ACK is through the
-[ACK Gateway with Inference Extension][ack-gie] component, which introduces model-aware,
-GPU-efficient load balancing for AI workloads beyond basic HTTP routing.
-
-The ACK Gateway with Inference Extension implements the Gateway API Inference Extension
-and provides optimized routing for serving generative AI workloads,
-including weighted traffic splitting, mirroring, advanced routing, etc.
-See the docs for the [usage][ack-gie-usage].
+## Kgateway
-Progress towards supporting Gateway API Inference Extension is being tracked
-by [this Issue](https://github.com/AliyunContainerService/ack-gateway-api/issues/1).
+[Kgateway](https://kgateway.dev/) is a feature-rich, Kubernetes-native
+ingress controller and next-generation API gateway. Kgateway brings the
+full power and community support of Gateway API to its existing control-plane
+implementation.
-[ack]:https://www.alibabacloud.com/help/en/ack
-[ack-gie]:https://www.alibabacloud.com/help/en/ack/product-overview/ack-gateway-with-inference-extension
-[ack-gie-usage]:https://www.alibabacloud.com/help/en/ack/ack-managed-and-ack-dedicated/user-guide/intelligent-routing-and-traffic-management-with-ack-gateway-inference-extension
\ No newline at end of file
+Progress towards supporting this project is tracked with a [GitHub
+Issue](https://github.com/kgateway-dev/kgateway/issues/10411).