Commit 73aa427

docs: fix example in examples/inference-pool (envoyproxy#1488)

**Description**

fix: envoyproxy#1485

Signed-off-by: googs1025 <[email protected]>

1 parent 7d09b57 commit 73aa427


1 file changed: examples/inference-pool/README.md (+63 −7 lines)

This example demonstrates how to use AI Gateway with the InferencePool feature, which enables intelligent request routing across multiple inference endpoints with load balancing and health checking capabilities.

The setup includes **three distinct backends**:

- Two `InferencePool` resources for LLMs (`Llama-3.1-8B-Instruct` and `Mistral`)
- One standard `Backend` for non-InferencePool traffic

Routing is controlled by the `x-ai-eg-model` HTTP header.

## Files in This Directory

- **`envoy-gateway-values-addon.yaml`**: Envoy Gateway values addon for InferencePool support. Combine it with `../../manifests/envoy-gateway-values.yaml` (see the Helm sketch after this list).
- **`base.yaml`**: Deploys all inference backends and supporting resources using the **standard approach documented in the official guide**. This includes:
  - A `mistral` backend with a custom Endpoint Picker configuration
  - A standard fallback backend (`envoy-ai-gateway-basic-testupstream`) for non-InferencePool routing
- **`aigwroute.yaml`**: Example AIGatewayRoute that uses InferencePool as a backend.
- **`httproute.yaml`**: Example HTTPRoute for traditional HTTP routing to InferencePool endpoints.
- **`with-annotations.yaml`**: Advanced example showing InferencePool with Kubernetes annotations for fine-grained control.
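
As a sketch of how the addon can be combined with the base values when installing Envoy Gateway via its Helm chart (the release name `eg` and the `--version` value are assumptions; adjust them to your installation, and run from this directory):

```bash
# Hypothetical sketch: later -f flags override earlier ones, so the
# InferencePool addon is layered on top of the base values.
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
  --version v0.0.0-latest \
  -n envoy-gateway-system --create-namespace \
  -f ../../manifests/envoy-gateway-values.yaml \
  -f envoy-gateway-values-addon.yaml
```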

2. Deploy the example resources:

   ```bash
   kubectl apply -f base.yaml
   kubectl apply -f aigwroute.yaml
   ```

   > Note: The `aigwroute.yaml` file defines the InferencePool and routing logic, but does not deploy the actual inference backend (e.g., the vLLM server for `Llama-3.1-8B-Instruct`).
   > You must deploy the backend separately by following [Step 3: Deploy Inference Backends](https://aigateway.envoyproxy.io/docs/capabilities/inference/aigatewayroute-inferencepool#step-3-deploy-inference-backends).
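
   Before testing, it can help to confirm the resources are ready. A minimal sanity check, assuming the InferencePool CRDs are installed and the Gateway is named `inference-pool-with-aigwroute` as in the commands below:

   ```bash
   # The Gateway should report an address and a Programmed condition.
   kubectl get gateway/inference-pool-with-aigwroute -n default

   # InferencePool resources deployed by base.yaml / aigwroute.yaml.
   kubectl get inferencepools -n default

   # Backend pods should be Running.
   kubectl get pods -n default
   ```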

3. Test the setup:

   You can access the gateway in two ways, depending on your environment.

   **✅ Option A: Using an external IP (e.g., cloud LoadBalancer, MetalLB)**

   If your cluster assigns an external address to the Gateway:

   ```bash
   GATEWAY_HOST=$(kubectl get gateway/inference-pool-with-aigwroute -n default -o jsonpath='{.status.addresses[0].value}')
   echo "Gateway available at: http://${GATEWAY_HOST}"
   ```

   Then send a request:

   ```bash
   curl -X POST "http://${GATEWAY_HOST}/v1/chat/completions" \
     -H "x-ai-eg-model: meta-llama/Llama-3.1-8B-Instruct" \
     -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
   ```
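
   If the backend is healthy, the response is an OpenAI-style chat completion. To extract just the reply text (a convenience sketch, assuming `jq` is installed):

   ```bash
   curl -s -X POST "http://${GATEWAY_HOST}/v1/chat/completions" \
     -H "x-ai-eg-model: meta-llama/Llama-3.1-8B-Instruct" \
     -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}' \
     | jq -r '.choices[0].message.content'
   ```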

   **✅ Option B: Using `kubectl port-forward` (ideal for local clusters like Minikube or Kind)**

   In one terminal, forward the gateway service:

   ```bash
   kubectl port-forward svc/envoy-default-inference-pool-with-aigwroute-d416582c 8080:80 -n envoy-gateway-system
   ```
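
   The Service name above ends in a generated hash, so it may differ in your cluster. One way to look it up, assuming Envoy Gateway's standard owning-gateway labels:

   ```bash
   kubectl get svc -n envoy-gateway-system \
     -l gateway.envoyproxy.io/owning-gateway-name=inference-pool-with-aigwroute,gateway.envoyproxy.io/owning-gateway-namespace=default
   ```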

   In another terminal, send requests to `localhost:8080`:

   ```bash
   # Route to Llama (InferencePool)
   curl -X POST "http://localhost:8080/v1/chat/completions" \
     -H "x-ai-eg-model: meta-llama/Llama-3.1-8B-Instruct" \
     -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'

   # Route to Mistral (InferencePool)
   curl -X POST "http://localhost:8080/v1/chat/completions" \
     -H "x-ai-eg-model: mistral:latest" \
     -H "Content-Type: application/json" \
     -d '{"model": "mistral:latest", "messages": [{"role": "user", "content": "Hello!"}]}'

   # Route to the fallback backend (standard Backend)
   curl -X POST "http://localhost:8080/v1/chat/completions" \
     -H "x-ai-eg-model: some-cool-self-hosted-model" \
     -H "Content-Type: application/json" \
     -d '{"model": "some-cool-self-hosted-model", "messages": [{"role": "user", "content": "Hello!"}]}'
   ```
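
   As an optional smoke test, all three routes can be exercised in one loop, checking status codes only (a sketch; it sends the `Authorization` header on every request, on the assumption that routes which don't require it simply ignore it):

   ```bash
   for model in "meta-llama/Llama-3.1-8B-Instruct" "mistral:latest" "some-cool-self-hosted-model"; do
     curl -s -o /dev/null -w "${model}: HTTP %{http_code}\n" \
       -X POST "http://localhost:8080/v1/chat/completions" \
       -H "x-ai-eg-model: ${model}" \
       -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
       -H "Content-Type: application/json" \
       -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}"
   done
   ```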

### Combining with Other Features
