Commit 73aa427

docs: fix example in examples/inference-pool (envoyproxy#1488)

**Description**

fix: envoyproxy#1485

Signed-off-by: googs1025 <[email protected]>

1 parent 7d09b57 commit 73aa427


1 file changed: examples/inference-pool/README.md (+63 −7 lines)

This example demonstrates how to use AI Gateway with the InferencePool feature, which enables intelligent request routing across multiple inference endpoints with load balancing and health checking capabilities.

The setup includes **three distinct backends**:

- Two `InferencePool` resources for LLMs (`Llama-3.1-8B-Instruct` and `Mistral`)
- One standard `Backend` for non-InferencePool traffic

Routing is controlled by the `x-ai-eg-model` HTTP header.

## Files in This Directory

- **`envoy-gateway-values-addon.yaml`**: Envoy Gateway values addon for InferencePool support. Combine it with `../../manifests/envoy-gateway-values.yaml` (see the Helm sketch after this list).
- **`base.yaml`**: Deploys all inference backends and supporting resources using the **standard approach documented in the official guide**. This includes:
  - A `mistral` backend with a custom Endpoint Picker configuration
  - A standard fallback backend (`envoy-ai-gateway-basic-testupstream`) for non-InferencePool routing
- **`aigwroute.yaml`**: Example AIGatewayRoute that uses InferencePool as a backend.
- **`httproute.yaml`**: Example HTTPRoute for traditional HTTP routing to InferencePool endpoints.
- **`with-annotations.yaml`**: Advanced example showing InferencePool with Kubernetes annotations for fine-grained control.
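
As a sketch of how the addon can be combined with the base values when installing Envoy Gateway via its Helm chart (the release name `eg` and the `--version` value are assumptions; adjust them to your installation, and run from this directory):

```bash
# Hypothetical sketch: later -f flags override earlier ones, so the
# InferencePool addon is layered on top of the base values.
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
  --version v0.0.0-latest \
  -n envoy-gateway-system --create-namespace \
  -f ../../manifests/envoy-gateway-values.yaml \
  -f envoy-gateway-values-addon.yaml
```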

2. Deploy the example resources:

   ```bash
   kubectl apply -f base.yaml
   kubectl apply -f aigwroute.yaml
   ```

   > Note: The `aigwroute.yaml` file defines the InferencePool and routing logic, but does not deploy the actual inference backend (e.g., the vLLM server for `Llama-3.1-8B-Instruct`).
   > You must deploy the backend separately by following [Step 3: Deploy Inference Backends](https://aigateway.envoyproxy.io/docs/capabilities/inference/aigatewayroute-inferencepool#step-3-deploy-inference-backends).
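
   Before testing, it can help to confirm the resources are ready. A minimal sanity check, assuming the InferencePool CRDs are installed and the Gateway is named `inference-pool-with-aigwroute` as in the commands below:

   ```bash
   # The Gateway should report an address and a Programmed condition.
   kubectl get gateway/inference-pool-with-aigwroute -n default

   # InferencePool resources deployed by base.yaml / aigwroute.yaml.
   kubectl get inferencepools -n default

   # Backend pods should be Running.
   kubectl get pods -n default
   ```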

3. Test the setup:

   You can access the gateway in two ways, depending on your environment.

   **✅ Option A: Using an external IP (e.g., cloud LoadBalancer, MetalLB)**

   If your cluster assigns an external address to the Gateway:

   ```bash
   GATEWAY_HOST=$(kubectl get gateway/inference-pool-with-aigwroute -n default -o jsonpath='{.status.addresses[0].value}')
   echo "Gateway available at: http://${GATEWAY_HOST}"
   ```

   Then send a request:

   ```bash
   curl -X POST "http://${GATEWAY_HOST}/v1/chat/completions" \
     -H "x-ai-eg-model: meta-llama/Llama-3.1-8B-Instruct" \
     -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
   ```
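
   If the backend is healthy, the response is an OpenAI-style chat completion. To extract just the reply text (a convenience sketch, assuming `jq` is installed):

   ```bash
   curl -s -X POST "http://${GATEWAY_HOST}/v1/chat/completions" \
     -H "x-ai-eg-model: meta-llama/Llama-3.1-8B-Instruct" \
     -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}' \
     | jq -r '.choices[0].message.content'
   ```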

   **✅ Option B: Using `kubectl port-forward` (ideal for local clusters like Minikube or Kind)**

   In one terminal, forward the gateway service:

   ```bash
   kubectl port-forward svc/envoy-default-inference-pool-with-aigwroute-d416582c 8080:80 -n envoy-gateway-system
   ```
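
   The Service name above ends in a generated hash, so it may differ in your cluster. One way to look it up, assuming Envoy Gateway's standard owning-gateway labels:

   ```bash
   kubectl get svc -n envoy-gateway-system \
     -l gateway.envoyproxy.io/owning-gateway-name=inference-pool-with-aigwroute,gateway.envoyproxy.io/owning-gateway-namespace=default
   ```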

   In another terminal, send requests to `localhost:8080`:

   ```bash
   # Route to Llama (InferencePool)
   curl -X POST "http://localhost:8080/v1/chat/completions" \
     -H "x-ai-eg-model: meta-llama/Llama-3.1-8B-Instruct" \
     -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'

   # Route to Mistral (InferencePool)
   curl -X POST "http://localhost:8080/v1/chat/completions" \
     -H "x-ai-eg-model: mistral:latest" \
     -H "Content-Type: application/json" \
     -d '{"model": "mistral:latest", "messages": [{"role": "user", "content": "Hello!"}]}'

   # Route to the fallback backend (standard Backend)
   curl -X POST "http://localhost:8080/v1/chat/completions" \
     -H "x-ai-eg-model: some-cool-self-hosted-model" \
     -H "Content-Type: application/json" \
     -d '{"model": "some-cool-self-hosted-model", "messages": [{"role": "user", "content": "Hello!"}]}'
   ```
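
   As an optional smoke test, all three routes can be exercised in one loop, checking status codes only (a sketch; it sends the `Authorization` header on every request, on the assumption that routes which don't require it simply ignore it):

   ```bash
   for model in "meta-llama/Llama-3.1-8B-Instruct" "mistral:latest" "some-cool-self-hosted-model"; do
     curl -s -o /dev/null -w "${model}: HTTP %{http_code}\n" \
       -X POST "http://localhost:8080/v1/chat/completions" \
       -H "x-ai-eg-model: ${model}" \
       -H "Authorization: sk-abcdefghijklmnopqrstuvwxyz" \
       -H "Content-Type: application/json" \
       -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}"
   done
   ```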

### Combining with Other Features
