This example demonstrates how to use AI Gateway with the InferencePool feature, which enables intelligent request routing across multiple inference endpoints with load balancing and health checking capabilities.

The setup includes **three distinct backends**:
- Two `InferencePool` resources for LLMs (`Llama-3.1-8B-Instruct` and `Mistral`)
- One standard `Backend` for non-InferencePool traffic

Routing is controlled by the `x-ai-eg-model` HTTP header.
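To make the header-based routing concrete, here is a minimal, hypothetical sketch of the kind of `AIGatewayRoute` rule involved (the API version, resource names, and field values are illustrative assumptions; see `aigwroute.yaml` for the actual configuration used in this example):

```yaml
# Illustrative sketch only: aigwroute.yaml contains the real route.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-pool-example    # hypothetical name
spec:
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model   # the routing header described above
              value: mistral        # requests declaring this model...
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: mistral             # ...are routed to the Mistral InferencePool
```

Requests whose `x-ai-eg-model` value matches no InferencePool rule can be handled by a catch-all rule pointing at the standard fallback `Backend`.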
## Files in This Directory
**`base.yaml`**: Deploys all inference backends and supporting resources using the **standard approach documented in the official guide**. This includes:
- A `mistral` backend with custom Endpoint Picker configuration
- A standard fallback backend (`envoy-ai-gateway-basic-testupstream`) for non-InferencePool routing
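For orientation, a rough sketch of what an `InferencePool` with an Endpoint Picker reference looks like (field names follow the `inference.networking.x-k8s.io/v1alpha2` API of the Gateway API Inference Extension; the names, labels, and port below are illustrative assumptions, and `base.yaml` contains the actual definitions):

```yaml
# Illustrative sketch only: base.yaml contains the real resources.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: mistral                 # hypothetical pool name
spec:
  targetPortNumber: 8000        # port the model-server pods listen on (assumed)
  selector:
    app: mistral                # label selector for the inference pods
  extensionRef:
    name: mistral-epp           # the custom Endpoint Picker service (hypothetical name)
```

The `extensionRef` is what makes the pool "intelligent": the referenced Endpoint Picker service selects which pod serves each request, rather than relying on plain round-robin load balancing.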

**`aigwroute.yaml`**: Example AIGatewayRoute that uses InferencePool as a backend.

**`httproute.yaml`**: Example HTTPRoute for traditional HTTP routing to InferencePool endpoints.

**`with-annotations.yaml`**: Advanced example showing InferencePool with Kubernetes annotations for fine-grained control.
```bash
kubectl apply -f base.yaml
kubectl apply -f aigwroute.yaml
```
> Note: The `aigwroute.yaml` file defines the InferencePool and routing logic, but does not deploy the actual inference backend (e.g., the vLLM server for Llama-3.1-8B-Instruct).
> You must deploy the backend separately by following [Step 3: Deploy Inference Backends](https://aigateway.envoyproxy.io/docs/capabilities/inference/aigatewayroute-inferencepool#step-3-deploy-inference-backends).

3. Test the setup:
```bash
GATEWAY_HOST=$(kubectl get gateway/ai-gateway -o jsonpath='{.status.addresses[0].value}')
curl -X POST "http://${GATEWAY_HOST}/v1/chat/completions" \