site-src/guides/index.md (5 additions & 6 deletions)
@@ -318,19 +318,14 @@ Tooling:
```bash
kubectl get httproute llm-route -o yaml
```
### Deploy InferenceObjective (Optional)
Deploy the sample InferenceObjective, which allows you to specify the priority of requests.

### Deploy the Body Based Router Extension (Optional)
This guide has shown how to get started with serving a single base model type per L7 URL path. If you wish to go further and exercise model-aware routing, in which more than one base model is served at the same L7 URL path, you will need the optional Body Based Routing (BBR) extension, which is described in the [`Serving Multiple GenAI Models`](serve-multiple-genai-models.md) section. To try that function, retain the setup you have deployed so far and continue with the additional steps in that guide; otherwise, move on to the following section to clean up your setup.
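The core idea behind BBR can be illustrated with a short sketch. This is an illustration only, not the actual extension code (which lives behind the BBR link above): the extension parses the JSON request body and copies the model name into the `X-Gateway-Model-Name` header so the gateway can match on it.

```python
import json

MODEL_HEADER = "X-Gateway-Model-Name"

def derive_model_header(body):
    """Sketch of what Body Based Routing does: copy the `model` field
    from an OpenAI-style JSON request body into a header that the
    gateway can use for HTTPRoute header matching."""
    payload = json.loads(body)
    model = payload.get("model")
    return {MODEL_HEADER: model} if model else {}

headers = derive_model_header(
    b'{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "hi"}'
)
print(headers)  # {'X-Gateway-Model-Name': 'meta-llama/Llama-3.1-8B-Instruct'}
```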
### Cleanup
The following instructions assume you would like to clean up ALL of the resources that were created in this quickstart guide.
After executing this, verify that you see two InferencePools and two EPP pods (one per base model type) running without errors, using `kubectl get inferencepools` and `kubectl get pods`.
```yaml
          # Body-Based routing (https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
          name: X-Gateway-Model-Name # (1)!
          value: 'meta-llama/Llama-3.1-8B-Instruct'
    timeouts:
      request: 300s
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
```
@@ -146,14 +147,15 @@ spec:
```yaml
          value: /
        headers:
        - type: Exact
          # Body-Based routing (https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
          name: X-Gateway-Model-Name
          value: 'microsoft/Phi-4-mini-instruct'
    timeouts:
      request: 300s
---
```
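Given routes like the ones above, the gateway selects a backend by an `Exact` comparison of the BBR-injected header value. The selection logic can be sketched as follows; this is illustrative only (the real matching is performed by the Gateway implementation per the Gateway API spec, and the model-to-route pairing here is an assumption based on the route names in this guide).

```python
# Hypothetical mapping from model name to the HTTPRoute that serves it,
# mirroring the Exact header matches defined in the manifests above.
ROUTES = {
    "meta-llama/Llama-3.1-8B-Instruct": "llm-llama-route",
    "microsoft/Phi-4-mini-instruct": "llm-phi4-route",
}

def match_route(headers):
    """Exact match on the X-Gateway-Model-Name header; None if no route matches."""
    return ROUTES.get(headers.get("X-Gateway-Model-Name"))

print(match_route({"X-Gateway-Model-Name": "microsoft/Phi-4-mini-instruct"}))
# prints: llm-phi4-route
```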
Before testing the setup, confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True` for both routes using the following commands.
```bash
kubectl get httproute llm-llama-route -o yaml
```
@@ -163,8 +165,6 @@ kubectl get httproute llm-llama-route -o yaml
```bash
kubectl get httproute llm-phi4-route -o yaml
```
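If you prefer to script this check, a small helper along the following lines can verify both conditions. This is an illustrative sketch: it assumes the JSON shape produced by `kubectl get httproute <name> -o json`, where conditions are reported per parent under `status.parents`.

```python
import json

def route_conditions_ok(route_json):
    """Return True if every parent status on an HTTPRoute reports
    Accepted=True and ResolvedRefs=True."""
    route = json.loads(route_json)
    parents = route.get("status", {}).get("parents", [])
    if not parents:
        return False
    for parent in parents:
        conds = {c["type"]: c["status"] for c in parent.get("conditions", [])}
        if conds.get("Accepted") != "True" or conds.get("ResolvedRefs") != "True":
            return False
    return True
```

Feed it the output of, for example, `kubectl get httproute llm-llama-route -o json`.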
[BBR](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header with key `X-Gateway-Model-Name`. The header can then be used in the `HTTPRoute` to route requests to different `InferencePool` instances.