Commit 33cda4b

Changes to multi-model documentation (#941)
- Clarify the use of BBR upfront, and dispatching to different InferencePool/EPP
- Fix typo in example inference request: both requests were sent to the same model

Signed-off-by: Etai Lev Ran <[email protected]>

Parent: 4c319af

File tree: 1 file changed (+5 −2 lines)
site-src/guides/serve-multiple-genai-models.md

Lines changed: 5 additions & 2 deletions
```diff
@@ -7,6 +7,9 @@ You can then route requests based on the model name (such as "chatbot" and "reco
 
 ## How
 The following diagram illustrates how Gateway API Inference Extension routes requests to different models based on the model name.
+The model name is extarcted by [Body-Based routing](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md)
+from the request body to the header. The header is then matched to dispatch
+requests to different `InferencePool` (and their EPPs) instances.
 ![Serving multiple generative AI models](../images/serve-mul-gen-AI-models.png)
 
 This example illustrates a conceptual example regarding how to use the `HTTPRoute` object to route based on model name like “chatbot” or “recommender” to `InferencePool`.
```
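To make the added explanation concrete, the header-based dispatch it describes could look roughly like the following `HTTPRoute` sketch. This is a hypothetical illustration, not the guide's actual manifest: the header name (`X-Gateway-Model-Name`), the gateway name, the pool names, and the `InferencePool` API group are all assumptions here and should be checked against the BBR README and the guide itself.

```yaml
# Hypothetical sketch: match on the model-name header that Body-Based
# Routing (BBR) copies out of the JSON request body, and dispatch each
# model to its own InferencePool. All names below are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: routes-by-model          # assumed name
spec:
  parentRefs:
  - name: inference-gateway      # assumed Gateway name
  rules:
  - matches:
    - headers:
      - name: X-Gateway-Model-Name   # assumed BBR header name
        value: chatbot
    backendRefs:
    - group: inference.networking.x-k8s.io  # assumed API group
      kind: InferencePool
      name: chatbot-pool         # assumed pool name
  - matches:
    - headers:
      - name: X-Gateway-Model-Name
        value: recommender
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: recommender-pool     # assumed pool name
```

Each `InferencePool` then fronts its own set of model-server pods, with its EPP choosing the endpoint within the pool.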
```diff
@@ -63,9 +66,9 @@ curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
 
 3. Send a few requests to model "recommender" as follows:
 ```bash
 curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
-  "model": "chatbot",
+  "model": "recommender",
   "prompt": "Give me restaurant recommendations in Paris",
   "max_tokens": 100,
   "temperature": 0
 }'
-```
+```
```
