- clarify the use of BBR upfront, and dispatching to different InferencePool/EPP
- fix typo in example inference - both requests were sent to same model
Signed-off-by: Etai Lev Ran <[email protected]>
site-src/guides/serve-multiple-genai-models.md: 5 additions & 2 deletions
@@ -7,6 +7,9 @@ You can then route requests based on the model name (such as "chatbot" and "reco
 ## How
 The following diagram illustrates how Gateway API Inference Extension routes requests to different models based on the model name.
+The model name is extracted by [Body-Based routing](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md)
+from the request body to the header. The header is then matched to dispatch
+requests to different `InferencePool` (and their EPPs) instances.

 This example illustrates a conceptual example regarding how to use the `HTTPRoute` object to route based on model name like “chatbot” or “recommender” to `InferencePool`.
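
As a rough illustration of the flow the added lines describe, the sketch below mimics the two steps in plain Python: copying the model name from the JSON request body into a header, then matching that header to pick a pool. It is not the BBR or `HTTPRoute` implementation. The header name `X-Gateway-Model-Name` is an assumption (the text above only says "the header"), and the pool names in `ROUTES` are hypothetical.

```python
import json

# Assumed header name; the text above only refers to "the header".
MODEL_HEADER = "X-Gateway-Model-Name"

# Hypothetical route table: exact header value -> InferencePool name.
ROUTES = {
    "chatbot": "chatbot-pool",
    "recommender": "recommender-pool",
}


def extract_model_header(body: bytes) -> dict:
    """Step 1 (body-based routing): copy the "model" field of the JSON body into a header."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return {}
    model = payload.get("model")
    # Only a string model name is promoted to a header.
    return {MODEL_HEADER: model} if isinstance(model, str) else {}


def select_pool(headers: dict):
    """Step 2 (header match): return the pool for an exact header value, or None if nothing matches."""
    return ROUTES.get(headers.get(MODEL_HEADER))
```

For a request body such as `{"model": "chatbot", "prompt": "hi"}`, `extract_model_header` yields a single header and `select_pool` maps it to `chatbot-pool`. A body without a `model` field produces no header and therefore no match.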