
Commit ef93eea
Add BBR docs
1 parent 1457f63

File tree: 3 files changed (+9 additions, -0 deletions)

site-src/guides/index.md

Lines changed: 6 additions & 0 deletions

@@ -131,6 +131,12 @@ Tooling:
 oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
 ```
 
+### Deploy the Body Based Router Extension (Optional)
+
+This step is optional. Deploy the Body Based Router (BBR) Extension if you need model-aware routing, such as serving multiple different base models behind the same L7 URL path. If you serve only one base model (with or without LoRA adapters) per gateway L7 path, you can skip this step and move on to the next section.
+
+<<To be added>>
+
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
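The new section describes routing on the model name carried in the request body rather than in the URL path. As an illustration of that idea (not the actual BBR implementation, which is still to be documented above), a hypothetical sketch of extracting the `model` attribute from an OpenAI-style JSON request body might look like this:

```python
import json
from typing import Optional


def extract_model_name(body: bytes) -> Optional[str]:
    """Pull the `model` attribute from an OpenAI-style JSON request body.

    A body-based router inspects the request body (rather than the URL
    path), which is what allows several base models to share one L7 path.
    Returns None if the body is not valid JSON or has no string `model`.
    """
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return None
    model = payload.get("model") if isinstance(payload, dict) else None
    return model if isinstance(model, str) else None


# Illustrative request body; the gateway could route on the extracted name,
# e.g. by setting a header that a route rule matches on.
body = b'{"model": "chatbot", "messages": [{"role": "user", "content": "hi"}]}'
print(extract_model_name(body))
```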

site-src/guides/serve-multiple-genai-models.md

Lines changed: 2 additions & 0 deletions

@@ -6,6 +6,8 @@ The company needs to ensure optimal serving performance for these LLMs.
 By using an Inference Gateway, you can deploy these LLMs on your cluster with your chosen accelerator configuration in an `InferencePool`.
 You can then route requests based on the model name (such as `chatbot` and `recommender`) and the `Criticality` property.
 
+<<To be added>>
+
 ## How
 
 The following diagram illustrates how an Inference Gateway routes requests to different models based on the model name.
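The guide's context lines describe routing on the model name together with a `Criticality` property. A minimal sketch of that mapping, with hypothetical pool names (`chatbot-pool`, `recommender-pool`) and criticality values chosen for illustration, could be:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    pool: str         # target InferencePool for this model
    criticality: str  # relative importance, e.g. "Critical" vs "Standard"


# Hypothetical routing table: model name (from the request) -> route.
ROUTES = {
    "chatbot": Route(pool="chatbot-pool", criticality="Critical"),
    "recommender": Route(pool="recommender-pool", criticality="Standard"),
}


def pick_route(model_name: str) -> Optional[Route]:
    """Resolve a request's model name to a pool and its criticality.

    Under load, a scheduler could use `criticality` to decide which
    requests to shed first; unknown models get no route (None).
    """
    return ROUTES.get(model_name)
```

The real routing configuration lives in the gateway's API objects rather than application code; this table only illustrates the lookup the diagram below depicts.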

site-src/index.md

Lines changed: 1 addition & 0 deletions

@@ -31,6 +31,7 @@ The following specific terms to this project:
   performance, availability and capabilities to optimize routing. Includes
   things like [Prefix Cache](https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html) status or [LoRA Adapters](https://docs.vllm.ai/en/stable/features/lora.html) availability.
 - **Endpoint Picker(EPP)**: An implementation of an `Inference Scheduler` with additional Routing, Flow, and Request Control layers to allow for sophisticated routing strategies. Additional info on the architecture of the EPP [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
+- **Body Based Router(BBR)**: An additional, optional implementation of an `Inference Scheduler` that uses information from the body of the inference request (currently the model name attribute) to enable model-aware functions such as routing and scheduling.
 
 [Inference Gateway]:#concepts-and-definitions
