
Commit ef93eea
Add BBR docs
1 parent 1457f63

File tree: 3 files changed (+9 additions, -0 deletions)

site-src/guides/index.md

Lines changed: 6 additions & 0 deletions

@@ -131,6 +131,12 @@ Tooling:
 oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
 ```
 
+### Deploy the Body Based Router Extension (Optional)
+
+This step is optional. Deploy the Body Based Router (BBR) Extension if you need model-aware routing, such as serving multiple different base models behind the same L7 URL path. If you serve only one base model (with or without LoRA adapters) per gateway L7 path, you can skip this step and move on to the next section.
+
+<<To be added>>
+
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
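The new section describes routing on the model name carried in the request body rather than in the URL path. As an illustration of that idea (not the actual BBR implementation, which is still to be documented above), a hypothetical sketch of extracting the `model` attribute from an OpenAI-style JSON request body might look like this:

```python
import json
from typing import Optional


def extract_model_name(body: bytes) -> Optional[str]:
    """Pull the `model` attribute from an OpenAI-style JSON request body.

    A body-based router inspects the request body (rather than the URL
    path), which is what allows several base models to share one L7 path.
    Returns None if the body is not valid JSON or has no string `model`.
    """
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return None
    model = payload.get("model") if isinstance(payload, dict) else None
    return model if isinstance(model, str) else None


# Illustrative request body; the gateway could route on the extracted name,
# e.g. by setting a header that a route rule matches on.
body = b'{"model": "chatbot", "messages": [{"role": "user", "content": "hi"}]}'
print(extract_model_name(body))
```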

site-src/guides/serve-multiple-genai-models.md

Lines changed: 2 additions & 0 deletions

@@ -6,6 +6,8 @@ The company needs to ensure optimal serving performance for these LLMs.
 By using an Inference Gateway, you can deploy these LLMs on your cluster with your chosen accelerator configuration in an `InferencePool`.
 You can then route requests based on the model name (such as `chatbot` and `recommender`) and the `Criticality` property.
 
+<<To be added>>
+
 ## How
 
 The following diagram illustrates how an Inference Gateway routes requests to different models based on the model name.
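The guide's context lines describe routing on the model name together with a `Criticality` property. A minimal sketch of that mapping, with hypothetical pool names (`chatbot-pool`, `recommender-pool`) and criticality values chosen for illustration, could be:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    pool: str         # target InferencePool for this model
    criticality: str  # relative importance, e.g. "Critical" vs "Standard"


# Hypothetical routing table: model name (from the request) -> route.
ROUTES = {
    "chatbot": Route(pool="chatbot-pool", criticality="Critical"),
    "recommender": Route(pool="recommender-pool", criticality="Standard"),
}


def pick_route(model_name: str) -> Optional[Route]:
    """Resolve a request's model name to a pool and its criticality.

    Under load, a scheduler could use `criticality` to decide which
    requests to shed first; unknown models get no route (None).
    """
    return ROUTES.get(model_name)
```

The real routing configuration lives in the gateway's API objects rather than application code; this table only illustrates the lookup the diagram below depicts.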

site-src/index.md

Lines changed: 1 addition & 0 deletions

@@ -31,6 +31,7 @@ The following specific terms to this project:
   performance, availability and capabilities to optimize routing. Includes
   things like [Prefix Cache](https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html) status or [LoRA Adapters](https://docs.vllm.ai/en/stable/features/lora.html) availability.
 - **Endpoint Picker(EPP)**: An implementation of an `Inference Scheduler` with additional Routing, Flow, and Request Control layers to allow for sophisticated routing strategies. Additional info on the architecture of the EPP [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
+- **Body Based Router(BBR)**: An additional, optional implementation of an `Inference Scheduler` that uses information from the body of the inference request (currently the model name attribute) to enable model-aware functions such as routing and scheduling.
 
 [Inference Gateway]:#concepts-and-definitions
