diff --git a/mkdocs.yml b/mkdocs.yml
index 8262f6fb0..c90b1bb49 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -13,6 +13,7 @@ theme:
   favicon: images/favicon-64.png
   features:
     - content.code.annotate
+    - content.code.copy
     - search.highlight
     - navigation.tabs
     - navigation.top
diff --git a/site-src/guides/index.md b/site-src/guides/index.md
index e7dd8611b..d192dd196 100644
--- a/site-src/guides/index.md
+++ b/site-src/guides/index.md
@@ -319,19 +319,14 @@ Tooling:
    kubectl get httproute llm-route -o yaml
    ```
 
-### Deploy the Body Based Router Extension (Optional)
-
- This guide shows how to get started with serving only 1 base model type per L7 URL path. If in addition, you wish to exercise model-aware routing such that more than 1 base model is served at the same L7 url path, that requires use of the (optional) Body Based Routing (BBR) extension which is described in a following section of the guide, namely the [`Serving Multiple GenAI Models`](serve-multiple-genai-models.md) section.
-
 ### Deploy InferenceObjective (Optional)
 
- Deploy the sample InferenceObjective which allows you to specify priority of requests.
+Deploy the sample InferenceObjective, which allows you to specify the priority of requests.
 
 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
 ```
-
 ### Try it out
 
 Wait until the gateway is ready.
 
@@ -348,6 +343,10 @@ Tooling:
    }'
    ```
 
+### Deploy the Body Based Router Extension (Optional)
+
+This guide has shown how to get started with serving a single base model type per L7 URL path. If you wish to go on to exercise model-aware routing, where more than one base model is served at the same L7 URL path, you will need the (optional) Body-Based Routing (BBR) extension, which is described in the [`Serving Multiple GenAI Models`](serve-multiple-genai-models.md) section.
+If you wish to exercise that capability, retain the setup you have deployed so far and continue with the additional steps described in [that guide](serve-multiple-genai-models.md); otherwise, move on to the following section to clean up your setup.
+
 ### Cleanup
 
 The following instructions assume you would like to cleanup ALL resources that were created in this quickstart guide.
diff --git a/site-src/guides/serve-multiple-genai-models.md b/site-src/guides/serve-multiple-genai-models.md
index aede7a677..2064f999a 100644
--- a/site-src/guides/serve-multiple-genai-models.md
+++ b/site-src/guides/serve-multiple-genai-models.md
@@ -83,7 +83,7 @@ We also want to use an InferencePool and EndPoint Picker for this second model i
     oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
 ```
 
-After executing this, very that you see two InferencePools and two EPP pods, one per base model type, running without errors, using the CLIs `kubectl get inferencepools` and `kubectl get pods`.
+After executing this, verify that you see two InferencePools and two EPP pods, one per base model type, running without errors, using `kubectl get inferencepools` and `kubectl get pods`.
 
 ### Configure HTTPRoute
 
@@ -100,7 +100,7 @@ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extens
 ```
 
 ```yaml
----
+---
 apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
 metadata:
@@ -121,11 +121,12 @@ spec:
           value: /
         headers:
         - type: Exact
+          # Body-Based Routing (https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is used to copy the model name from the request body to this header.
           name: X-Gateway-Model-Name # (1)!
           value: 'meta-llama/Llama-3.1-8B-Instruct'
       timeouts:
         request: 300s
----
+---
 apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
 metadata:
@@ -146,14 +147,15 @@ spec:
           value: /
         headers:
         - type: Exact
-          name: X-Gateway-Model-Name # (2)!
+          # Body-Based Routing (https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is used to copy the model name from the request body to this header.
+          name: X-Gateway-Model-Name
           value: 'microsoft/Phi-4-mini-instruct'
       timeouts:
         request: 300s
----
+---
 ```
 
-Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True` for both routes:
+Before testing the setup, confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True` for both routes using the following commands:
 
 ```bash
 kubectl get httproute llm-llama-route -o yaml
@@ -163,8 +165,6 @@
 kubectl get httproute llm-phi4-route -o yaml
 ```
 
-[BBR](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header with key `X-Gateway-Model-Name`. The header can then be used in the `HTTPRoute` to route requests to different `InferencePool` instances.
-
 ## Try it out
 
 1. Get the gateway IP:
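
Reviewer note: the inline comments this diff adds describe what Body-Based Routing does before the `HTTPRoute` header match runs. A minimal standalone sketch of that transformation follows; it is illustrative only (the request body shown is an assumption, not the extension's actual code), but it makes the header-derivation step concrete:

```shell
# Sketch of the BBR step: read the "model" field from an OpenAI-style
# JSON request body and surface it as the X-Gateway-Model-Name header,
# which the two HTTPRoutes in this diff then match on.
# (Illustrative only -- not the extension's actual implementation.)
BODY='{"model": "microsoft/Phi-4-mini-instruct", "messages": []}'
MODEL=$(printf '%s' "$BODY" | python3 -c 'import json, sys; print(json.load(sys.stdin)["model"])')
echo "X-Gateway-Model-Name: $MODEL"
# -> X-Gateway-Model-Name: microsoft/Phi-4-mini-instruct
```

With a body naming `meta-llama/Llama-3.1-8B-Instruct` instead, the derived header value would steer the request to the other route.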