Commit a4e9d1a

Address review comments and restructure docs
1 parent 84516bd commit a4e9d1a

1 file changed: +39 −10 lines changed

site-src/guides/index.md

Lines changed: 39 additions & 10 deletions
@@ -75,9 +75,9 @@ A cluster with:
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
 ```
 
-### Deploy InferenceModel
+### Deploy InferenceObjective
 
-Deploy the sample InferenceModel which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+Deploy the sample InferenceObjective which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
 
 ```bash
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
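
After applying the manifest above, a quick sanity check along these lines can confirm the resource was created and what it routes to; the plural resource name is an assumption derived from the kind shipped in this release, so adjust it if the installed CRD differs:

```bash
# List the sample resource created by the manifest above, with extra columns.
# "inferencemodels" is assumed from the InferenceModel kind in this release;
# releases that rename the kind to InferenceObjective would use "inferenceobjectives".
kubectl get inferencemodels -o wide

# Inspect the full spec, including the model / LoRA adapter traffic is forwarded to.
kubectl describe inferencemodels
```
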
@@ -115,19 +115,23 @@ A cluster with:
 NAME                CLASS               ADDRESS        PROGRAMMED   AGE
 inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
 ```
+3. Deploy the HTTPRoute
 
-3. To install an InferencePool named vllm-llama3-8b-instruct that selects from endpoints with label app: vllm-llama3-8b-instruct and listening on port 8000, you can run the following command:
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
+```
+
+4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
 
 ```bash
-helm install vllm-llama3-8b-instruct \
---set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
---set provider.name=gke \
---version v0.3.0 \
-oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+kubectl get httproute llm-route -o yaml
 ```
-
-The Helm install automatically installs the endpoint-picker, inferencepool alongwith health check policy.
 
+5. Given that the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
+```
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
+```
+
 === "Istio"
 
 Please note that this feature is currently in an experimental phase and is not intended for production use.
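
Rather than scanning the full YAML from step 4, the relevant conditions can be pulled out with a jsonpath query; this is only a convenience sketch using the same `llm-route` name as above:

```bash
# Print each condition reported for the route as TYPE=STATUS, one per line.
kubectl get httproute llm-route \
  -o jsonpath='{range .status.parents[*].conditions[*]}{.type}={.status}{"\n"}{end}'
```

Both `Accepted=True` and `ResolvedRefs=True` should appear in the output once the gateway has programmed the route.
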
@@ -281,6 +285,31 @@ A cluster with:
 kubectl get httproute llm-route -o yaml
 ```
 
+
+### Deploy the InferencePool and Endpoint Picker Extension
+
+To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000, run the following command:
+
+```bash
+helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=PROVIDER_NAME \
+  --version v0.3.0 \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+```
+
+The Helm install automatically installs the endpoint picker and InferencePool, along with the health check policy.
+
+### Deploy InferenceObjective (Optional)
+
+Deploy the sample InferenceObjective which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+```
+
+
+
 ### Try it out
 
 Wait until the gateway is ready.
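
Before trying the gateway out, the Helm release and the resources it creates can be checked, and the gateway itself can be waited on; the resource plural below is an assumption based on the InferencePool kind, and the gateway name matches the `inference-gateway` shown earlier:

```bash
# Confirm the Helm release installed cleanly.
helm status vllm-llama3-8b-instruct

# Verify the InferencePool and the endpoint picker Deployment are present.
# "inferencepools" is assumed from the kind name; adjust if the installed CRD differs.
kubectl get inferencepools
kubectl get deployments

# Block until the gateway reports Programmed=True, then proceed to "Try it out".
kubectl wait gateway/inference-gateway --for=condition=Programmed --timeout=120s
```
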
