Commit 01ecee3

Address review comments and restructure docs
1 parent 84516bd commit 01ecee3


site-src/guides/index.md

Lines changed: 37 additions & 16 deletions
@@ -75,14 +75,6 @@ A cluster with:
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
 ```
 
-### Deploy InferenceModel
-
-Deploy the sample InferenceModel which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
-```
-
 ### Deploy the InferencePool and Endpoint Picker Extension
 
 ```bash
@@ -115,19 +107,23 @@ A cluster with:
 NAME                  CLASS               ADDRESS         PROGRAMMED   AGE
 inference-gateway     inference-gateway   <MY_ADDRESS>    True         22s
 ```
+3. Deploy the HTTPRoute
+
+   ```bash
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
+   ```
 
-3. To install an InferencePool named vllm-llama3-8b-instruct that selects from endpoints with label app: vllm-llama3-8b-instruct and listening on port 8000, you can run the following command:
+4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
 
    ```bash
-   helm install vllm-llama3-8b-instruct \
-     --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-     --set provider.name=gke \
-     --version v0.3.0 \
-     oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+   kubectl get httproute llm-route -o yaml
    ```
-
-The Helm install automatically installs the endpoint-picker, inferencepool alongwith health check policy.
 
+5. Because the default connection timeout may be insufficient for most inference workloads, configure a timeout appropriate for your intended use case:
+
+   ```bash
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
+   ```
+
 === "Istio"
 
    Please note that this feature is currently in an experimental phase and is not intended for production use.
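For step 4 in the hunk above, a healthy route reports its conditions under `status.parents` in the `kubectl get httproute llm-route -o yaml` output. The following is an illustrative sketch only; controller names and condition reasons vary by implementation:

```yaml
# Abridged, hypothetical HTTPRoute status (Gateway API conventions).
status:
  parents:
  - parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
    conditions:
    - type: Accepted       # the route was accepted by the Gateway listener
      status: "True"
    - type: ResolvedRefs   # all backendRefs resolved to valid backends
      status: "True"
```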
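For step 5, on GKE the backend timeout is typically set with a `GCPBackendPolicy`. The manifest below is a minimal sketch of what `gcp-backend-policy.yaml` might configure; the policy name, target, and timeout value here are assumptions, so consult the referenced manifest for the actual settings:

```yaml
# Hypothetical sketch of a GKE backend timeout policy.
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: inferencepool-backend-policy   # hypothetical name
spec:
  targetRef:
    group: inference.networking.x-k8s.io   # assumption: policy attaches to the InferencePool
    kind: InferencePool
    name: vllm-llama3-8b-instruct
  default:
    timeoutSec: 300   # assumption: a generous timeout for long-running inference requests
```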
@@ -281,6 +277,31 @@ A cluster with:
 kubectl get httproute llm-route -o yaml
 ```
 
+### Deploy the InferencePool and Endpoint Picker Extension
+
+To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000, run the following command:
+
+```bash
+helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=PROVIDER_NAME \
+  --version v0.3.0 \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+```
+
+The Helm install automatically installs the endpoint picker and InferencePool, along with the health check policy.
+
+### Deploy InferenceObjective (Optional)
+
+Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+```
+
 ### Try it out
 
 Wait until the gateway is ready.
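With the steps above applied, the "Try it out" flow can be exercised end to end. A minimal sketch, assuming the gateway listens on port 80, the sample server exposes an OpenAI-style `/v1/completions` endpoint, and the `food-review-1` model name matches the LoRA adapter targeted by the sample InferenceObjective:

```bash
# Fetch the gateway address once it is programmed
# (assumes the first reported address is reachable from your machine).
IP=$(kubectl get gateway/inference-gateway \
  -o jsonpath='{.status.addresses[0].value}')

# Send a completion request routed to the food-review-1 LoRA adapter.
curl -i http://${IP}:80/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write a short restaurant review: ",
    "max_tokens": 100
  }'
```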
