
Commit 8e5ae69

Address review comments and restructure docs
1 parent: ef54fa1

File tree: 1 file changed, +28 −19 lines

site-src/guides/index.md (28 additions, 19 deletions)
@@ -83,24 +83,6 @@ A cluster with:
 kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd
 ```
 
-### Deploy InferenceModel
-
-Deploy the sample InferenceModel which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
-
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
-```
-
-### Deploy the InferencePool and Endpoint Picker Extension
-
-```bash
-helm install vllm-llama3-8b-instruct \
-  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set provider.name=gke \
-  --version v0.3.0 \
-  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
-```
-
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
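
For the GKE option, the gateway itself would be applied before the HTTPRoute added later in this diff; a hedged sketch of that step, with the manifest path assumed by analogy to the `gateway/gke/httproute.yaml` path used below:

```bash
# Assumed path: the same gke/ directory that holds the httproute.yaml applied in step 3 below.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml
```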
@@ -123,8 +105,19 @@ A cluster with:
    NAME                CLASS               ADDRESS         PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
    ```
+3. Deploy the HTTPRoute
+
+   ```bash
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
+   ```
+
+4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
+
+   ```bash
+   kubectl get httproute llm-route -o yaml
+   ```
 
-3. To install an InferencePool named vllm-llama3-8b-instruct that selects from endpoints with label app: vllm-llama3-8b-instruct and listening on port 8000, you can run the following command:
+5. To install an InferencePool named vllm-llama3-8b-instruct that selects from endpoints with label app: vllm-llama3-8b-instruct and listening on port 8000, you can run the following command:
 
    ```bash
    helm install vllm-llama3-8b-instruct \
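
For the new step 4, a route that has been accepted and resolved reports standard Gateway API conditions under `status.parents`. An illustrative, trimmed excerpt of what a healthy status looks like (the `controllerName` and condition `reason` values vary by Gateway implementation):

```yaml
status:
  parents:
  - parentRef:
      name: inference-gateway
    controllerName: networking.gke.io/gateway  # implementation-specific
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
    - type: ResolvedRefs
      status: "True"
      reason: ResolvedRefs
```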
@@ -136,6 +129,12 @@ A cluster with:
 
    The Helm install automatically installs the endpoint picker and InferencePool, along with a health check policy.
 
+6. Given that the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
+
+   ```bash
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
+   ```
+
 === "Istio"
 
    Please note that this feature is currently in an experimental phase and is not intended for production use.
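
The `gcp-backend-policy.yaml` applied in the new step 6 configures a GKE `GCPBackendPolicy`. A minimal sketch of such a policy, assuming a 300-second timeout and the InferencePool installed in step 5 as the target (names and the timeout value are illustrative, not the manifest's actual contents):

```yaml
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: inferencepool-backend-policy  # hypothetical name
spec:
  targetRef:
    group: inference.networking.x-k8s.io
    kind: InferencePool
    name: vllm-llama3-8b-instruct
  default:
    timeoutSec: 300  # assumed value; size to your longest expected inference request
```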
@@ -242,6 +241,16 @@ A cluster with:
    kubectl get httproute llm-route -o yaml
    ```
 
+
+### Deploy InferenceObjective (Optional)
+
+Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+```
+
+
 ### Try it out
 
 Wait until the gateway is ready.
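
The "Try it out" section that follows sends a request through the gateway address. A minimal sketch of such a request, assuming the `inference-gateway` name used above, port 80, and an OpenAI-compatible vLLM backend serving the `food-review-1` adapter (the prompt and port are illustrative):

```bash
# Look up the address programmed on the Inference Gateway.
IP=$(kubectl get gateway inference-gateway \
  -o jsonpath='{.status.addresses[0].value}')

# Illustrative completion request routed through the gateway.
curl -i "http://${IP}:80/v1/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "food-review-1", "prompt": "Write a short review of a pizzeria.", "max_tokens": 100}'
```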
