Commit c4ec087
Address review comments and restructure docs
1 parent 84516bd

1 file changed: site-src/guides/index.md (+37 additions, -26 deletions)
````diff
@@ -75,24 +75,6 @@ A cluster with:
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
    ```
 
-### Deploy InferenceModel
-
-Deploy the sample InferenceModel which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
-```
-
-### Deploy the InferencePool and Endpoint Picker Extension
-
-```bash
-helm install vllm-llama3-8b-instruct \
---set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
---set provider.name=gke \
---version v0.3.0 \
-oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
-```
-
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
````
````diff
@@ -115,19 +97,23 @@ A cluster with:
    NAME                CLASS               ADDRESS         PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
    ```
+3. Deploy the HTTPRoute:
+
+   ```bash
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
+   ```
 
-3. To install an InferencePool named vllm-llama3-8b-instruct that selects from endpoints with label app: vllm-llama3-8b-instruct and listening on port 8000, you can run the following command:
+4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
 
    ```bash
-   helm install vllm-llama3-8b-instruct \
-   --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-   --set provider.name=gke \
-   --version v0.3.0 \
-   oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+   kubectl get httproute llm-route -o yaml
    ```
-
-   The Helm install automatically installs the endpoint-picker, inferencepool alongwith health check policy.
 
+5. The default connection timeout may be insufficient for most inference workloads, so configure a timeout appropriate for your intended use case:
+   ```bash
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
+   ```
+
 === "Istio"
 
    Please note that this feature is currently in an experimental phase and is not intended for production use.
````
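A note on the new step 4 above: `kubectl get httproute llm-route -o yaml` requires eyeballing the output. The same check can be done non-interactively with a JSONPath query; a minimal sketch (an illustrative one-liner, not part of the commit; it assumes the `llm-route` HTTPRoute from the diff, and that HTTPRoute conditions live under `status.parents[*].conditions`, which is why a plain `kubectl wait --for=condition=...` does not apply here):

```shell
# Print the status of the Accepted and ResolvedRefs conditions as reported
# for the first parent Gateway; both lines should read "True".
kubectl get httproute llm-route -o \
  jsonpath='{.status.parents[0].conditions[?(@.type=="Accepted")].status}{"\n"}{.status.parents[0].conditions[?(@.type=="ResolvedRefs")].status}{"\n"}'
```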
````diff
@@ -281,6 +267,31 @@ A cluster with:
    kubectl get httproute llm-route -o yaml
    ```
 
+
+### Deploy the InferencePool and Endpoint Picker Extension
+
+To install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000, run the following command:
+
+```bash
+helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=PROVIDER_NAME \
+  --version v0.3.0 \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+```
+
+The Helm install automatically installs the endpoint picker and InferencePool, along with the health check policy.
+
+### Deploy InferenceObjective (Optional)
+
+Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+```
+
+
+
 ### Try it out
 
 Wait until the gateway is ready.
````
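The "Wait until the gateway is ready" step in the trailing context can be scripted rather than polled by hand; a minimal sketch (illustrative, not part of the commit; it assumes the `inference-gateway` Gateway from the diff, and relies on Gateway, unlike HTTPRoute, surfacing its conditions at the top level of `status.conditions`, so `kubectl wait` works directly):

```shell
# Block until the Gateway reports Programmed=True (address assigned and
# listeners configured), or give up after five minutes.
kubectl wait gateway/inference-gateway \
  --for=condition=Programmed=True --timeout=300s
```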
