Commit ef54fa1

Update public docs to use helm charts
1 parent 2c88636 commit ef54fa1

File tree: 1 file changed (+16 −18 lines)


site-src/guides/index.md

Lines changed: 16 additions & 18 deletions
@@ -8,7 +8,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 ## **Prerequisites**
 
-- A cluster with:
+A cluster with:
 - Support for services of type `LoadBalancer`. For kind clusters, follow [this guide](https://kind.sigs.k8s.io/docs/user/loadbalancer)
   to get services of type LoadBalancer working.
 - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
@@ -94,7 +94,11 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 ### Deploy the InferencePool and Endpoint Picker Extension
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool-resources.yaml
+helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=gke \
+  --version v0.3.0 \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
 ```
 
 ### Deploy an Inference Gateway
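As context for the `helm install` command added above: each `--set` flag overrides a chart value. A values-file equivalent might look like the following sketch (key names are taken directly from the `--set` flags in the diff; the file name and overall values schema are assumptions, not verified against the chart):

```yaml
# values.yaml — hypothetical equivalent of the --set flags shown above
inferencePool:
  modelServers:
    matchLabels:
      app: vllm-llama3-8b-instruct
provider:
  name: gke
```

With a file like this, the two `--set` flags could be replaced by `-f values.yaml` on the `helm install` command line.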
@@ -106,37 +110,31 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 1. Enable the Gateway API and configure proxy-only subnets when necessary. See [Deploy Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways)
    for detailed instructions.
 
-1. Deploy Gateway and HealthCheckPolicy resources
+2. Deploy Gateway:
 
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/healthcheck.yaml
    ```
 
    Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+
   ```bash
    $ kubectl get gateway inference-gateway
    NAME                CLASS               ADDRESS        PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
    ```
 
-3. Deploy the HTTPRoute
-
-   ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
-   ```
-
-4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
-
-   ```bash
-   kubectl get httproute llm-route -o yaml
-   ```
-
-5. Given that the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
+3. To install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints labeled `app: vllm-llama3-8b-instruct` and listening on port 8000, run the following command:
 
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
+   helm install vllm-llama3-8b-instruct \
+     --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+     --set provider.name=gke \
+     --version v0.3.0 \
+     oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
   ```
+
+   The Helm install automatically deploys the endpoint picker and the InferencePool, along with the health check policy.
 
 === "Istio"
 
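For readers applying the updated instructions, an end-to-end sanity check might be sketched as follows (illustrative only: the gateway and release names come from the manifests in this commit, while the request path and model name are assumptions about the backing vLLM deployment, not part of the diff):

```shell
# Inspect the Helm release created by the install command in the updated docs
# (release name assumed to be vllm-llama3-8b-instruct, as in the diff).
helm status vllm-llama3-8b-instruct

# Look up the address assigned to the Gateway deployed above.
GW_IP=$(kubectl get gateway inference-gateway \
  -o jsonpath='{.status.addresses[0].value}')

# Send a test completion request through the gateway. The /v1/completions
# path and the model name are assumptions about the deployed model server.
curl -i "http://${GW_IP}/v1/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "vllm-llama3-8b-instruct", "prompt": "Hello", "max_tokens": 10}'
```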