Deploy the sample InferenceModel, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
```bash
kubectl get httproute llm-route -o yaml
```
5. To install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000, run the following command:
```bash
helm install vllm-llama3-8b-instruct \
```
The Helm install automatically deploys the endpoint picker and the InferencePool, along with the health check policy.
6. Because the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
Please note that this feature is currently experimental and is not intended for production use.
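As one option, the Gateway API `HTTPRoute` resource supports a per-rule request timeout. The fragment below is a minimal sketch, assuming the `llm-route` and `vllm-llama3-8b-instruct` resources from the steps above; the `300s` value and the backend `group` are examples to adapt, not values prescribed by this guide:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    timeouts:
      request: 300s   # example value; tune for your workload
```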
```bash
kubectl get httproute llm-route -o yaml
```
=== "Agentgateway"
[Agentgateway](https://agentgateway.dev/) is a purpose-built proxy designed for AI workloads and comes with native support for inference routing. Agentgateway integrates with [Kgateway](https://kgateway.dev/) as its control plane.
```bash
kubectl get httproute llm-route -o yaml
```
### Deploy InferenceObjective (Optional)
Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
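For illustration only, such an objective would be applied with `kubectl apply`. The manifest below is a sketch, not the sample shipped with this project: the `apiVersion`, field names, and `priority` value are assumptions based on the v1alpha2 inference extension APIs, so consult the project's sample manifests for the authoritative schema:

```yaml
# Illustrative sketch; field names below are assumptions.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1                      # assumed field: relative request priority
  poolRef:
    name: vllm-llama3-8b-instruct  # the InferencePool installed earlier
```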