Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
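A minimal sketch of that step, assuming the sample manifest is published in the project repository; the exact path below is illustrative, so use the manifest that ships with the release you are following:

```
# Path is illustrative; substitute the sample InferenceObjective manifest
# from your release of the project.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
```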
3. To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000, run the Helm install command for the InferencePool chart (a sketch is given after this list). The Helm install automatically installs the endpoint picker and the InferencePool, along with a health check policy.
5. The default connection timeout may be insufficient for most inference workloads, so it is recommended to configure a timeout appropriate for your intended use case (see the sketch after this list). Note that this feature is currently experimental and is not intended for production use.
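For step 3, a sketch of the Helm invocation. The chart location and the `inferencePool.modelServers.matchLabels.app` value key are assumptions based on the project's published `inferencepool` chart; the release name is illustrative, so verify both against the chart values for your release:

```
# Install the InferencePool chart; the chart is assumed to bundle the
# endpoint picker and to default to the model server's port 8000.
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```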
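For step 5, one way to set such a timeout is the standard Gateway API `timeouts.request` field on the HTTPRoute. The sketch below reuses the `llm-route` route and `vllm-llama3-8b-instruct` pool from this guide; the gateway name, the `300s` value, and the alpha InferencePool API group are assumptions, so adjust them to match your installed CRDs and workload:

```
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway          # illustrative gateway name
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io   # adjust to your installed API group
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    timeouts:
      request: 300s                  # illustrative value; tune for your workload
```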
```
kubectl get httproute llm-route -o yaml
```
### Deploy the InferencePool and Endpoint Picker Extension
To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` listening on port 8000, run the Helm install command shown below. The Helm install automatically installs the endpoint picker and the InferencePool, along with a health check policy.
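As in the earlier step, a sketch of the invocation under the same assumptions (chart location and value key are assumptions based on the project's published chart; release name and the `-epp` deployment suffix are illustrative), followed by a quick check that the chart created the expected objects:

```
# Install the InferencePool chart (same assumptions as the earlier sketch).
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool

# Confirm the InferencePool exists and the endpoint picker deployment is up
# (the -epp suffix is an assumption about how the chart names the deployment).
kubectl get inferencepool vllm-llama3-8b-instruct
kubectl get deployment vllm-llama3-8b-instruct-epp
```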
### Deploy InferenceObjective (Optional)
Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
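A sketch mirroring the earlier InferenceObjective step (the manifest path is illustrative; use the sample manifest from your release), plus a check that the object was created:

```
# Path is illustrative; substitute the sample manifest from your release.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml

# Verify the InferenceObjective was created.
kubectl get inferenceobjective
```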