Deploy the sample InferenceModel, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
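A minimal sketch of applying it, assuming the sample manifest lives at the path shown (adjust to the manifest from the release you are following):

```bash
# Apply the sample InferenceModel manifest (path is illustrative; point this at
# the manifest shipped with your release).
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
```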
4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
```bash
kubectl get httproute llm-route -o yaml
```
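If you prefer to read only the relevant conditions rather than the full YAML, a quick sketch using a JSONPath query (this assumes the route reports status under a single parent Gateway at `.status.parents[0]`):

```bash
# Print each condition as type=status, e.g. Accepted=True and ResolvedRefs=True.
kubectl get httproute llm-route \
  -o jsonpath='{range .status.parents[0].conditions[*]}{.type}={.status}{"\n"}{end}'
```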
5. To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` and listens on port 8000, run the following command:
```bash
helm install vllm-llama3-8b-instruct \
```
The Helm install automatically deploys the endpoint picker and the InferencePool, along with a health check policy.
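One way to confirm the release is to list the installed resources; the deployment name below is an assumption derived from the release name, so adjust it if yours differs:

```bash
# The InferencePool created by the chart.
kubectl get inferencepools

# The endpoint-picker (EPP) deployment installed alongside it.
kubectl get deployments | grep vllm-llama3-8b-instruct
```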
6. The default connection timeout may be insufficient for most inference workloads, so it is recommended to configure a timeout appropriate for your intended use case.

   Please note that this feature is currently experimental and is not intended for production use.
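As one possible sketch, if your Gateway implementation supports the HTTPRoute `timeouts` field, a request timeout can be added to the first rule of the sample route with a JSON patch (the `300s` value and the single-rule layout are assumptions; tune them for your workload):

```bash
# Add a 300s request timeout to the first rule of the llm-route HTTPRoute.
kubectl patch httproute llm-route --type=json \
  -p='[{"op": "add", "path": "/spec/rules/0/timeouts", "value": {"request": "300s"}}]'
```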
### Deploy InferenceObjective (Optional)
Deploy the sample InferenceObjective, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
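As above, a sketch of applying it (the manifest path is an assumption; substitute the one from your release):

```bash
# Apply the sample InferenceObjective manifest (path is illustrative; point this
# at the manifest shipped with your release).
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
```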