
Commit a0822ff

[docs] Add vLLM health check docs
1 parent 7c9baf5 commit a0822ff

File tree

1 file changed (+26, -0)

README.md

Lines changed: 26 additions & 0 deletions
@@ -311,6 +311,32 @@ parameters: {
}
```

## vLLM Health Check (BETA)

> [!NOTE]
> The vLLM Health Check feature is currently in BETA. Its features and
> functionality are subject to change as we collect feedback. We are excited to
> hear any thoughts you have!

The vLLM backend supports checking for
[vLLM Engine Health](https://github.com/vllm-project/vllm/blob/v0.6.3.post1/vllm/engine/async_llm_engine.py#L1177-L1185)
when an inference request is received. If the health check fails, the entire
model is unloaded, so it becomes NOT Ready on the server.

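A minimal sketch of observing this state from a client, assuming the
`tritonclient` Python package (`pip install tritonclient[http]`), a server at
`localhost:8000`, and a hypothetical model named `vllm_model` (neither name
comes from this repository):

```
# Poll Triton for the model's readiness state. After a failed vLLM
# health check, the model is unloaded, so this probe returns False.
# "vllm_model" and the URL are placeholders.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

if client.is_model_ready("vllm_model"):
    print("vllm_model is READY")
else:
    print("vllm_model is NOT Ready (it may have been unloaded "
          "by a failed health check)")
```
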
The health check is disabled by default. To enable it, set the following
parameter in the model config to true:
```
parameters: {
  key: "ENABLE_VLLM_HEALTH_CHECK"
  value: { string_value: "true" }
}
```
and select
[Model Control Mode EXPLICIT](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_management.md#model-control-mode-explicit)
when the server is started.

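EXPLICIT mode matters here because it allows a model that the health check has
unloaded to be loaded again without restarting the server. A hedged sketch
using the same assumed `tritonclient` package and placeholder model name as
above:

```
# In EXPLICIT model control mode, a client may ask Triton to (re)load
# a model by name. "vllm_model" is a placeholder.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Reload the model after it was unloaded by a failed health check.
client.load_model("vllm_model")
print("ready:", client.is_model_ready("vllm_model"))
```
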
Supported since r24.12.
## Referencing the Tutorial

You can read further in the
