Commit 7188485 (1 parent: eb838cd)

[docs] Enhance vLLM health check docs

2 files changed: 62 additions, 24 deletions

README.md (4 additions, 24 deletions)

````diff
@@ -316,31 +316,11 @@ parameters: {
 }
 ```
 
-## vLLM Health Check (BETA)
+## vLLM Engine Health Check (BETA)
 
-> [!NOTE]
-> The vLLM Health Check feature is currently in BETA. Its features and
-> functionality are subject to change as we collect feedback. We are excited to
-> hear any thoughts you have!
-
-The vLLM backend supports checking for
-[vLLM Engine Health](https://github.com/vllm-project/vllm/blob/v0.6.3.post1/vllm/engine/async_llm_engine.py#L1177-L1185)
-when an inference request is received. If the health check fails, the entire
-model will be unloaded, so it becomes NOT Ready at the server.
-
-The Health Check is disabled by default. To enable it, set the following
-parameter on the model config to true
-```
-parameters: {
-  key: "ENABLE_VLLM_HEALTH_CHECK"
-  value: { string_value: "true" }
-}
-```
-and select
-[Model Control Mode EXPLICIT](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_management.md#model-control-mode-explicit)
-when the server is started.
-
-Supported since r24.12.
+The vLLM Engine Health Check may optionally be enabled for more accurate
+model state reporting by the server. See [the health check docs](docs/health_check.md)
+for more information.
 
 ## Referencing the Tutorial
 
````
docs/health_check.md (new file, 58 additions)

````diff
@@ -0,0 +1,58 @@
+<!--
+# Copyright 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+-->
+
+# vLLM Health Check (BETA)
+
+> [!NOTE]
+> The vLLM Health Check support is currently in BETA. Its features and
+> functionality are subject to change as we collect feedback. We are excited to
+> hear any thoughts you have!
+
+The vLLM backend supports checking for
+[vLLM Engine Health](https://github.com/vllm-project/vllm/blob/v0.6.3.post1/vllm/engine/async_llm_engine.py#L1177-L1185)
+upon receiving each inference request. If the health check fails, the entire
+model will be unloaded, so its state becomes NOT Ready at the server, which can
+be queried by the
+[Repository Index](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_repository.md#index)
+or
+[Model Ready](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/library/http_client.h#L178-L192)
+APIs.
+
+The Health Check is disabled by default. To enable it, set the following
+parameter on the model config to true
+```
+parameters: {
+  key: "ENABLE_VLLM_HEALTH_CHECK"
+  value: { string_value: "true" }
+}
+```
+and select
+[Model Control Mode EXPLICIT](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_management.md#model-control-mode-explicit)
+when the server is started.
+
+Supported since r24.12.
````
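The behavior the new doc describes (an opt-in `ENABLE_VLLM_HEALTH_CHECK` model-config parameter; an unhealthy engine causes the model to be unloaded and become NOT Ready) can be sketched as follows. This is an illustrative sketch only: the function names and dict layout are hypothetical, not the backend's actual API.

```python
# Illustrative sketch of the health-check gating described above.
# Function names and the dict-based config shape are hypothetical.

def health_check_enabled(model_config: dict) -> bool:
    """Read the opt-in ENABLE_VLLM_HEALTH_CHECK parameter from a
    Triton-style model config; the check is disabled by default."""
    param = model_config.get("parameters", {}).get("ENABLE_VLLM_HEALTH_CHECK", {})
    return param.get("string_value", "").lower() == "true"

def model_ready_after_request(model_config: dict, engine_healthy: bool) -> bool:
    """If the check is enabled and the engine reports unhealthy, the
    model is unloaded and its state becomes NOT Ready at the server."""
    if not health_check_enabled(model_config):
        return True  # check disabled: requests never change readiness
    return engine_healthy

# Example: config with the health check enabled, mirroring the docs above.
config = {
    "parameters": {
        "ENABLE_VLLM_HEALTH_CHECK": {"string_value": "true"}
    }
}
print(model_ready_after_request(config, engine_healthy=True))   # True
print(model_ready_after_request(config, engine_healthy=False))  # False
print(model_ready_after_request({}, engine_healthy=False))      # True (check disabled)
```

Note that the requirement for EXPLICIT model control mode fits this picture: the server needs permission to unload (and later reload) the model on its own, which the default model control mode does not provide.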
